Content on this site is AI-generated and may contain errors. If you find issues, please report them at GitHub Issues.

#llama-cpp

14 articles

- The GGUF Model Format (Intermediate) · #gguf #llama-cpp #model-format #serialization
- llama.cpp Quantization Methods (Advanced) · #quantization #llama-cpp #gguf #inference-optimization
- Ollama + llama.cpp Architecture Overview (Intermediate) · #ollama #llama-cpp #architecture #inference
- The Complete Journey of a Single Inference (Intermediate) · #ollama #llama-cpp #inference #pipeline
- Batch, Ubatch & the Decoding Main Loop (Advanced) · #llama-cpp #batch #ubatch #decoding #parallel-sequences #kv-cache
- Compute Graph Construction & Architecture Dispatch (Advanced) · #llama-cpp #compute-graph #architecture #ggml #graph-reuse
- Execution, Sampling & Context Management (Advanced) · #llama-cpp #execution #sampling #speculative-decoding #kv-cache #context-management
- Model Loading: From File to Device (Advanced) · #llama-cpp #model-loading #mmap #gpu-offload #backend
- llama.cpp Execution Pipeline Overview (Advanced) · #llama-cpp #inference-engine #architecture #source-code
- Backend Scheduling, Op Fusion & Memory Allocation (Advanced) · #llama-cpp #backend-scheduling #op-fusion #memory-allocation #pipeline-parallelism
- Tool Landscape and GGUF Binary Parsing (Advanced) · #llama-cpp #gguf #quantization #binary-format
- Warmup, Tokenization & Chat Template (Advanced) · #llama-cpp #warmup #tokenization #chat-template #jinja2 #multimodal
- Impact of Optimization on Accuracy (Intermediate) · #benchmark #quantization #accuracy #perplexity #openvino #lm-eval-harness #llama-cpp
- Hands-On: HF → GGUF / ONNX / OpenVINO — Three End-to-End Paths (Intermediate) · #quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu