#llama-cpp
14 articles

The GGUF Model Format (Intermediate)
#gguf #llama-cpp #model-format #serialization

llama.cpp Quantization Methods (Advanced)
#quantization #llama-cpp #gguf #inference-optimization

Ollama + llama.cpp Architecture Overview (Intermediate)
#ollama #llama-cpp #architecture #inference

The Complete Journey of a Single Inference (Intermediate)
#ollama #llama-cpp #inference #pipeline

Batch, Ubatch & the Decoding Main Loop (Advanced)
#llama-cpp #batch #ubatch #decoding #parallel-sequences #kv-cache

Compute Graph Construction & Architecture Dispatch (Advanced)
#llama-cpp #compute-graph #architecture #ggml #graph-reuse

Execution, Sampling & Context Management (Advanced)
#llama-cpp #execution #sampling #speculative-decoding #kv-cache #context-management

Model Loading: From File to Device (Advanced)
#llama-cpp #model-loading #mmap #gpu-offload #backend

llama.cpp Execution Pipeline Overview (Advanced)
#llama-cpp #inference-engine #architecture #source-code

Backend Scheduling, Op Fusion & Memory Allocation (Advanced)
#llama-cpp #backend-scheduling #op-fusion #memory-allocation #pipeline-parallelism

Tool Landscape and GGUF Binary Parsing (Advanced)
#llama-cpp #gguf #quantization #binary-format

Warmup, Tokenization & Chat Template (Advanced)
#llama-cpp #warmup #tokenization #chat-template #jinja2 #multimodal

Impact of Optimization on Accuracy (Intermediate)
#benchmark #quantization #accuracy #perplexity #openvino #lm-eval-harness #llama-cpp

Hands-On: HF → GGUF / ONNX / OpenVINO – Three End-to-End Paths (Intermediate)
#quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu