LLM Quantization Techniques
From data type fundamentals to cutting-edge quantization algorithms: weight quantization, KV cache quantization, and inference-time quantization.
1. Quantization Fundamentals
   Intermediate · #quantization #data-types #mixed-precision #inference-optimization
2. PTQ Weight Quantization: From GPTQ to AWQ
   Advanced · #quantization #ptq #gptq #awq #smoothquant
3. Quantization-Aware Training (QAT)
   Advanced · #quantization #qat #straight-through-estimator #bitnet #lora
4. Inference-Time Quantization: KV Cache and Activation Quantization
   Advanced · #quantization #kv-cache #activation-quantization #fp8 #inference-optimization
5. llama.cpp Quantization Methods
   Advanced · #quantization #llama-cpp #gguf #inference-optimization
6. Quantization and Model Conversion Toolchain Landscape
   Intermediate · #quantization #model-conversion #toolchain #optimum #nncf #openvino #gguf #onnx
7. Hands-On: HF → GGUF / ONNX / OpenVINO – Three End-to-End Paths
   Intermediate · #quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu