
LLM Quantization Techniques

From data type fundamentals to cutting-edge quantization algorithms: weight quantization, KV cache quantization, and inference-time quantization

  1. Quantization Fundamentals

     Intermediate
     #quantization #data-types #mixed-precision #inference-optimization
  2. PTQ Weight Quantization: From GPTQ to AWQ

     Advanced
     #quantization #ptq #gptq #awq #smoothquant
  3. Quantization-Aware Training (QAT)

     Advanced
     #quantization #qat #straight-through-estimator #bitnet #lora
  4. Inference-Time Quantization: KV Cache and Activation Quantization

     Advanced
     #quantization #kv-cache #activation-quantization #fp8 #inference-optimization
  5. llama.cpp Quantization Methods

     Advanced
     #quantization #llama-cpp #gguf #inference-optimization
  6. Quantization and Model Conversion Toolchain Landscape

     Intermediate
     #quantization #model-conversion #toolchain #optimum #nncf #openvino #gguf #onnx
  7. Hands-On: HF → GGUF / ONNX / OpenVINO – Three End-to-End Paths

     Intermediate
     #quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu