#quantization
11 articles
Advanced · Inference-Time Quantization: KV Cache and Activation Quantization
#quantization #kv-cache #activation-quantization #fp8 #inference-optimization
Advanced · llama.cpp Quantization Methods
#quantization #llama-cpp #gguf #inference-optimization
Advanced · PTQ Weight Quantization: From GPTQ to AWQ
#quantization #ptq #gptq #awq #smoothquant
Intermediate · Quantization Fundamentals
#quantization #data-types #mixed-precision #inference-optimization
Advanced · Quantization-Aware Training (QAT)
#quantization #qat #straight-through-estimator #bitnet #lora
Advanced · Quantization Compilation and Mixed-Precision Optimization
#compiler #quantization #mixed-precision #kernel-generation #fusion
Advanced · Tool Landscape and GGUF Binary Parsing
#llama-cpp #gguf #quantization #binary-format
Intermediate · Impact of Optimization on Accuracy
#benchmark #quantization #accuracy #perplexity #openvino #lm-eval-harness #llama-cpp
Intermediate · Intel Model Optimization Stack: Choosing Between Optimum Intel, NNCF, and OpenVINO
#intel #optimum #nncf #openvino #quantization #model-conversion
Intermediate · Quantization and Model Conversion Toolchain Landscape
#quantization #model-conversion #toolchain #optimum #nncf #openvino #gguf #onnx
Intermediate · Hands-On: HF → GGUF / ONNX / OpenVINO – Three End-to-End Paths
#quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu