Intel iGPU Inference Deep Dive: Xe2 Architecture, oneDNN & OpenVINO
From Xe2 microarchitecture to oneDNN primitives, from SPIR-V compilation to OpenVINO graph optimization, from performance analysis to GPU+NPU co-inference: a systematic deep dive into AI inference on Intel iGPUs.
1. Xe2 GPU Architecture
   Advanced · #intel #xe2 #gpu-architecture #igpu #lunar-lake #panther-lake
2. Xe2 Execution Model and Programming Abstractions
   Advanced · #intel #xe2 #simd #sycl #execution-model #workgroup
3. SPIR-V Compilation and Level Zero Runtime
   Advanced · #intel #spirv #level-zero #compiler #runtime #jit #aot
4. oneDNN Primitive System
   Advanced · #intel #onednn #primitive #memory-format #operator-library
5. oneDNN GPU Kernel Optimization
   Advanced · #intel #onednn #kernel-optimization #gemm #xmx #mixed-precision
6. OpenVINO Graph Optimization Pipeline
   Advanced · #intel #openvino #graph-optimization #model-compilation #plugin
7. Intel Model Optimization Stack: Choosing Between Optimum Intel, NNCF, and OpenVINO
   Intermediate · #intel #optimum #nncf #openvino #quantization #model-conversion
8. Performance Analysis and Bottleneck Diagnosis
   Advanced · #intel #performance #profiling #roofline #vtune #bottleneck
9. NPU Architecture and GPU+NPU Co-Inference
   Advanced · #intel #npu #openvino #hetero #multi-device #co-inference
10. LLM Inference on NPU: KV Cache and the Software Stack
    Advanced · #intel #npu #llm #kv-cache #openvino #npuw #static-shape
11. NPU Execution Model and the Boundaries of Its Programming Model
    Advanced · #intel #npu #execution-model #dma #tiling #attention #programming-model #cute
12. Hands-On: HF → GGUF / ONNX / OpenVINO, Three End-to-End Paths
    Intermediate · #quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu