Intel iGPU Inference Deep Dive: Xe2 Architecture, oneDNN & OpenVINO
From Xe2 microarchitecture to oneDNN primitives, from SPIR-V compilation to OpenVINO graph optimization, from performance analysis to GPU+NPU co-inference: a systematic deep dive into AI inference on Intel iGPUs.
1. Xe2 GPU Architecture
   Advanced · #intel #xe2 #gpu-architecture #igpu #lunar-lake #panther-lake
2. Xe2 Execution Model and Programming Abstractions
   Advanced · #intel #xe2 #simd #sycl #execution-model #workgroup
3. SPIR-V Compilation and Level Zero Runtime
   Advanced · #intel #spirv #level-zero #compiler #runtime #jit #aot
4. oneDNN Primitive System
   Advanced · #intel #onednn #primitive #memory-format #operator-library
5. oneDNN GPU Kernel Optimization
   Advanced · #intel #onednn #kernel-optimization #gemm #xmx #mixed-precision
6. OpenVINO Graph Optimization Pipeline
   Advanced · #intel #openvino #graph-optimization #model-compilation #plugin
7. Intel Model Optimization Stack: Choosing Between Optimum Intel, NNCF, and OpenVINO
   Intermediate · #intel #optimum #nncf #openvino #quantization #model-conversion
8. Performance Analysis and Bottleneck Diagnosis
   Advanced · #intel #performance #profiling #roofline #vtune #bottleneck
9. NPU Architecture and GPU+NPU Co-Inference
   Advanced · #intel #npu #openvino #hetero #multi-device #co-inference
10. LLM Inference on NPU: KV Cache and the Software Stack
    Advanced · #intel #npu #llm #kv-cache #openvino #npuw #static-shape
11. NPU Execution Model and the Boundaries of Its Programming Model
    Advanced · #intel #npu #execution-model #dma #tiling #attention #programming-model #cute
12. Hands-On: HF → GGUF / ONNX / OpenVINO, Three End-to-End Paths
    Intermediate · #quantization #model-conversion #hands-on #llama-cpp #onnx #openvino #intel-igpu