#intel
14 articles
Intermediate
CUDA Programming Model: From Code to Hardware
#gpu
#cuda
#programming
#simt
#simd
#intel
#sycl
Advanced
GEMM Optimization: From Naive to Peak Performance
#gpu
#gemm
#cuda
#optimization
#tensor-core
#xmx
#intel
Advanced
Performance Analysis and Bottleneck Diagnosis
#intel
#performance
#profiling
#roofline
#vtune
#bottleneck
Intermediate
Matrix Acceleration Units: Tensor Core and XMX
#gpu
#tensor-core
#xmx
#systolic-array
#nvidia
#intel
Advanced
NPU Architecture and GPU+NPU Co-Inference
#intel
#npu
#openvino
#hetero
#multi-device
#co-inference
Advanced
oneDNN GPU Kernel Optimization
#intel
#onednn
#kernel-optimization
#gemm
#xmx
#mixed-precision
Advanced
oneDNN Primitive System
#intel
#onednn
#primitive
#memory-format
#operator-library
Advanced
OpenVINO Graph Optimization Pipeline
#intel
#openvino
#graph-optimization
#model-compilation
#plugin
Advanced
SPIR-V Compilation and Level Zero Runtime
#intel
#spirv
#level-zero
#compiler
#runtime
#jit
#aot
Advanced
Xe2 Execution Model and Programming Abstractions
#intel
#xe2
#simd
#sycl
#execution-model
#workgroup
Advanced
Xe2 GPU Architecture
#intel
#xe2
#gpu-architecture
#igpu
#lunar-lake
#panther-lake
Advanced
NPU Execution Model and the Boundaries of Its Programming Model
#intel
#npu
#execution-model
#dma
#tiling
#attention
#programming-model
#cute
Advanced
LLM Inference on NPU: KV Cache and the Software Stack
#intel
#npu
#llm
#kv-cache
#openvino
#npuw
#static-shape
Intermediate
Intel Model Optimization Stack: Choosing Between Optimum Intel, NNCF, and OpenVINO
#intel
#optimum
#nncf
#openvino
#quantization
#model-conversion