Learning Paths
Transformer Core Mechanisms
Intermediate · Deep dive into every component of the Transformer, from architecture to attention
16 articles
Transformer Across Modalities
Intermediate · From text representation to multimodal generation — understand how Transformers adapt to text, image, audio, and video modalities. Recommended: complete the Transformer Core Mechanisms path first.
10 articles
LLM Quantization Techniques
Intermediate · From data type fundamentals to cutting-edge quantization algorithms — weight quantization, KV cache quantization, and inference-time quantization
7 articles
vLLM + SGLang Inference Engine Deep Dive
Advanced · From PagedAttention to RadixAttention, from scheduling preemption to structured output — a systematic guide to modern LLM inference engine algorithms and design philosophy.
5 articles
LLM Model Routing: Intelligent Model Selection and Hybrid Inference
Advanced · Automatically select the right LLM based on task complexity. Covers the full spectrum from simple classifiers to RL-based online learning, query-level to token-level routing, and single-model selection to multi-model collaboration.
8 articles
LLM Evaluation and Benchmarks Deep Dive
Intermediate · A systematic understanding of LLM evaluation: from benchmark design principles to deep dives into specific benchmarks, from assessing the accuracy of optimized models to model selection decisions. Covers knowledge, reasoning, code, and agent evaluation, with a focus on the OpenVINO toolchain and small model assessment.
10 articles
Ollama + llama.cpp Deep Dive
Advanced · Deep dive into Ollama and llama.cpp internals — architecture, quantization, compute graphs, hardware backends, and serving infrastructure.
9 articles
llama.cpp Source Code Walkthrough
Advanced · Trace llama.cpp's complete C/C++ execution flow function by function. This path extends the 'Ollama + llama.cpp Deep Dive' from concepts to source-level implementation details.
8 articles
AI Compute Stack
Intermediate · Understanding the AI software stack, from inference frameworks down to the hardware ISA
5 articles
Graph Compilation & Optimization
Advanced · Deep dive into ML compiler internals: the complete journey from graph capture to optimized execution. Dual-track coverage of PyTorch 2.0 (torch.compile / TorchInductor / Triton) and MLIR (Dialect system / Progressive Lowering). Prerequisite: AI Compute Stack.
17 articles
Reinforcement Learning: From Foundations to LLM Alignment & Reasoning
Advanced · From MDP to Policy Gradient, from RLHF to GRPO, from Reward Modeling to Test-Time Scaling — a systematic guide to how reinforcement learning drives LLM alignment, optimization, and reasoning.
8 articles
Intel iGPU Inference Deep Dive: Xe2 Architecture, oneDNN & OpenVINO
Advanced · From the Xe2 microarchitecture to oneDNN primitives, from SPIR-V compilation to OpenVINO graph optimization, from performance analysis to GPU+NPU co-inference — a systematic deep dive into AI inference on Intel iGPUs.
12 articles
Matrix Mathematics: From Foundational Theory to Modern AI Architectures
Advanced · Matrices are the lingua franca of ML. This path builds four core tools (decomposition, measurement, calculus, iteration), covers classical methods (SVD, PCA, NMF) and operator analysis (PageRank, spectral clustering), then converges on modern architectures (LoRA, Efficient Attention, SSM/Mamba). The "decompose → propagate → converge" arc reveals how one mathematical tool manifests across seemingly different domains.
31 articles
Graph Algorithms: From Structural Exploration to Combinatorial Optimization
Intermediate · Graphs are the universal modeling language for "entities + relationships." This path builds three layers of capability (structural exploration, measurement, combinatorial optimization), covers classical algorithms (BFS/DFS, shortest paths, network flow), then converges on modern methods (random graph models, probabilistic graphical models, GNNs). The "explore → measure → optimize → model" arc reveals how one set of graph tools manifests across seemingly different engineering domains.
22 articles