Home | LLM Learning

Learning Paths

Transformer Core Mechanisms

Deep dive into every component of the Transformer, from architecture to attention

16 articles

Transformer Across Modalities

From text representation to multimodal generation — understand how Transformers adapt to text, image, audio, and video modalities. Recommended: complete the Transformer Core Mechanisms path first.

10 articles

LLM Quantization Techniques

Intermediate

From data type fundamentals to cutting-edge quantization algorithms — weight quantization, KV cache quantization, and inference-time quantization

7 articles

vLLM + SGLang Inference Engine Deep Dive

Advanced

From PagedAttention to RadixAttention, from scheduling preemption to structured output — a systematic guide to modern LLM inference engine algorithms and design philosophy.

5 articles

LLM Model Routing: Intelligent Model Selection and Hybrid Inference

Advanced

Automatically select the right LLM based on task complexity. Covers the full spectrum from simple classifiers to RL-based online learning, query-level to token-level routing, and single-model selection to multi-model collaboration.

8 articles

LLM Evaluation and Benchmarks Deep Dive

Intermediate

A systematic understanding of LLM evaluation: from benchmark design principles to specific benchmark deep dives, from optimization accuracy assessment to model selection decisions. Covers knowledge, reasoning, code, and agent evaluation, with a focus on the OpenVINO toolchain and small-model assessment.

10 articles

Ollama + llama.cpp Deep Dive

Advanced

Deep dive into Ollama and llama.cpp internals — architecture, quantization, compute graphs, hardware backends, and serving infrastructure.

9 articles

llama.cpp Source Code Walkthrough

Advanced

Trace llama.cpp's complete C/C++ execution flow function by function. This path extends the 'Ollama + llama.cpp Deep Dive' from concepts to source-level implementation details.

8 articles

AI Compute Stack

Intermediate

Understanding the AI software stack from inference frameworks to hardware ISA

5 articles

Graph Compilation & Optimization

Advanced

Deep dive into ML compiler internals: the complete journey from graph capture to optimized execution. Dual-track coverage of PyTorch 2.0 (torch.compile / TorchInductor / Triton) and MLIR (Dialect system / Progressive Lowering). Prerequisite: AI Compute Stack.

17 articles

Reinforcement Learning: From Foundations to LLM Alignment & Reasoning

Advanced

From MDP to Policy Gradient, from RLHF to GRPO, from Reward Modeling to Test-Time Scaling — a systematic guide to how reinforcement learning drives LLM alignment, optimization, and reasoning.

8 articles

Intel iGPU Inference Deep Dive: Xe2 Architecture, oneDNN & OpenVINO

Advanced

From Xe2 microarchitecture to oneDNN primitives, from SPIR-V compilation to OpenVINO graph optimization, from performance analysis to GPU+NPU co-inference — a systematic deep dive into AI inference on Intel iGPU.

12 articles

Matrix Mathematics: From Foundational Theory to Modern AI Architectures

Advanced

Matrices are the lingua franca of ML. This path builds four core tools (decomposition, measurement, calculus, iteration), covers classical methods (SVD, PCA, NMF) and operator analysis (PageRank, spectral clustering), then converges on modern architectures (LoRA, Efficient Attention, SSM/Mamba). The "decompose → propagate → converge" arc reveals how one mathematical tool manifests across seemingly different domains.

31 articles

Graph Algorithms: From Structural Exploration to Combinatorial Optimization

Intermediate

Graphs are the universal modeling language for "entities + relationships." This path builds three layers of capability (structural exploration, measurement, combinatorial optimization), covers classical algorithms (BFS/DFS, shortest paths, network flow), then converges on modern methods (random graph models, probabilistic graphical models, GNNs). The "explore → measure → optimize → model" arc reveals how one set of graph tools manifests across seemingly different engineering domains.

All Articles

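Probabilistic Graphical Models: Reasoning Under Uncertainty on Graphs

Core Properties Quick Reference: Concept Map and Formula Cheat Sheet

Matrix Completion: Recovering Low-Rank Matrices from Very Few Observations

Matrix Norms, Inner Products, and Condition Numbers: The Art of Measurement

The Geometry of Matrix Structure: Quadratic Forms, Positive Definiteness, and Covariance

Matrix Mathematics Panorama: The Lingua Franca of ML

Matrix Calculus: From the Jacobian to the Loss Surface

Connectivity: How Many Pieces Does a Graph Split Into?

Continuous-Time Linear Systems and Kalman Filtering: From Discrete Steps to Smooth Flow

Markov Chains and Transition Matrices: When Matrices Encode Probabilities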

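Euler and Hamilton: Two Kinds of Complete Traversal

Matching: Optimal Pairing

Singular Value Decomposition: The Core of the Core

Community Detection: Which Nodes Cluster Together?

Algorithms on Trees: The Graph's Special Skeleton

Overview of Data Matrix Factorization: Problems, Tools, and a Taxonomy of Methods

Operator Matrices Panorama: When Matrices No Longer Hold Data

Randomized SVD: When Exact Decomposition Is Too Expensive

Random Graphs and Network Models: What Do Real Networks Look Like?

Random Walks and Graph Embedding: DeepWalk/Node2Vec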

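Topological Sorting and DAGs: Valid Orderings Under Dependencies

Eigendecomposition and Diagonalization: The Foundation of Everything

Graph Laplacian and Spectral Clustering: From Graph Structure to Optimal Partitioning

Graph Modeling Casebook: This Problem Is Actually a Graph Problem

Graph Diffusion, Heat Kernels, and GNN Message Passing: From the Heat Equation to Graph Neural Networks

Graph Embedding and Graph Neural Networks: Turning Graphs into Vectors

The General Iteration Machine on Graphs (Part I): From Mathematical Problems to a Solution Framework

The General Iteration Machine on Graphs (Part II): Paradigms, Domains, and Boundaries

Graph Algorithms Panorama: From Structural Exploration to Combinatorial Optimization

Cliques and Dense Subgraphs: The Most Tightly Knit Subgroups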

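Network Flow: How Much Can the Pipes Carry?

Similarity and Isomorphism: How Alike Are Two Graphs or Nodes?

The Geometry of Vector Spaces: Inner Products, Projections, Rank, and Subspaces

Low-Rank Structure in Learned Operators: Why Are Neural Network Weights Low-Rank?

Hidden Markov Models: When the State Is Hidden

Optimization Algorithms: From Gradient Descent to Newton's Method

Tensor Decomposition and Knowledge Graph Embedding: From Two Dimensions to Higher Order

Coloring and Partitioning: How Few Colors Suffice?

Centrality: Who Matters Most?

Shortest Paths: Distance on a Graph

Minimum Spanning Tree: Connecting Everyone at the Lowest Cost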

Actor-Critic and PPO: Stable Policy Optimization

Agent & Tool Use Benchmarks

AI Compute Stack Overview — From Inference Frameworks to Hardware ISA

Anatomy of Model Release Benchmark Standard Sets

The Low-Rank Structure of Attention and Efficient Attention

Attention Computation in Detail

Attention Variants: From Sliding Window to MLA

Autotuning and End-to-End Practice

Backend Scheduling, Op Fusion & Memory Allocation

Batch, Ubatch & the Decoding Main Loop

Benchmark Landscape and Evaluation Methodology

BERT and GPT: Two Paths — Understanding vs Generation

BFCL Practical Guide

BFS and DFS: The Two Basic Ways a Graph Breathes

Cascade and Self-Verification: Try the Cheap Model First, Upgrade If Needed

Code Benchmarks

Code Generation (Part I): Instruction Selection, Vectorization & Register Allocation

Code Generation (Part II): Triton Pipeline, Compiler Backends & Numerical Correctness

Compute Graph Construction & Architecture Dispatch

Compute Graphs and Inference Engines

CUDA Programming Model — From Code to Hardware

Diffusion Model Fundamentals: Generating from Noise