Content on this site is AI-generated and may contain errors. If you find issues, please report them at GitHub Issues.

Learning Paths

Transformer Core Mechanisms

Intermediate

Deep dive into every component of the Transformer, from architecture to attention

16 articles

Transformer Across Modalities

Intermediate

From text representation to multimodal generation — understand how Transformers adapt to text, image, audio, and video modalities. Recommended: complete the Transformer Core Mechanisms path first.

10 articles

LLM Quantization Techniques

Intermediate

From data type fundamentals to cutting-edge quantization algorithms — weight quantization, KV cache quantization, and inference-time quantization

7 articles

vLLM + SGLang Inference Engine Deep Dive

Advanced

From PagedAttention to RadixAttention, from scheduling preemption to structured output — a systematic guide to modern LLM inference engine algorithms and design philosophy.

5 articles

LLM Model Routing: Intelligent Model Selection and Hybrid Inference

Advanced

Automatically select the right LLM based on task complexity. Covers the full spectrum from simple classifiers to RL-based online learning, query-level to token-level routing, and single-model selection to multi-model collaboration.

8 articles

LLM Evaluation and Benchmarks Deep Dive

Intermediate

Systematic understanding of LLM evaluation: from benchmark design principles to specific benchmark deep dives, from optimization accuracy assessment to model selection decisions. Covers knowledge, reasoning, code, and agent evaluation with focus on OpenVINO toolchain and small model assessment.

10 articles

Ollama + llama.cpp Deep Dive

Advanced

Deep dive into Ollama and llama.cpp internals — architecture, quantization, compute graphs, hardware backends, and serving infrastructure.

9 articles

llama.cpp Source Code Walkthrough

Advanced

Trace llama.cpp's complete C/C++ execution flow function by function. This path extends the 'Ollama + llama.cpp Deep Dive' from concepts to source-level implementation details.

8 articles

AI Compute Stack

Intermediate

Understanding the AI software stack from inference frameworks to hardware ISA

5 articles

Graph Compilation & Optimization

Advanced

Deep dive into ML compiler internals: the complete journey from graph capture to optimized execution. Dual-track coverage of PyTorch 2.0 (torch.compile / TorchInductor / Triton) and MLIR (Dialect system / Progressive Lowering). Prerequisite: AI Compute Stack.

17 articles

Reinforcement Learning: From Foundations to LLM Alignment & Reasoning

Advanced

From MDP to Policy Gradient, from RLHF to GRPO, from Reward Modeling to Test-Time Scaling — a systematic guide to how reinforcement learning drives LLM alignment, optimization, and reasoning.

8 articles

Intel iGPU Inference Deep Dive: Xe2 Architecture, oneDNN & OpenVINO

Advanced

From Xe2 microarchitecture to oneDNN primitives, from SPIR-V compilation to OpenVINO graph optimization, from performance analysis to GPU+NPU co-inference — a systematic deep dive into AI inference on Intel iGPU.

12 articles

Matrix Mathematics: From Foundational Theory to Modern AI Architectures

Advanced

Matrices are the lingua franca of ML. This path builds four core tools (decomposition, measurement, calculus, iteration), covers classical methods (SVD, PCA, NMF) and operator analysis (PageRank, spectral clustering), then converges on modern architectures (LoRA, Efficient Attention, SSM/Mamba). The "decompose → propagate → converge" arc reveals how one mathematical tool manifests across seemingly different domains.

31 articles

Graph Algorithms: From Structural Exploration to Combinatorial Optimization

Intermediate

Graphs are the universal modeling language for "entities + relationships." This path builds three layers of capability (structural exploration, measurement, combinatorial optimization), covers classical algorithms (BFS/DFS, shortest paths, network flow), then converges on modern methods (random graph models, probabilistic graphical models, GNNs). The "explore → measure → optimize → model" arc reveals how one set of graph tools manifests across seemingly different engineering domains.

22 articles

Browse by Tag

CP-decomposition ComplEx DistMult Tucker-decomposition a-star accuracy activation-checkpointing activation-quantization actor-critic advantage affine agent alignment aot aotautograd applications approximation architecture articulation-point assignment-problem attention audio automix autotuning awq backend backend-scheduling backends backpropagation bandit baseline batch batch-scheduling baum-welch bayesian-network belief-propagation bellman-equation bellman-ford benchmark bert betweenness bfcl bfs binary-format bioinformatics bipartite-matching bitnet blossom boruvka bottleneck bridge bucketing bufferization calculus cascade causal-inference centrality centroid chain-of-thought chat-template chatbot-arena cheatsheet christofides chromatic-number chunked-prefill classification classifier clip clipping clique closeness co-inference code code-evaluation codegen collaboration collaborative-filtering communication community-detection compiler compilers compute compute-graph computer-vision condition-number connectivity constitutional-ai constrained-decoding contamination context-management continuous-batching continuous-time contrastive-learning convergence convex-optimization convex-relaxation cost-model cost-optimization council-mode covariance covariance-matrix crf critical-path cross-attention cse cuda cuda-graph cuda-stream cute dag data-types dataflow-analysis dce ddpm decode decoding decomposition deepseek deepseek-r1 deepwalk degree deltanet dense-subgraph deployment determinant dfs diagonalization dialect dialect-conversion diameter diffusion dijkstra dimensionality-reduction dinic discretization distributed dit dma docker dominator-tree dpo dsatur dsl dynamic-programming dynamic-shapes eckart-young edmonds-karp efficient-attention eigendecomposition eigenfaces eigenvector embedding end-to-end ensemble erdos-renyi euler-path evaluation execution execution-model factor-graph factorization-machines fiedler-vector fine-tuning fixed-point flash-attention 
floyd-warshall ford-fulkerson forward-backward fp8 fpt frobenius frontier frugalgpt fsm function-calling fusion fx-graph gae gaia gat gaussian-process gc-tricolor gcn gemm gemma generation generative-model ggml gguf glove gnn gpqa gpt gptq gpu gpu-architecture gpu-offload gqa gradient-descent gram-matrix graph-algorithms graph-coloring graph-diffusion graph-embedding graph-kernel graph-laplacian graph-modeling graph-optimization graph-partitioning graph-reuse graphsage greedy greedy-coloring grpo gspmd guards hamiltonian-path hands-on hardware hardware-backend hardware-optimization harness heat-kernel heavy-light-decomposition hessian hetero hidden-markov-model hierholzer hippo hmm hopcroft-karp humaneval hungarian-algorithm hybrid hybrid-llm hymba igpu image-generation image-recognition implicit-factorization incoherence independent-set inductor inference inference-engine inference-optimization inner-product instruct-gpt instruction-selection intel intel-igpu intrinsic-dimension ipo ir isomorphism iteration-machine jacobian jamba jinja2 jit johnson-lindenstrauss jukebox k-core k-truss kalman-filter katz kernel kernel-fusion kernel-generation kernel-optimization kernel-pca kl-divergence knowledge-graph kruskal kv-cache label-propagation laplacian latency layout lca leaderboard learned-operator level-zero linear-algebra linear-systems linformer llama-cpp llm llm-as-judge llvm lm-eval lm-eval-harness local-cloud logistics loop-optimization lora loss-surface louvain low-rank low-rank-approximation lp-relaxation lunar-lake mamba markov markov-chains markov-random-field matching math matrix-completion matrix-exponential matrix-factorization matrix-math matroid max-flow mcts mdp memory memory-allocation memory-format memory-hierarchy memory-management memory-planning mercer-theorem message-passing metal methodology metis min-cut minimum-spanning-tree mixed-precision mixing-time mixtral mixture-of-agents mla mlir mmap mmlu model-compilation model-conversion model-format 
model-loading model-management model-release model-routing model-selection modelfile modularity moe mqa multi-backend multi-device multi-head multimodal music-generation musicgen network-flow network-models network-science newton-method nlp nlu nmf nncf node2vec non-negative-matrix-factorization norms np-complete np-hard npu npuw nuclear nuclear-norm null-space numerical-accuracy nvidia offline-rl ollama onednn onnx op-fusion openvino operator operator-fusion operator-library optimization optimum orthogonality outcome-reward overview paged-attention pagerank panther-lake paradigm-unification parallel-sequences parameter-efficient pareto parts-based pass pass-at-k pattern-matching pca performance performer perplexity perron-frobenius phi pipeline pipeline-parallel pipeline-parallelism planarity plugin pmi policy-gradient policy-optimization polyhedral pomdp positional-encoding positive-definite post-training power-iteration power-law ppo preemption preference-optimization prefill prefix-cache prefix-caching pretraining prim primitive principal-component-pursuit privacy probabilistic-graphical-models process-reward profiling programming programming-model progressive-lowering projection pseudoinverse ptq ptx pytorch q-learning qat qkv qlora quadratic-form quantization qwen radix-attention rag random-graphs random-projection random-walk randomized-svd rank reasoning recommender-systems register-allocation registry reinforce reinforcement-learning retrieval reward-hacking reward-model rkhs rlhf robust-pca roofline routellm routing runner runtime s4 sampling sbert scale-free scc scheduler scheduling selective-scan selective-ssm self-verification semantic-routing semiring sentence-embeddings sequence-modeling serialization sgd sglang sharding shared-memory shortest-path simd similarity simt sliding-window small-models small-world smoothquant social-networks softmax software-stack sora source-code sparse spatiotemporal-attention spectral spectral-clustering spectral-gap 
spectral-theorem speculative-decoding speech spirv ssa ssm stable-diffusion standard-set state-space state-space-model static-shape straight-through-estimator structured-output svd swe-bench sycl symbolic-shapes system-design systolic-array tarjan task-yaml taylor-expansion tensor-core tensor-decomposition tensor-parallel tensorrt-llm test-time-scaling thinking tiling tokenization tool-use toolchain topic-modeling topological-sort torch-compile torchdynamo trace training transform-dialect transformer transition-matrix traversal tree treewidth triton trust-region tsp tts ubatch unified-framework union-find vall-e value-function variance-reduction vectorization verifier vertex-cover video-generation vision-language vision-transformer vit viterbi vllm vlsi vtune vulkan warmup whisper wl-test word-embeddings word2vec workgroup xe2 xmx zamba zero-shot

All Articles

Advanced

Probabilistic Graphical Models: Reasoning Under Uncertainty on Graphs

Coming Soon
#graph-algorithms#probabilistic-graphical-models#bayesian-network#markov-random-field#belief-propagation#factor-graph#crf
Advanced

Core Properties Quick Reference: Concept Map and Formula Cheat Sheet

Coming Soon
#matrix-math#linear-algebra#cheatsheet
Advanced

Matrix Completion: Recovering Low-Rank Matrices from Very Few Observations

Coming Soon
#matrix-math#matrix-completion#nuclear-norm#convex-relaxation#incoherence#low-rank
Advanced

Matrix Norms, Inner Products, and Condition Numbers: The Art of Measurement

Coming Soon
#matrix-math#norms#condition-number#inner-product#frobenius#spectral#nuclear
Advanced

The Geometry of Matrix Structure: Quadratic Forms, Positive Definiteness, and Covariance

Coming Soon
#matrix-math#quadratic-form#positive-definite#covariance#gram-matrix#trace#determinant
Advanced

Matrix Mathematics Panorama: The Lingua Franca of ML

Coming Soon
#matrix-math#linear-algebra#overview
Advanced

Matrix Calculus: From the Jacobian to the Loss Surface

Coming Soon
#matrix-math#calculus#jacobian#hessian#backpropagation#loss-surface#taylor-expansion
Intermediate

Connectivity: How Many Pieces Can a Graph Break Into?

Coming Soon
#graph-algorithms#connectivity#scc#tarjan#bridge#articulation-point
Advanced

Continuous-Time Linear Systems and Kalman Filtering: From Discrete Steps to Smooth Flow

Coming Soon
#matrix-math#linear-systems#kalman-filter#matrix-exponential#state-space#continuous-time#discretization
Advanced

Markov Chains and Transition Matrices: When Matrices Encode Probabilities

Coming Soon
#matrix-math#markov-chains#transition-matrix#perron-frobenius#mixing-time
Intermediate

Euler and Hamilton: Two Kinds of Completeness in Traversal

Coming Soon
#graph-algorithms#euler-path#hamiltonian-path#np-complete#hierholzer
Intermediate

Matching: Optimal Pairing

Coming Soon
#graph-algorithms#matching#bipartite-matching#hungarian-algorithm#hopcroft-karp#blossom#assignment-problem
Advanced

Singular Value Decomposition: The Core of the Core

Coming Soon
#matrix-math#svd#low-rank-approximation#pseudoinverse#eckart-young
Intermediate

Community Detection: Which Nodes Cluster Together?

Coming Soon
#graph-algorithms#community-detection#modularity#louvain#label-propagation#k-core
Intermediate

Algorithms on Trees: The Special Skeleton of Graphs

Coming Soon
#graph-algorithms#tree#lca#diameter#centroid#heavy-light-decomposition#dominator-tree
Advanced

Data Matrix Decomposition Overview: Problems, Tools, and a Taxonomy of Methods

Coming Soon
#matrix-math#decomposition#overview
Advanced

Operator Matrices Panorama: When Matrices No Longer Hold Data

Coming Soon
#matrix-math#operator#markov#laplacian#kernel#overview
Advanced

Randomized SVD: When Exact Decomposition Is Too Expensive to Compute

Coming Soon
#matrix-math#randomized-svd#johnson-lindenstrauss#random-projection#low-rank-approximation
Advanced

Random Graphs and Network Models: What Do Real Networks Look Like?

Coming Soon
#graph-algorithms#random-graphs#network-models#erdos-renyi#small-world#scale-free#power-law#network-science
Advanced

Random Walks and Graph Embeddings: DeepWalk/Node2Vec

Coming Soon
#matrix-math#random-walk#graph-embedding#deepwalk#node2vec#transition-matrix
Intermediate

Topological Sort and DAGs: Valid Orderings Under Dependencies

Coming Soon
#graph-algorithms#topological-sort#dag#critical-path#dynamic-programming
Advanced

Eigendecomposition and Diagonalization: The Foundation of Everything

Coming Soon
#matrix-math#eigendecomposition#diagonalization#spectral-theorem
Advanced

Graph Laplacian and Spectral Clustering: From Graph Structure to Optimal Partitioning

Coming Soon
#matrix-math#graph-laplacian#spectral-clustering#fiedler-vector#graph-partitioning
Advanced

Graph Modeling Case Studies: This Problem Is Actually a Graph Problem

Coming Soon
#graph-algorithms#graph-modeling#applications#compilers#recommender-systems#bioinformatics#causal-inference#nlp#vlsi#social-networks#logistics
Advanced

Graph Diffusion, Heat Kernels, and GNN Message Passing: From the Heat Equation to Graph Neural Networks

Coming Soon
#matrix-math#graph-diffusion#heat-kernel#gnn#message-passing#graph-laplacian#gcn
Advanced

Graph Embeddings and Graph Neural Networks: Turning Graphs into Vectors

Coming Soon
#graph-algorithms#graph-embedding#gnn#deepwalk#node2vec#gcn#gat#graphsage
Advanced

The Universal Iteration Machine on Graphs (Part 1): From Mathematical Problems to a Solving Framework

Coming Soon
#graph-algorithms#iteration-machine#unified-framework#frontier#fixed-point#bellman-equation
Advanced

The Universal Iteration Machine on Graphs (Part 2): Paradigms, Domains, and Boundaries

Coming Soon
#graph-algorithms#iteration-machine#paradigm-unification#gc-tricolor#dataflow-analysis#belief-propagation#gnn#semiring#convergence
Intermediate

Graph Algorithms Panorama: From Structural Exploration to Combinatorial Optimization

Coming Soon
#graph-algorithms#overview
Intermediate

Cliques and Dense Subgraphs: The Most Tightly Knit Subgroups

Coming Soon
#graph-algorithms#clique#dense-subgraph#k-core#k-truss#independent-set#vertex-cover
Intermediate

Network Flow: How Much Can the Pipes Carry?

Coming Soon
#graph-algorithms#network-flow#max-flow#min-cut#ford-fulkerson#edmonds-karp#dinic
Intermediate

Similarity and Isomorphism: How Alike Are Two Graphs/Nodes?

Coming Soon
#graph-algorithms#similarity#isomorphism#graph-kernel#wl-test
Advanced

The Geometry of Vector Spaces: Inner Products, Projections, Rank, and Subspaces

Coming Soon
#matrix-math#inner-product#projection#rank#null-space#orthogonality
Advanced

Low-Rank Structure in Learned Operators: Why Are Neural Network Weights Low-Rank?

Coming Soon
#matrix-math#learned-operator#low-rank#intrinsic-dimension#lora#overview
Advanced

Hidden Markov Models: When the States Cannot Be Seen

Coming Soon
#matrix-math#hmm#hidden-markov-model#forward-backward#viterbi#baum-welch
Advanced

Optimization Algorithms: From Gradient Descent to Newton's Method

Coming Soon
#matrix-math#optimization#gradient-descent#newton-method#sgd#convergence
Advanced

Tensor Decomposition and Knowledge Graph Embeddings: From Two Dimensions to Higher Order

Coming Soon
#matrix-math#tensor-decomposition#knowledge-graph#CP-decomposition#Tucker-decomposition#DistMult#ComplEx
Intermediate

Coloring and Partitioning: What Is the Minimum Number of Colors?

Coming Soon
#graph-algorithms#graph-coloring#graph-partitioning#chromatic-number#greedy-coloring#dsatur#planarity#metis
Intermediate

Centrality: Who Matters Most?

Coming Soon
#graph-algorithms#centrality#degree#betweenness#closeness#pagerank#eigenvector#katz
Intermediate

Shortest Paths: Distance on Graphs

Coming Soon
#graph-algorithms#shortest-path#dijkstra#bellman-ford#floyd-warshall#a-star
Intermediate

Minimum Spanning Trees: Connecting Everyone at the Lowest Cost

Coming Soon
#graph-algorithms#minimum-spanning-tree#kruskal#prim#boruvka#greedy#matroid#union-find
Advanced

Actor-Critic and PPO: Stable Policy Optimization

#actor-critic#ppo#gae#advantage#clipping#trust-region
Intermediate

Agent & Tool Use Benchmarks

#benchmark#agent#function-calling#tool-use#bfcl#gaia
Intermediate

AI Compute Stack Overview — From Inference Frameworks to Hardware ISA

#gpu#compute#software-stack#runtime#inference
Intermediate

Anatomy of Model Release Benchmark Standard Sets

#benchmark#model-release#standard-set#small-models#gemma#phi#qwen
Advanced

The Low-Rank Structure of Attention and Efficient Attention

Coming Soon
#matrix-math#attention#low-rank#linformer#performer#efficient-attention#kernel
Intermediate

Attention Computation in Detail

#transformer#attention#softmax
Advanced

Attention Variants: From Sliding Window to MLA

#transformer#attention#mla#sliding-window#cross-attention
Advanced

Autotuning and End-to-End Practice

#compiler#autotuning#triton#mlir#transform-dialect#end-to-end#torch-compile
Advanced

Backend Scheduling, Op Fusion & Memory Allocation

#llama-cpp#backend-scheduling#op-fusion#memory-allocation#pipeline-parallelism
Advanced

Batch, Ubatch & the Decoding Main Loop

#llama-cpp#batch#ubatch#decoding#parallel-sequences#kv-cache
Intermediate

Benchmark Landscape and Evaluation Methodology

#benchmark#evaluation#methodology#llm-as-judge#contamination
Intermediate

BERT and GPT: Two Paths — Understanding vs Generation

#bert#gpt#pretraining#nlp#nlu#classification#generation
Advanced

BFCL Practical Guide

#benchmark#bfcl#function-calling#tool-use#evaluation
Intermediate

BFS and DFS: The Two Basic Breathing Patterns of Graphs

Coming Soon
#graph-algorithms#bfs#dfs#traversal
Advanced

Cascade and Self-Verification: Try the Cheap Model First, Upgrade If Needed

#model-routing#cascade#self-verification#pomdp#frugalgpt#automix
Intermediate

Code Benchmarks

#benchmark#code#humaneval#swe-bench#pass-at-k
Advanced

Code Generation (Part I): Instruction Selection, Vectorization & Register Allocation

#compiler#codegen#instruction-selection#vectorization#register-allocation#gpu
Advanced

Code Generation (Part II): Triton Pipeline, Compiler Backends & Numerical Correctness

#compiler#codegen#triton#llvm#ptx#numerical-accuracy#backends
Advanced

Compute Graph Construction & Architecture Dispatch

#llama-cpp#compute-graph#architecture#ggml#graph-reuse
Advanced

Compute Graphs and Inference Engines

#ggml#compute-graph#inference-engine#operator-fusion
Intermediate

CUDA Programming Model — From Code to Hardware

#gpu#cuda#programming#simt#simd#intel#sycl
Intermediate

Diffusion Model Fundamentals: Generating from Noise

#diffusion#ddpm#generative-model#image-generation
Advanced

Diffusion Transformer: Image Generation with Transformers

#dit#diffusion#transformer#image-generation#stable-diffusion
Advanced

Distributed Compilation and Graph Partitioning

#compiler#distributed#tensor-parallel#pipeline-parallel#gspmd#sharding#communication
Advanced

Dynamic Shapes: The Full-Pipeline Challenge from Capture to Execution

#compiler#dynamic-shapes#symbolic-shapes#guards#bucketing#pytorch
Advanced

Execution, Sampling & Context Management

#llama-cpp#execution#sampling#speculative-decoding#kv-cache#context-management
Advanced

Factorization Machines and LLM Routing: From FM Theory to MF Router

#model-routing#factorization-machines#matrix-factorization#routellm
Advanced

Flash Attention Tiling Principles

#attention#hardware-optimization#flash-attention#memory
Advanced

From DPO to GRPO: Direct Preference Optimization

#dpo#grpo#ipo#preference-optimization#offline-rl
Beginner

From Text to Vectors: Tokenization and Word Embeddings

#tokenization#embedding#word2vec#nlp
Advanced

GEMM Optimization — From Naive to Peak Performance

#gpu#gemm#cuda#optimization#tensor-core#xmx#intel
Intermediate

GPU Architecture — From Transistors to Threads

#gpu#architecture#hardware#nvidia
Advanced

Graph Capture: TorchDynamo, AOTAutograd & Functionalization

#compiler#pytorch#torchdynamo#aotautograd#fx-graph
Advanced

Graph Optimization Passes (Part 1): Data Flow Analysis & Pass Fundamentals

#compiler#optimization#pass#dataflow-analysis#dce#cse
Advanced

Graph Optimization Passes (Part 2): Advanced Optimizations & Pattern Matching

#compiler#optimization#layout#pattern-matching#memory-planning
Advanced

Graph Optimization Passes (Part 2): Polyhedral Optimization & Loop Transformations

#compiler#polyhedral#loop-optimization#affine#mlir#tiling
Intermediate

Hands-On: HF → GGUF / ONNX / OpenVINO — Three End-to-End Paths

#quantization#model-conversion#hands-on#llama-cpp#onnx#openvino#intel-igpu
Advanced

Hardware Backends

#ggml#cuda#metal#vulkan#hardware-backend
Advanced

Hybrid Architectures: Fusing Mamba with Attention

#hybrid#mamba#jamba#zamba#hymba#architecture
Advanced

Hybrid LLM: Intelligent Routing Between Local and Cloud

#model-routing#hybrid-llm#local-cloud#privacy#latency
Intermediate

Impact of Optimization on Accuracy

#benchmark#quantization#accuracy#perplexity#openvino#lm-eval-harness#llama-cpp
Advanced

Inference-Time Quantization: KV Cache and Activation Quantization

#quantization#kv-cache#activation-quantization#fp8#inference-optimization
Intermediate

Intel Model Optimization Stack: Choosing Between Optimum Intel, NNCF, and OpenVINO

#intel#optimum#nncf#openvino#quantization#model-conversion
Intermediate

Interpreting Leaderboards and Model Selection

#benchmark#leaderboard#model-selection#chatbot-arena#deployment
Advanced

IR Design (Part 1): SSA, FX IR & MLIR Dialects

#compiler#ir#ssa#pytorch#mlir#fx-graph#dialect
Advanced

IR Design (Part 2): Progressive Lowering and Multi-Level IR

#compiler#mlir#progressive-lowering#dialect-conversion#bufferization
Advanced

Kernel Matrices and Reproducing Kernels: Given Operators Defined by Data

Coming Soon
#matrix-math#kernel#mercer-theorem#kernel-pca#gaussian-process#rkhs
Intermediate

Knowledge & Reasoning Benchmarks

#benchmark#reasoning#mmlu#gpqa#math
Advanced

KV Cache and Batch Scheduling

#kv-cache#batch-scheduling#continuous-batching#prefix-cache
Advanced

KV Cache Fundamentals

#inference#kv-cache#memory#optimization
Advanced

llama.cpp Execution Pipeline Overview

#llama-cpp#inference-engine#architecture#source-code
Advanced

llama.cpp Quantization Methods

#quantization#llama-cpp#gguf#inference-optimization
Intermediate

LLM Inference Engine Landscape: vLLM, SGLang, Ollama, and TensorRT-LLM

#inference#vllm#sglang#ollama#tensorrt-llm
Advanced

LLM Inference on NPU: KV Cache and the Software Stack

#intel#npu#llm#kv-cache#openvino#npuw#static-shape
Advanced

lm-eval-harness Practical Guide

#benchmark#lm-eval#evaluation#harness#task-yaml
Advanced

LoRA: Low-Rank Decomposition Applied to LLM Fine-Tuning

Coming Soon
#matrix-math#lora#low-rank#fine-tuning#parameter-efficient#qlora
Intermediate

Matrix Acceleration Units — Tensor Core and XMX

#gpu#tensor-core#xmx#systolic-array#nvidia#intel
Advanced

MF and FM: A Matrix Factorization View of Collaborative Filtering

Coming Soon
#matrix-math#matrix-factorization#factorization-machines#recommender-systems#collaborative-filtering
Advanced

Mixture of Experts: Sparsely Activated Large Model Architecture

#transformer#moe#routing#deepseek#mixtral
Intermediate

Model Ecosystem

#ollama#registry#modelfile#lora#multimodal
Advanced

Model Loading: From File to Device

#llama-cpp#model-loading#mmap#gpu-offload#backend
Advanced

Model Routing Landscape: Why One Model Isn't Enough

#model-routing#llm#cost-optimization#system-design
Advanced

MQA and GQA

#transformer#attention#mqa#gqa#kv-cache
Intermediate

Multi-Head Attention

#transformer#attention#multi-head
Advanced

Multi-Model Collaboration: From Picking One to Using Many

#model-routing#mixture-of-agents#ensemble#council-mode#collaboration
Intermediate

Multimodal Alignment: CLIP and Cross-Modal Embedding Spaces

#clip#multimodal#contrastive-learning#zero-shot#vision-language
Advanced

Music Generation: When Transformers Learn to Compose

#music-generation#musicgen#jukebox#transformer#audio
Advanced

NMF: Parts-Based Decomposition Under Non-Negativity Constraints

Coming Soon
#matrix-math#nmf#non-negative-matrix-factorization#parts-based#topic-modeling
Advanced

NP-Hardness and Approximation Algorithms: When the Optimal Solution Cannot Be Computed

Coming Soon
#graph-algorithms#np-hard#approximation#tsp#christofides#lp-relaxation#fpt#treewidth#vertex-cover
Advanced

NPU Architecture and GPU+NPU Co-Inference

#intel#npu#openvino#hetero#multi-device#co-inference
Advanced

NPU Execution Model and the Boundaries of Its Programming Model

#intel#npu#execution-model#dma#tiling#attention#programming-model#cute
Intermediate

Ollama + llama.cpp Architecture Overview

#ollama#llama-cpp#architecture#inference
Advanced

oneDNN GPU Kernel Optimization

#intel#onednn#kernel-optimization#gemm#xmx#mixed-precision
Advanced

oneDNN Primitive System

#intel#onednn#primitive#memory-format#operator-library
Advanced

Online Learning and Cost Optimization: Routers Need to Evolve Too

#model-routing#bandit#reinforcement-learning#pareto#cost-optimization
Advanced

OpenVINO Graph Optimization Pipeline

#intel#openvino#graph-optimization#model-compilation#plugin
Advanced

Operator Fusion (Part I): Taxonomy & Decision Algorithms

#compiler#fusion#operator-fusion#kernel-fusion#optimization
Advanced

Operator Fusion (Part II): Cost Models & Fusion in Practice

#compiler#fusion#cost-model#flash-attention#inductor#optimization
Advanced

PagedAttention and Continuous Batching

#paged-attention#continuous-batching#vllm#memory-management#kv-cache
Advanced

PageRank and Power Iteration: Markov Chains on Graphs

Coming Soon
#matrix-math#pagerank#power-iteration#markov-chains#spectral-gap
Intermediate

Panorama: The World of ML Compilers

#compiler#pytorch#mlir#triton#optimization
Advanced

PCA and Eigenfaces: From Variance Maximization to Face Recognition

Coming Soon
#matrix-math#pca#eigenfaces#dimensionality-reduction#covariance-matrix#svd
Advanced

Performance Analysis and Bottleneck Diagnosis

#intel#performance#profiling#roofline#vtune#bottleneck
Intermediate

Policy Gradient: Directly Optimizing the Policy

#policy-gradient#reinforce#baseline#variance-reduction#advantage
Intermediate

Positional Encoding — Giving Transformers a Sense of Order

#transformer#attention#positional-encoding
Intermediate

Prefill vs Decode Phases

#inference#prefill#decode#performance
Advanced

Prefix Caching and RadixAttention

#prefix-caching#radix-attention#sglang#vllm#kv-cache
Advanced

PTQ Weight Quantization: From GPTQ to AWQ

#quantization#ptq#gptq#awq#smoothquant
Intermediate

QKV Data Structures and Intuition

#transformer#attention#qkv
Intermediate

Quantization and Model Conversion Toolchain Landscape

#quantization#model-conversion#toolchain#optimum#nncf#openvino#gguf#onnx
Advanced

Quantization Compilation and Mixed-Precision Optimization

#compiler#quantization#mixed-precision#kernel-generation#fusion
Intermediate

Quantization Fundamentals

#quantization#data-types#mixed-precision#inference-optimization
Advanced

Quantization-Aware Training (QAT)

#quantization#qat#straight-through-estimator#bitnet#lora
Advanced

Qwen3-Coder-Next Architecture: When SSM, Attention, and MoE Converge

#hybrid#moe#ssm#deltanet#qwen#architecture
Intermediate

Reinforcement Learning Foundations: From Agent to Bellman Equation

#reinforcement-learning#mdp#bellman-equation#value-function#q-learning
Advanced

Reward Design and Scaling

#reward-model#reward-hacking#process-reward#outcome-reward#constitutional-ai
Advanced

RLHF: Learning from Human Feedback

#rlhf#reward-model#alignment#instruct-gpt#kl-divergence
Advanced

Robust PCA: Low-Rank + Sparse Decomposition

Coming Soon
#matrix-math#robust-pca#low-rank#sparse#nuclear-norm#convex-optimization#principal-component-pursuit
Advanced

RouteLLM in Practice: From Preference Data to Production Routing

#model-routing#routellm#matrix-factorization#training#deployment
Advanced

Routing Classifiers: Letting Small Models Decide Who Answers

#model-routing#classifier#matrix-factorization#bert#semantic-routing
Intermediate

Sampling & Decoding — From Probabilities to Text

#inference#sampling#decoding#perplexity
Advanced

Scheduling and Execution Optimization

#compiler#scheduling#cuda-stream#cuda-graph#memory-planning#activation-checkpointing#multi-backend
Advanced

Scheduling and Preemption: The Inference Engine Scheduler

#scheduling#preemption#chunked-prefill#vllm#inference
Intermediate

Sentence Embeddings: From Token-Level to Semantic Retrieval

#sentence-embeddings#contrastive-learning#rag#retrieval#sbert
Advanced

Server Layer and Scheduling

#ollama#scheduler#runner#model-management
Advanced

SGLang Programming Model and Structured Output

#sglang#structured-output#constrained-decoding#fsm#dsl
Advanced

Speculative Decoding — Accelerating LLM Inference via Guessing

#inference#optimization#speculative-decoding
Advanced

Speech and Transformers: From Whisper to VALL-E

#audio#speech#whisper#vall-e#tts#transformer
Advanced

SPIR-V Compilation and Level Zero Runtime

#intel#spirv#level-zero#compiler#runtime#jit#aot
Advanced

SSM / Mamba: A Triumph of Matrix Diagonalization

Coming Soon
#matrix-math#ssm#mamba#hippo#diagonalization#state-space#s4#selective-ssm
Advanced

State Space Models and Mamba

#ssm#mamba#state-space-model#selective-scan#sequence-modeling
Advanced

SWE-bench Practical Guide

#benchmark#swe-bench#code-evaluation#agent#docker
Advanced

Test-Time Scaling and Reasoning Enhancement

#test-time-scaling#chain-of-thought#mcts#deepseek-r1#thinking#verifier
Intermediate

The Complete Journey of a Single Inference

#ollama#llama-cpp#inference#pipeline
Intermediate

The GGUF Model Format

#gguf#llama-cpp#model-format#serialization
Advanced

Tiling Strategies & Memory Hierarchy Optimization

#compiler#tiling#memory-hierarchy#gpu#shared-memory#optimization
Advanced

Tool Landscape and GGUF Binary Parsing

#llama-cpp#gguf#quantization#binary-format
Intermediate

Transformer Architecture Overview

#transformer#architecture
Advanced

Video Generation: Spatiotemporal Attention and the Sora Architecture

#video-generation#sora#spatiotemporal-attention#dit#diffusion
Intermediate

Vision Transformer: When Images Become Token Sequences

#vision-transformer#vit#image-recognition#computer-vision
Advanced

Warmup, Tokenization & Chat Template

#llama-cpp#warmup#tokenization#chat-template#jinja2#multimodal
Intermediate

When RL Meets LLM: From Language Generation to Policy Optimization

#reinforcement-learning#llm#post-training#rlhf#policy-optimization#alignment
Advanced

Word2Vec and GloVe: Implicit vs. Explicit Matrix Factorization

Coming Soon
#matrix-math#word2vec#glove#pmi#word-embeddings#implicit-factorization
Advanced

Xe2 Execution Model and Programming Abstractions

#intel#xe2#simd#sycl#execution-model#workgroup
Advanced

Xe2 GPU Architecture

#intel#xe2#gpu-architecture#igpu#lunar-lake#panther-lake