Graph Compilation & Optimization
Deep dive into ML compiler internals: the complete journey from graph capture to optimized execution. Dual-track coverage of PyTorch 2.0 (torch.compile / TorchInductor / Triton) and MLIR (Dialect system / Progressive Lowering). Prerequisite: AI Compute Stack.
- 1
Panorama: The World of ML Compilers
Intermediate #compiler #pytorch #mlir #triton #optimization
- 2
Graph Capture: TorchDynamo, AOTAutograd & Functionalization
Advanced #compiler #pytorch #torchdynamo #aotautograd #fx-graph
- 3
IR Design (Part 1): SSA, FX IR & MLIR Dialects
Advanced #compiler #ir #ssa #pytorch #mlir #fx-graph #dialect
- 4
IR Design (Part 2): Progressive Lowering and Multi-Level IR
Advanced #compiler #mlir #progressive-lowering #dialect-conversion #bufferization
- 5
Graph Optimization Passes (Part 1): Data Flow Analysis & Pass Fundamentals
Advanced #compiler #optimization #pass #dataflow-analysis #dce #cse
- 6
Graph Optimization Passes (Part 2): Advanced Optimizations & Pattern Matching
Advanced #compiler #optimization #layout #pattern-matching #memory-planning
- 7
Graph Optimization Passes (Part 3): Polyhedral Optimization & Loop Transformations
Advanced #compiler #polyhedral #loop-optimization #affine #mlir #tiling
- 8
Operator Fusion (Part I): Taxonomy & Decision Algorithms
Advanced #compiler #fusion #operator-fusion #kernel-fusion #optimization
- 9
Operator Fusion (Part II): Cost Models & Fusion in Practice
Advanced #compiler #fusion #cost-model #flash-attention #inductor #optimization
- 10
Tiling Strategies & Memory Hierarchy Optimization
Advanced #compiler #tiling #memory-hierarchy #gpu #shared-memory #optimization
- 11
Dynamic Shapes: The Full-Pipeline Challenge from Capture to Execution
Advanced #compiler #dynamic-shapes #symbolic-shapes #guards #bucketing #pytorch
- 12
Code Generation (Part I): Instruction Selection, Vectorization & Register Allocation
Advanced #compiler #codegen #instruction-selection #vectorization #register-allocation #gpu
- 13
Code Generation (Part II): Triton Pipeline, Compiler Backends & Numerical Correctness
Advanced #compiler #codegen #triton #llvm #ptx #numerical-accuracy #backends
- 14
Quantization Compilation and Mixed-Precision Optimization
Advanced #compiler #quantization #mixed-precision #kernel-generation #fusion
- 15
Distributed Compilation and Graph Partitioning
Advanced #compiler #distributed #tensor-parallel #pipeline-parallel #gspmd #sharding #communication
- 16
Scheduling and Execution Optimization
Advanced #compiler #scheduling #cuda-stream #cuda-graph #memory-planning #activation-checkpointing #multi-backend
- 17
Autotuning and End-to-End Practice
Advanced #compiler #autotuning #triton #mlir #transform-dialect #end-to-end #torch-compile