Graph Compilation & Optimization

Deep dive into ML compiler internals: the complete journey from graph capture to optimized execution. Dual-track coverage of PyTorch 2.0 (torch.compile / TorchInductor / Triton) and MLIR (Dialect system / Progressive Lowering). Prerequisite: AI Compute Stack.

  1.

    Panorama: The World of ML Compilers

    Intermediate
    #compiler #pytorch #mlir #triton #optimization
  2.

    Graph Capture: TorchDynamo, AOTAutograd & Functionalization

    Advanced
    #compiler #pytorch #torchdynamo #aotautograd #fx-graph
  3.

    IR Design (Part 1): SSA, FX IR & MLIR Dialects

    Advanced
    #compiler #ir #ssa #pytorch #mlir #fx-graph #dialect
  4.

    IR Design (Part 2): Progressive Lowering and Multi-Level IR

    Advanced
    #compiler #mlir #progressive-lowering #dialect-conversion #bufferization
  5.

    Graph Optimization Passes (Part 1): Data Flow Analysis & Pass Fundamentals

    Advanced
    #compiler #optimization #pass #dataflow-analysis #dce #cse
  6.

    Graph Optimization Passes (Part 2): Advanced Optimizations & Pattern Matching

    Advanced
    #compiler #optimization #layout #pattern-matching #memory-planning
  7.

    Graph Optimization Passes (Part 3): Polyhedral Optimization & Loop Transformations

    Advanced
    #compiler #polyhedral #loop-optimization #affine #mlir #tiling
  8.

    Operator Fusion (Part I): Taxonomy & Decision Algorithms

    Advanced
    #compiler #fusion #operator-fusion #kernel-fusion #optimization
  9.

    Operator Fusion (Part II): Cost Models & Fusion in Practice

    Advanced
    #compiler #fusion #cost-model #flash-attention #inductor #optimization
  10.

    Tiling Strategies & Memory Hierarchy Optimization

    Advanced
    #compiler #tiling #memory-hierarchy #gpu #shared-memory #optimization
  11.

    Dynamic Shapes: The Full-Pipeline Challenge from Capture to Execution

    Advanced
    #compiler #dynamic-shapes #symbolic-shapes #guards #bucketing #pytorch
  12.

    Code Generation (Part I): Instruction Selection, Vectorization & Register Allocation

    Advanced
    #compiler #codegen #instruction-selection #vectorization #register-allocation #gpu
  13.

    Code Generation (Part II): Triton Pipeline, Compiler Backends & Numerical Correctness

    Advanced
    #compiler #codegen #triton #llvm #ptx #numerical-accuracy #backends
  14.

    Quantization Compilation and Mixed-Precision Optimization

    Advanced
    #compiler #quantization #mixed-precision #kernel-generation #fusion
  15.

    Distributed Compilation and Graph Partitioning

    Advanced
    #compiler #distributed #tensor-parallel #pipeline-parallel #gspmd #sharding #communication
  16.

    Scheduling and Execution Optimization

    Advanced
    #compiler #scheduling #cuda-stream #cuda-graph #memory-planning #activation-checkpointing #multi-backend
  17.

    Autotuning and End-to-End Practice

    Advanced
    #compiler #autotuning #triton #mlir #transform-dialect #end-to-end #torch-compile