#attention
8 articles
Intermediate
Attention Computation in Detail
#transformer #attention #softmax

Advanced
Attention Variants: From Sliding Window to MLA
#transformer #attention #mla #sliding-window #cross-attention

Advanced
Flash Attention Tiling Principles
#attention #hardware-optimization #flash-attention #memory

Advanced
MQA and GQA
#transformer #attention #mqa #gqa #kv-cache

Intermediate
Multi-Head Attention
#transformer #attention #multi-head

Intermediate
Positional Encoding: Giving Transformers a Sense of Order
#transformer #attention #positional-encoding

Intermediate
QKV Data Structures and Intuition
#transformer #attention #qkv

Advanced
NPU Execution Model and the Boundaries of Its Programming Model
#intel #npu #execution-model #dma #tiling #attention #programming-model #cute