#transformer
11 articles
Intermediate
Transformer Architecture Overview
#transformer
#architecture
Intermediate
Attention Computation in Detail
#transformer
#attention
#softmax
Advanced
Attention Variants: From Sliding Window to MLA
#transformer
#attention
#mla
#sliding-window
#cross-attention
Advanced
MQA and GQA
#transformer
#attention
#mqa
#gqa
#kv-cache
Advanced
Mixture of Experts: Sparsely Activated Large Model Architecture
#transformer
#moe
#routing
#deepseek
#mixtral
Intermediate
Multi-Head Attention
#transformer
#attention
#multi-head
Intermediate
Positional Encoding: Giving Transformers a Sense of Order
#transformer
#attention
#positional-encoding
Intermediate
QKV Data Structures and Intuition
#transformer
#attention
#qkv
Advanced
Music Generation: When Transformers Learn to Compose
#music-generation
#musicgen
#jukebox
#transformer
#audio
Advanced
Speech and Transformers: From Whisper to VALL-E
#audio
#speech
#whisper
#vall-e
#tts
#transformer
Advanced
Diffusion Transformer: Image Generation with Transformers
#dit
#diffusion
#transformer
#image-generation
#stable-diffusion