Flash Attention never materializes the full n×n attention matrix. Instead, it computes the output in tiles that fit in fast GPU SRAM. Here's how it works and why it's 2-4× faster.
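To make the tiling idea concrete, here is a minimal NumPy sketch of the core trick, the online softmax: scores are computed one key/value tile at a time, and a running max and running denominator let earlier partial results be rescaled so the full score matrix is never formed. This is an illustration only, not the actual CUDA kernel; the function name `tiled_attention` and the `tile` parameter are made up for the example.

```python
import numpy as np

def tiled_attention(Q, K, V, tile=64):
    """Attention computed tile-by-tile with an online softmax,
    never forming the full n x n score matrix."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q, dtype=np.float64)
    row_max = np.full(n, -np.inf)          # running max of scores per query row
    row_sum = np.zeros(n)                  # running softmax denominator per row

    for j in range(0, n, tile):            # loop over key/value tiles
        Kj = K[j:j + tile]
        Vj = V[j:j + tile]
        S = (Q @ Kj.T) * scale             # scores for this tile only
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale earlier partial sums
        P = np.exp(S - new_max[:, None])         # unnormalized tile probabilities
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vj
        row_max = new_max

    return out / row_sum[:, None]

# sanity check against the naive full-matrix version
rng = np.random.default_rng(0)
n, d = 256, 32
Q, K, V = rng.standard_normal((3, n, d))
S = (Q @ K.T) / np.sqrt(d)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)
```

The real kernel also tiles over queries and keeps each tile in on-chip SRAM while fusing the whole loop into one pass, which is where the memory savings and the speedup come from; the sketch above only shows why tiling gives exactly the same output as the naive computation.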