Posts tagged sparse attention

Sparse Attention Architectures: Breaking O(n²) with O(n√n), O(n·log n), and O(n)

Several research approaches attack the quadratic cost of self-attention in sequence length n: Longformer, Reformer, Linformer, and linear attention. Here's the math behind each one.
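
To get a feel for what the complexity classes in the title mean in practice, here is a minimal sketch that tabulates the leading growth term of each for a given sequence length. The function name `growth_terms` is hypothetical. Constant factors such as head dimension, window size, or projection rank are deliberately ignored, so the numbers are only relative orders of magnitude, not measured costs of any particular model.

```python
import math

def growth_terms(n: int) -> dict:
    """Leading growth term of each complexity class for sequence length n.

    Illustrative only: constant factors are ignored, so these are not
    real FLOP counts for any specific architecture.
    """
    return {
        "O(n^2)": n * n,                        # full self-attention
        "O(n*sqrt(n))": round(n * math.sqrt(n)),
        "O(n*log n)": round(n * math.log2(n)),
        "O(n)": n,
    }

if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        terms = growth_terms(n)
        full = terms["O(n^2)"]
        print(f"n = {n}")
        for name, value in terms.items():
            # Show each term and how much smaller it is than the quadratic one.
            print(f"  {name:<14} {value:>12,}  ({full / value:,.0f}x smaller than n^2)")
```

At n = 4096 the quadratic term is 16,777,216, while the n·√n, n·log n, and n terms are 262,144, 49,152, and 4,096 respectively, and the gap widens as n grows.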