Posts tagged information theory

Mar 31, 2026 information theory Shannon channel capacity

The Information-Theoretic Limits of Context Windows

There are fundamental limits on how much information a fixed-width attention mechanism can extract from n tokens. Here's the math, from Shannon's channel capacity to bounds on attention.

Mar 31, 2026 rate-distortion context compression Shannon

The Rate-Distortion Theory of Context Compression

Context compaction is a lossy compression problem. Rate-distortion theory gives the lower bound on how many bits a conversation history can be compressed to at a given level of distortion.

Mar 31, 2026 softmax attention dilution context rot

Softmax Attention and the Dilution Problem: The Math Behind Context Rot

Softmax attention weights always sum to 1, so as context grows, each relevant token gets a smaller share of attention. This mathematical property helps explain why model accuracy drops as context gets longer.