When a single GPU can't hold the KV cache, you distribute the sequence across multiple GPUs. Here's how ring attention enables million-token contexts.
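To make the idea concrete, here is a minimal single-process sketch in NumPy. It is an illustration under stated assumptions, not the implementation from any particular library: the "devices" are just list entries, the ring exchange is a list rotation rather than real GPU communication, attention is non-causal, and the names `ring_attention` and `full_attention` are hypothetical. Each simulated device keeps its own query shard, sees one key/value block per step, and merges the partial results with a running (online) softmax, so no device ever holds the keys and values for the full sequence.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: ordinary softmax attention computed in one place."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k, v, num_devices):
    """Simulated ring attention (non-causal), one sequence shard per 'device'."""
    d = q.shape[-1]
    q_shards = np.array_split(q, num_devices)
    kv_shards = list(zip(np.array_split(k, num_devices),
                         np.array_split(v, num_devices)))

    # Per-device running state for the online softmax.
    row_max = [np.full((qs.shape[0], 1), -np.inf) for qs in q_shards]
    denom = [np.zeros((qs.shape[0], 1)) for qs in q_shards]
    acc = [np.zeros((qs.shape[0], v.shape[-1])) for qs in q_shards]

    for _ in range(num_devices):
        for dev in range(num_devices):
            k_blk, v_blk = kv_shards[dev]  # the block this device currently holds
            scores = q_shards[dev] @ k_blk.T / np.sqrt(d)
            new_max = np.maximum(row_max[dev], scores.max(axis=-1, keepdims=True))
            scale = np.exp(row_max[dev] - new_max)  # rescale earlier partial sums
            p = np.exp(scores - new_max)
            denom[dev] = denom[dev] * scale + p.sum(axis=-1, keepdims=True)
            acc[dev] = acc[dev] * scale + p @ v_blk
            row_max[dev] = new_max
        # Ring step: every device hands its K/V block to the next neighbor.
        kv_shards = kv_shards[-1:] + kv_shards[:-1]

    # Normalize each shard's accumulator and stitch the sequence back together.
    return np.concatenate([a / s for a, s in zip(acc, denom)])

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
matches = np.allclose(ring_attention(q, k, v, num_devices=4), full_attention(q, k, v))
```

Here `matches` compares the sharded result against plain attention on the same inputs. In a real multi-GPU setup the rotation would be a point-to-point send/receive that can overlap with the block computation; that part is not modeled in this sketch.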