During inference, a transformer stores the Key and Value vectors for every token it has processed so attention doesn't have to recompute them at each decoding step. This KV cache is often the single biggest consumer of GPU memory. Here's the math behind it.
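As a quick back-of-the-envelope illustration, here is a minimal sketch of that math in Python. The configuration numbers (32 layers, 32 KV heads, head dimension 128, fp16 elements, roughly Llama-2-7B-shaped) are assumptions for illustration, not values from the post:

```python
# Back-of-the-envelope KV-cache size.
# Assumed (hypothetical) model configuration, roughly Llama-2-7B-shaped:
num_layers = 32      # transformer blocks
num_kv_heads = 32    # KV heads per layer (no grouped-query attention)
head_dim = 128       # dimension per head
bytes_per_elem = 2   # fp16 / bf16

def kv_cache_bytes(batch_size: int, seq_len: int) -> int:
    """Total bytes for the Key + Value caches across all layers."""
    # Factor of 2 = one Key vector and one Value vector per token per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return batch_size * seq_len * per_token

# Example: a single sequence of 4096 tokens.
print(kv_cache_bytes(1, 4096) / 2**30, "GiB")  # ≈ 2.0 GiB
```

Under these assumptions each token costs 512 KiB of cache, so the cache grows linearly with both batch size and context length, which is why long contexts dominate memory use.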