Total GPU memory = model weights + KV cache + activations + workspace. Here's the exact formula to compute maximum context length for any GPU configuration.
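As a rough illustration of that budget, the sketch below solves the sum for the KV cache term and divides by the per-token cache cost. It is a minimal sketch under stated assumptions, not the post's own formula: the function names, the standard attention KV layout (one key and one value vector per layer per KV head), and the treatment of activations and workspace as fixed budgets are all assumptions made for this example, and the model dimensions are hypothetical.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache bytes stored per token (assumed layout: K and V per layer per KV head)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(total_bytes, weight_bytes, activation_bytes,
                       workspace_bytes, per_token_bytes, batch_size=1):
    """Tokens per sequence that fit in the memory left over for the KV cache.

    Assumption: activations and workspace are fixed budgets. In practice they
    can grow with sequence length, so treat the result as an upper bound.
    """
    free = total_bytes - weight_bytes - activation_bytes - workspace_bytes
    if free <= 0:
        return 0
    return free // (per_token_bytes * batch_size)


# Hypothetical configuration: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

# Hypothetical 24 GiB GPU with 16 GiB of weights and 1 GiB each for
# activations and workspace.
tokens = max_context_tokens(
    total_bytes=24 * GIB,
    weight_bytes=16 * GIB,
    activation_bytes=1 * GIB,
    workspace_bytes=1 * GIB,
    per_token_bytes=per_token,
)
print(per_token, tokens)
```

With these made-up numbers, each token costs 128 KiB of cache, and the 6 GiB left after weights, activations, and workspace holds 49,152 tokens for a single sequence. Doubling the batch size halves the per-sequence context.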