1 post found
Similar to Kaplan's scaling laws for model size, there are scaling laws for context length. Performance doesn't scale linearly — here's the math.