Why Vector Search and GraphRAG Both End in the Same Context Rot
On paper, vector search and GraphRAG are rivals. One matches by similarity, the other follows real edges. People argue about which is better the way they argue about editors. But if you actually trace what each one does from start to finish, they converge on the same last move, and that last move is where both of them quietly fail.
Here is the move. Parse the code. Retrieve some of it. Rerank what you retrieved. Stuff it into the context window. Hand the window to the model and hope.
Vector search and GraphRAG disagree about the retrieve step. They are identical on everything after it. And the failure we are talking about, context rot, does not live in the retrieve step. It lives in the stuff-the-window step, which means swapping one retriever for another does not fix it. You can have the best retrieval in the world and still rot, because the rot is downstream of retrieval entirely.
The pipeline both of them run
Strip the branding off and almost every code context tool is the same pipeline.
You parse the codebase into something searchable. You take the user’s question and pull back a set of candidate chunks, either by embedding similarity or by walking a graph. You rerank those candidates so the most promising ones go first. Then you paste them into the prompt, up to whatever the window allows, and you let the model read the pile and answer.
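The steps above can be sketched in a few lines. This is a toy under stated assumptions, not any particular tool's implementation: the `Chunk` type, the word-overlap retriever, and the character budget are all made up for illustration, standing in for a real parser, a real retriever, and a real token limit.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    text: str
    score: float = 0.0


def parse(files: dict[str, str]) -> list[Chunk]:
    # Toy parser: one chunk per file. Real tools split by function or class.
    return [Chunk(path, text) for path, text in files.items()]


def retrieve(chunks: list[Chunk], question: str) -> list[Chunk]:
    # Toy retriever (assumption): keep chunks sharing a word with the question.
    words = set(question.lower().split())
    hits = []
    for c in chunks:
        overlap = len(words & set(c.text.lower().split()))
        if overlap:
            hits.append(Chunk(c.path, c.text, float(overlap)))
    return hits


def rerank(chunks: list[Chunk]) -> list[Chunk]:
    # Most promising candidates first.
    return sorted(chunks, key=lambda c: c.score, reverse=True)


def stuff(chunks: list[Chunk], question: str, budget_chars: int) -> str:
    # Paste chunks in rank order until the (hypothetical) budget runs out.
    parts, used = [], 0
    for c in chunks:
        block = f"# {c.path}\n{c.text}\n"
        if used + len(block) > budget_chars:
            break
        parts.append(block)
        used += len(block)
    return "".join(parts) + f"\nQuestion: {question}\n"


def build_prompt(files: dict[str, str], question: str, budget_chars: int = 2000) -> str:
    return stuff(rerank(retrieve(parse(files), question)), question, budget_chars)
```

Whatever replaces `retrieve`, the value that reaches the model is still the string built by `stuff`: raw fragments followed by the question.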
GraphRAG makes the retrieve step smarter. Instead of grabbing whatever looks similar, it follows call edges and gets chunks that are actually connected to the question. That is a real improvement and we have written about why. But notice what did not change. After the graph picks better chunks, it still does the exact same thing vector search does. It dumps those chunks into the window and asks the model to make sense of them. The representation handed to the model is still a pile of raw code fragments. A better-chosen pile is still a pile.
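To make the point concrete, here is a minimal sketch of the two retrieve steps feeding one shared final step. The function names, the precomputed similarity scores, and the call-edge dictionary are illustrative assumptions. Real systems use an embedding index and a parsed call graph.

```python
from collections import deque


def vector_retrieve(similarity: dict[str, float], k: int) -> list[str]:
    # Toy stand-in for embedding search: top-k chunk ids by precomputed score.
    return sorted(similarity, key=similarity.get, reverse=True)[:k]


def graph_retrieve(call_edges: dict[str, list[str]], start: str, max_hops: int) -> list[str]:
    # Breadth-first walk over call edges, starting from the function in question.
    seen, order = {start}, [start]
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for callee in call_edges.get(node, []):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append((callee, hops + 1))
    return order


def stuff_window(chunk_ids: list[str], source: dict[str, str]) -> str:
    # The shared last step: concatenate raw fragments into one prompt body.
    return "\n\n".join(source[c] for c in chunk_ids)
```

The two retrievers can return different chunk ids for the same question, and the graph walk may well return better ones. Both results then pass through the same `stuff_window`, so the model receives a concatenation of raw fragments either way.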
What context rot actually is
Context rot is the well-documented fact that a model does not use a long context evenly. The more you put in the window, the worse it gets at using any particular piece of it. This is not a vibe. It is measured. The lost-in-the-middle research showed that models retrieve information placed in the middle of a long context far worse than information at the start or the end, with something like 10 to 25% accuracy degradation for the middle depending on the model, and larger windows showing more degradation, not less.
So picture what happens at the end of the pipeline. The retriever, vector or graph, hands over a dozen or two dozen code chunks. They go into the window in some order. The model reads strongly at the top, strongly at the bottom, and weakly through the long middle where half your chunks now live. The one chunk that actually mattered might be chunk number nine of twenty, sitting right in the dead zone. It was retrieved correctly. It was placed in the window. And the model still half ignored it. That is context rot, and it happened after retrieval did its job perfectly.
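A rough way to see where a chunk lands is to compute its relative position in the window. The sketch below assumes equal-sized chunks and treats everything outside the first and last quarter of the window as the middle. That 25% cutoff is an arbitrary illustration, not a measured boundary.

```python
def window_positions(chunk_tokens: list[int]) -> list[float]:
    # Relative position of each chunk's midpoint: 0.0 is the top, 1.0 the bottom.
    total = sum(chunk_tokens)
    positions, start = [], 0
    for n in chunk_tokens:
        positions.append((start + n / 2) / total)
        start += n
    return positions


def in_middle(position: float, edge: float = 0.25) -> bool:
    # Assumed band: anything outside the first and last `edge` fraction.
    return edge <= position <= 1 - edge


# Twenty chunks of 400 tokens each (hypothetical sizes).
positions = window_positions([400] * 20)
chunk_nine = positions[8]
middle_count = sum(in_middle(p) for p in positions)
```

Under these assumptions, chunk nine of twenty sits about 43% of the way down the window, and ten of the twenty chunks fall inside the middle band.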
This is why teams keep being disappointed when they upgrade their retriever. They move from vector to graph, retrieval precision goes up, and the end-to-end accuracy moves less than they expected. The retriever was never the whole problem. The pile-in-the-window step has its own ceiling, and both retrievers slam into it.
Why a smarter pile is still a pile
You might think the answer is just better reranking, or a bigger window, or more aggressive trimming. Those help at the margin and none of them change the shape of the problem.
A bigger window is the opposite of a fix, because degradation grows with length, so more room to stuff means more middle to lose. Better reranking helps you order the pile, but the model still has to read a heap of disconnected fragments and reconstruct, on the fly, every session, how they fit together. That reconstruction is the expensive, error-prone, token-hungry part. You are asking the model to re-derive the structure of your codebase from a shuffled handful of its pieces, over and over, on every question. Even when the right pieces are present, the act of reassembling them in the window is where hallucination creeps in, because a missing connective fragment gets filled with a plausible guess.
That is the real shape of it. Vector search gives the model a pile chosen by resemblance. GraphRAG gives it a pile chosen by connection. Both then make the model do the reassembly inside the window, every time, and that is the step that rots.
The way out is to not stuff a pile at all
The fix is not a better retriever feeding the same window. The fix is to change what you hand the model in the first place.
Instead of retrieving raw fragments and asking the model to reconstruct meaning from them live, you do the understanding once, ahead of time, and you store the result as a representation that already holds the structure and the intent. Then at question time the model is not handed twenty disconnected chunks to reassemble. It is handed an already-assembled slice of meaning, with the connections drawn and the purpose attached. There is far less for the window to rot, because you are not making the model rebuild the codebase on every query. You did the rebuild once, at index time, and it persists.
This is the LLM compiler pattern. You run a model across the codebase a single time and derive from each file a verifiable intermediate representation that captures what it does, why it exists, and how it connects across repositories. That representation is the thing you serve, not a fresh pile of grep results. The model spends its window reasoning over meaning instead of spending it re-deriving structure, and the lost-in-the-middle problem shrinks because there is simply less raw material competing for attention in the window.
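A minimal sketch of the compile-once, serve-many shape might look like this. The `FileIR` record, the `describe` callable (a stand-in for whatever model call produces the representation), and the dependency walk are all hypothetical names chosen for illustration. A real IR would carry far more than a purpose string and a dependency list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileIR:
    path: str
    purpose: str                                   # what the file is for
    depends_on: list[str] = field(default_factory=list)


def compile_repo(files: dict[str, str],
                 describe: Callable[[str, str], FileIR]) -> dict[str, FileIR]:
    # Index time: one pass over every file. `describe` stands in for a model call.
    return {path: describe(path, text) for path, text in files.items()}


def slice_for(ir: dict[str, FileIR], entry: str) -> list[FileIR]:
    # Query time: follow recorded dependencies. No model call, no raw source.
    out, stack, seen = [], [entry], set()
    while stack:
        path = stack.pop()
        if path in seen or path not in ir:
            continue
        seen.add(path)
        out.append(ir[path])
        stack.extend(reversed(ir[path].depends_on))
    return out
```

The design choice to notice is where the expensive call lives: `describe` runs once per file inside `compile_repo`, and `slice_for` can be called any number of times afterwards without invoking it again.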
It is the same insight as caching, pointed at the right layer. Every tool today caches files so it does not re-read them from disk. That does nothing for context rot, because the model still has to re-understand those files in the window. What you want to cache is the understanding, so the model never re-derives it at all.
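One way to picture caching the understanding rather than the file is a store keyed by a hash of the file's content. This is a sketch under assumptions: `UnderstandingCache` and the `understand` callable are hypothetical, and the callable stands in for an expensive model call.

```python
import hashlib
from typing import Callable


class UnderstandingCache:
    def __init__(self, understand: Callable[[str], str]):
        self._understand = understand          # hypothetical, expensive model call
        self._by_hash: dict[str, str] = {}
        self.misses = 0

    def get(self, source: str) -> str:
        # Key on content, so an unchanged file is never re-understood.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._by_hash:
            self.misses += 1
            self._by_hash[key] = self._understand(source)
        return self._by_hash[key]
```

Asking twice about the same content triggers one call to `understand`. Only a changed file pays the cost again.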
What this looks like in practice
This is what ByteBell does. We are the verifiable context layer for code. We compile your repositories once into a verifiable code IR, on your own infrastructure through Docker, at a few dollars per thousand files. The IR is a derived contract, not a perfect copy of your code. It captures how every file and repository connects and what each part is meant to do, and every agent change gets checked against it before it lands, so drift and hallucinated dependencies are caught instead of shipped. From then on every engineer queries that representation over a single MCP URL, on any copilot. The model is handed a coherent, connected slice of meaning rather than a window stuffed with reranked fragments it has to reassemble.
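As a loose illustration of what checking a change against a stored representation can mean, the sketch below flags imports in a proposed Python change that are absent from a known set of modules. It is not ByteBell's implementation or API. The function names and the idea of reducing the IR to a set of module names are assumptions made only to show the shape of such a check.

```python
import ast


def imported_modules(source: str) -> set[str]:
    # Collect top-level module names from absolute imports in the change.
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            mods.add(node.module.split(".")[0])
    return mods


def unknown_dependencies(change: str, known_modules: set[str]) -> set[str]:
    # Anything imported but not recorded is a candidate hallucinated dependency.
    return imported_modules(change) - known_modules
```

A change that imports a module nobody has recorded comes back in the returned set, which gives a reviewer or an automated gate something concrete to reject before the change lands.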
The measurements line up with the theory. On 46 Kubernetes ecosystem repositories and 150,000 files, ByteBell hit about 10% higher accuracy at 70% lower cost, on roughly a fifth of the tokens, with responses around 70% faster, and it finished the cross-repository tasks where the stuff-the-window approaches could not even assemble enough coherent context to complete. Average cost per task actually fell, from 0.22, because the model stopped burning tokens exploring and reassembling and went more directly to the answer.
So the next time someone frames the choice as vector versus graph, notice that the framing hides the real problem. Both end the same way, by stuffing a pile into a window that rots. The question that actually matters is whether you make the model reconstruct your codebase on every query, or whether you compile that understanding once and hand it over already built. Change the last step, not just the retriever, and the rot has nothing left to feed on.
This is ByteBell. We build the verifiable context layer for code: a derived IR that captures how your repositories connect and what each part is meant to do, served to every tool over one MCP URL, with every change checked against intent so drift gets caught instead of stuffed into a window. It runs on your own infrastructure, and your code never leaves it. As machines write code faster than any team can read it, the only thing worth trusting is a clear statement of what the system must do, with continuous proof it does that and nothing more.