"RAG is Dead" - and Why That's Half True
Long context killed naive RAG. It didn't kill retrieval.
Every six months someone declares RAG dead, and Twitter loves a death announcement: "RAG is dead" trends every other month. Most of these takes are wrong, but a few land on something real. The way RAG is implemented in most production systems IS dead. Here's the difference.
What's actually dead
The 2023-style RAG: chunk a document at 500 tokens, embed every chunk with text-embedding-ada-002, and feed the top-5 retrieved chunks to GPT-3.5. That stack is no longer competitive.
It loses to:
- Just shoving the whole document into a 1M context window (for small corpora)
- Hybrid retrieval with re-ranking and 200K context (for large corpora)
- Agentic retrieval where the model issues its own queries (for complex/multi-hop)
If your production RAG pipeline still looks like a 2023 tutorial, it's worth a rebuild.
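For concreteness, that entire 2023 stack fits in a few lines. This is a toy sketch, not a reference implementation: the bag-of-words embed() stands in for a real embedding model, and the helper names are illustrative.

```python
# Minimal sketch of the 2023-style RAG pipeline: fixed-size chunks,
# per-chunk embeddings, top-k cosine retrieval. embed() is a toy
# bag-of-words stand-in for a real embedding model so the example
# is self-contained.

from collections import Counter
import math

CHUNK_TOKENS = 500  # fixed-size chunking, the 2023 default
TOP_K = 5

def chunk(text: str, size: int = CHUNK_TOKENS) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def naive_rag_context(corpus: str, query: str, k: int = TOP_K) -> list[str]:
    q = embed(query)
    scored = sorted(chunk(corpus), key=lambda c: cosine(embed(c), q), reverse=True)
    return scored[:k]  # these top-k chunks get stuffed into the prompt
```

Every weakness of the stack is visible here: the chunk boundary ignores document structure, the similarity score ignores the question's intent, and k is fixed regardless of how much context the answer actually needs.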
What's not dead
Retrieval itself. The fundamental insight that you should put relevant tokens in front of the model is truer than ever. What has changed is how you decide what counts as "relevant" and how much you can afford to include.
I now think of retrieval as a budget allocation problem:
- I have a context budget (1M tokens, or 200K, or 32K - depends on the model)
- I have a latency budget
- I have a cost budget (don't forget caching!)
- Retrieval's job is to maximize relevance per token within those constraints
The 2023 RAG mindset was "retrieve the smallest relevant set." The 2026 mindset is "retrieve the largest relevant set you can afford."
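The budget-allocation framing can be made concrete with a greedy packer. This is a sketch under assumptions: the Chunk fields and fill_context are illustrative names, and the relevance scores are presumed to come from whatever retriever you already run.

```python
# Sketch of retrieval as budget allocation: greedily pack the
# highest-value chunks until the token budget is spent. Field and
# function names are illustrative, not from any particular library.

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    tokens: int       # token count of this chunk
    relevance: float  # score from your retriever / re-ranker

def fill_context(chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Maximize relevance per token within the context budget."""
    # Rank by relevance density, not raw relevance: one long, mildly
    # relevant chunk shouldn't crowd out several short, sharp ones.
    ranked = sorted(chunks, key=lambda c: c.relevance / c.tokens, reverse=True)
    picked, used = [], 0
    for c in ranked:
        if used + c.tokens <= budget_tokens:
            picked.append(c)
            used += c.tokens
    return picked
```

The same function expresses both mindsets: the 2023 version calls it with a tiny budget, the 2026 version calls it with the largest budget your latency and cost constraints allow.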
Three patterns I use now
- Slop retrieval + long context. For most internal tools, dump 50-100 chunks into Claude Opus 4.7 and let the model do the relevance filtering itself. Cheaper than a re-ranker if you cache.
- Agentic search. For multi-hop questions, give the model a search tool and let it run 3-5 queries. This wins decisively on questions like "find the contract clause that contradicts the policy in section 4."
- Hierarchical summarization. For corpora bigger than any context window, summarize at multiple levels (paragraph → section → document → corpus) and search at each level. This is how you'd build a 50K-document legal-review system today.
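The agentic-search pattern above reduces to a small loop. Here model_next_query and search are hypothetical stubs for an LLM call and a search backend; a real version would thread the accumulated evidence back into the model's prompt and enforce the same token budget as everything else.

```python
# Sketch of agentic search: the model issues its own queries until it
# decides it has enough evidence, capped at a fixed number of rounds.
# model_next_query and search are illustrative stand-ins for an LLM
# call and a search backend.

MAX_QUERIES = 5

def agentic_search(question, search, model_next_query):
    evidence = []
    for _ in range(MAX_QUERIES):
        # The model sees what it has gathered so far and either
        # proposes the next query or returns None to stop.
        query = model_next_query(question, evidence)
        if query is None:
            break
        evidence.extend(search(query))
    return evidence
```

This is why the pattern wins on multi-hop questions: the second query can be conditioned on what the first one found, which no single-shot retriever can do.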
The mistake everyone makes
People treat RAG as one architecture. It's not. RAG is a family of architectures with different trade-offs. The right one for a customer-support bot is different from the right one for a code-search assistant is different from the right one for a clinical-decision-support tool.
When I advise clients, I always ask: what's your worst-case failure mode? If a wrong answer is "annoying," you can be loose. If a wrong answer is "we get sued," you need citations, confidence scoring, and human review.
The takeaway
Don't believe the death announcements. Read them as "the simple version of this technique no longer wins." That's almost always true in fast-moving fields. The general technique gets refined; it doesn't disappear.