AI Summer 2025: What Actually Progressed

Past the hype cycles, what's a working engineer to make of this year?

Sri Vardhan · June 30, 2025

A mid-2025 stocktake on AI capabilities, costs, tools, and architectures: what has actually moved versus what is marketing noise. The real progress isn't where the headlines are.

Real progress

Coding capability. Claude 4 and GPT-4 successors are genuinely better at multi-file refactoring than they were last year. The gap between "AI assistant for code" and "AI agent for code" narrowed considerably.

Long context that works. Million-token contexts used to degrade as the window filled; now they actually work. This changes RAG architecture (covered in another post).

Tool use reliability. Models now produce well-formed tool calls 95%+ of the time, even on edge cases. Production agents are viable.
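If you want to check a number like that against your own traffic, here is a minimal sketch of how a well-formed rate could be measured. The tool registry, the JSON shape with "name" and "arguments" keys, and the sample payloads are all assumptions made for illustration, not any provider's actual format.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str},
    "add": {"a": (int, float), "b": (int, float)},
}


def is_well_formed(raw: str) -> bool:
    """True if raw is JSON naming a known tool with exactly the expected, correctly typed arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return False
    spec = TOOLS.get(name)
    if spec is None or set(args) != set(spec):
        return False
    # Note: Python treats bool as a subclass of int; a stricter check would reject it.
    return all(isinstance(args[key], expected) for key, expected in spec.items())


def well_formed_rate(raw_calls: list[str]) -> float:
    """Fraction of raw tool-call payloads that pass is_well_formed."""
    if not raw_calls:
        return 0.0
    return sum(is_well_formed(raw) for raw in raw_calls) / len(raw_calls)


# Made-up payloads: one valid call, one missing argument, one non-JSON, one unknown tool.
SAMPLES = [
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}',
    '{"name": "add", "arguments": {"a": 1}}',
    "not json at all",
    '{"name": "unknown_tool", "arguments": {}}',
]
rate = well_formed_rate(SAMPLES)
```

In practice you would feed this logged model outputs rather than hand-written samples, and track the rate per tool so edge cases do not hide inside an average.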

Costs collapsed. Inference costs per token dropped 4-8x year-over-year for frontier capability. What was $1000/day in 2024 is now $150/day for the same workload.
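As a quick sanity check on those figures, the arithmetic can be sketched as follows. The function name is mine; the dollar amounts are the ones quoted above.

```python
def cost_reduction_factor(old_daily_cost: float, new_daily_cost: float) -> float:
    """How many times cheaper the same workload became."""
    return old_daily_cost / new_daily_cost


factor = cost_reduction_factor(1000.0, 150.0)
print(f"{factor:.1f}x")  # 6.7x, inside the 4-8x range

# The same $1000/day workload at each end of the 4-8x range.
print(1000.0 / 4, 1000.0 / 8)  # 250.0 125.0
```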

Stalled or hype

AGI. Talking heads keep predicting it. Working engineers know we're nowhere near. We don't have agents that can run autonomously for a week without supervision. We don't have models that can debug their own code reliably.

Domain-specific small models. Promised every year. Still no compelling case for most teams over "use a frontier model with a good prompt."

Multi-agent systems. Lots of papers, few production deployments. The orchestration overhead doesn't pay off for most use cases yet.

What I'm watching for the rest of 2025

  • Whether reasoning models (o1-style, Claude with extended thinking) become cheap enough for routine use
  • Whether open-weights models catch up enough that running locally becomes viable
  • Whether "memory" finally gets a standard architecture rather than every team rolling their own
  • Cost per intelligence-unit - the right metric to track as the field matures
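There is no agreed definition of an "intelligence-unit" yet, so any concrete version is an assumption. One rough proxy, sketched below, is dollars spent per task passed on a fixed eval suite. The ModelRun fields and every number here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    name: str
    total_cost_usd: float  # spend for the whole eval run
    tasks_passed: int      # tasks solved on the same fixed eval suite


def cost_per_passed_task(run: ModelRun) -> float:
    """Dollars per solved task; infinite if the model solved nothing."""
    if run.tasks_passed == 0:
        return float("inf")
    return run.total_cost_usd / run.tasks_passed


# Illustrative numbers only, not measurements of any real model.
runs = [
    ModelRun("model_a", total_cost_usd=12.0, tasks_passed=80),
    ModelRun("model_b", total_cost_usd=3.0, tasks_passed=60),
]
cheapest = min(runs, key=cost_per_passed_task)
print(cheapest.name)  # model_b
```

The proxy ignores which tasks were passed, so a cheap model that only solves the easy ones can look better than it is. Pair it with a minimum pass-rate threshold before trusting the ranking.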

Practical advice

If you're shipping product:

  • Bet on frontier models + good prompts. Don't fine-tune unless you have a hard reason to.
  • Assume costs drop 2x in 12 months. Don't over-engineer for token efficiency yet.
  • Build provider-agnostic code. Models leapfrog each other.
  • Invest in eval harnesses. They survive every model change.
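The last two points fit together: if the eval harness only depends on a small interface, swapping providers is a one-line change. A minimal sketch, assuming a hypothetical ChatModel interface with a single complete method; a real adapter would wrap a vendor SDK behind it, and EchoUpperModel is only a stand-in so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class ChatModel(Protocol):
    """Hypothetical provider-agnostic interface: anything with complete() qualifies."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the model output is acceptable


def run_evals(model: ChatModel, cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output passes its check."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if case.check(model.complete(case.prompt)))
    return passed / len(cases)


class EchoUpperModel:
    """Stand-in provider for local testing; it just upper-cases the prompt."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


CASES = [
    EvalCase("hello", lambda out: out == "HELLO"),
    EvalCase("abc", lambda out: "B" in out),
    EvalCase("xyz", lambda out: out == "xyz"),  # fails on purpose: output is upper-cased
]
score = run_evals(EchoUpperModel(), CASES)
print(f"{score:.2f}")  # 0.67
```

When a new model ships, you write one more adapter class and rerun the same CASES. The cases are the asset that survives the model change.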

The era of "this will all be solved by GPT-5" thinking is over. The work has shifted to engineering: how do you compose capable-but-imperfect models into reliable products?
