AI Summer 2025: What Actually Progressed
Past the hype cycles, what's a working engineer to make of this year?
A mid-2025 stocktake on capabilities, costs, tools, and architectures: what actually moved versus what's marketing noise. The real progress isn't where the headlines are.
Real progress
Coding capability. Claude 4 and the GPT-4 successors are genuinely better at multi-file refactoring than last year's models. The gap between "AI assistant for code" and "AI agent for code" has closed considerably.
Long context that works. Million-token contexts used to degrade well before their advertised limit; now they actually hold up. This changes RAG architecture (covered in another post).
Tool use reliability. Models now produce well-formed tool calls 95%+ of the time, even on edge cases. Production agents are viable.
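"95%+ well-formed" still means you validate before you execute. A minimal sketch of that guard, using a hypothetical tool registry and call format (the `get_weather` schema and the `{"name": ..., "arguments": ...}` shape are illustrative, not any provider's API):

```python
import json

# Hypothetical tool registry: tool name -> required argument fields and types.
# Illustrative only; real deployments would use a proper schema library.
TOOLS = {
    "get_weather": {"city": str, "units": str},
}

def validate_tool_call(raw: str):
    """Parse a model-emitted tool call and check it against the registry.

    Returns (name, args) on success; raises ValueError otherwise, so the
    caller can retry or fall back instead of executing a malformed call.
    """
    call = json.loads(raw)  # raises on malformed JSON
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    schema = TOOLS[name]
    missing = set(schema) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"argument {key!r} should be {expected.__name__}")
    return name, args

# A well-formed call passes; anything else fails fast, before side effects.
name, args = validate_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Oslo", "units": "metric"}}'
)
```

The point is that the remaining few percent of malformed calls become a retry, not a production incident.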
Costs collapsed. Inference cost per token dropped 4-8x year-over-year at frontier capability. A workload that cost $1,000/day in 2024 runs at about $150/day now.
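The arithmetic behind that claim is simple enough to sanity-check yourself. The per-million-token prices and daily token volume below are assumed round numbers chosen to reproduce the $1,000 → $150 example, not quoted rates from any provider:

```python
# Back-of-envelope daily inference cost from token volume and unit price.
def daily_cost(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

TOKENS = 100_000_000  # hypothetical daily workload: 100M tokens

cost_2024 = daily_cost(TOKENS, 10.0)  # assumed ~$10/Mtok blended in 2024
cost_2025 = daily_cost(TOKENS, 1.5)   # same workload at an assumed ~$1.50/Mtok
```

At those assumed prices the same workload goes from $1,000/day to $150/day, a roughly 6.7x drop, squarely inside the 4-8x range above.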
Stalled or hype
AGI. Talking heads keep predicting it; working engineers know we're nowhere near. We don't have agents that can run autonomously for a week without supervision, and we don't have models that can debug their own code reliably.
Domain-specific small models. Promised every year. Still no compelling case for most teams over "use a frontier model with a good prompt."
Multi-agent systems. Lots of papers, few production deployments. The orchestration overhead doesn't pay off for most use cases yet.
What I'm watching for the rest of 2025
- Whether reasoning models (o1-style, Claude with extended thinking) become cheap enough for routine use
- Open-weights models catching up enough that running locally becomes viable
- Whether "memory" finally gets a standard architecture rather than every team rolling their own
- Cost per intelligence-unit - the right metric to track as the field matures
Practical advice
If you're shipping product:
- Bet on frontier models + good prompts. Don't fine-tune unless you have a hard reason to.
- Assume costs drop 2x in 12 months. Don't over-engineer for token efficiency yet.
- Build provider-agnostic code. Models leapfrog each other.
- Invest in eval harnesses. They survive every model change.
The era of "this will all be solved by GPT-5" thinking is over. The work has shifted to engineering: how do you compose capable-but-imperfect models into reliable products?