All Insights
technical· 7 min read

The Real Cost of Running Claude in Production

Token economics matter more when you're shipping.

SV
Sri VardhanDecember 15, 2024
Share on Twitter
Share on LinkedIn
Copy link

I track every penny of Claude API spend across my projects. Here's the breakdown of what costs what, and where the savings are.

I run Claude in production for several clients. Here's what costs add up.

Cost components

For a typical RAG-style chat product:

  • System prompt: 2-4K tokens, mostly cached
  • Retrieved context: 5-30K tokens, partially cached
  • User message: 50-300 tokens, never cached
  • Response: 200-2K tokens, never cached

Without caching, every request pays for the full input. With caching, only the new bits.

The caching multiplier

Anthropic's prompt caching cuts cached-token cost by ~10x. The trick: keep the prefix of your prompt stable. A single timestamp in the cached prefix invalidates the cache.

For a chat product with stable system prompt + stable retrieval scaffolding + variable user messages, caching saves 50-80% on token cost.

My standard cost model

For a customer-support chat agent on Claude Sonnet 4.6:

  • 8K-token system prompt (cached)
  • 12K-token retrieved context (cached for the session)
  • 200-token user message
  • 600-token response

Per turn (with cache): ~$0.004. Per 1000 turns: $4.

Compared to a naively-implemented version (no caching): $0.024 per turn. 6x more expensive.

Where the surprises are

  • Long context degrades cost. Filling 1M tokens is expensive even with caching. Use it only when it earns its keep.
  • Tool use multiplies turns. Each tool call is a round-trip. An agent that uses 5 tools per task is 5x the turns of a single-shot generation.
  • Streaming doesn't save cost. Same token count, different delivery.

What I tell clients

Budget for AI cost like you budget for AWS. Track it daily. Set alerts at 50%, 80%, 100% of budget. Make a person responsible for the bill.

Also: token cost will probably halve again in 12 months. Don't over-engineer for token efficiency at the cost of code quality. Sometimes "expensive" is the right answer for now.

References

Tagged

#claude#anthropic#cost#ai
SV

Sri Vardhan

Independent technology studio of one. I help founders and small teams ship serious software without the consultancy overhead. More about me.

Want to discuss this topic?

I am always happy to dig deeper. If a piece sparked an idea or a disagreement, send it over. I read every message myself.

Get in Touch