All Insights
technical

The Real Cost of Running Claude in Production

Token economics matter more when you're shipping.

December 15, 20247 min read

I track every penny of Claude API spend across my projects. Here's the breakdown of what costs what, and where the savings are.

I run Claude in production for several clients. Here's what costs add up.

Cost components

For a typical RAG-style chat product:

  • System prompt: 2-4K tokens, mostly cached
  • Retrieved context: 5-30K tokens, partially cached
  • User message: 50-300 tokens, never cached
  • Response: 200-2K tokens, never cached

Without caching, every request pays for the full input. With caching, only the new bits.

The caching multiplier

Anthropic's prompt caching cuts cached-token cost by ~10x. The trick: keep the prefix of your prompt stable. A single timestamp in the cached prefix invalidates the cache.

For a chat product with stable system prompt + stable retrieval scaffolding + variable user messages, caching saves 50-80% on token cost.

My standard cost model

For a customer-support chat agent on Claude Sonnet 4.6:

  • 8K-token system prompt (cached)
  • 12K-token retrieved context (cached for the session)
  • 200-token user message
  • 600-token response

Per turn (with cache): ~$0.004. Per 1000 turns: $4.

Compared to a naively-implemented version (no caching): $0.024 per turn. 6x more expensive.

Where the surprises are

  • Long context degrades cost. Filling 1M tokens is expensive even with caching. Use it only when it earns its keep.
  • Tool use multiplies turns. Each tool call is a round-trip. An agent that uses 5 tools per task is 5x the turns of a single-shot generation.
  • Streaming doesn't save cost. Same token count, different delivery.

What I tell clients

Budget for AI cost like you budget for AWS. Track it daily. Set alerts at 50%, 80%, 100% of budget. Make a person responsible for the bill.

Also: token cost will probably halve again in 12 months. Don't over-engineer for token efficiency at the cost of code quality. Sometimes "expensive" is the right answer for now.

References

claudeanthropiccostai

Want to discuss this topic?

I'm always happy to dive deeper. Reach out if you have questions or want to collaborate.

Get in Touch

Command Palette

Search for a command to run...