The Real Cost of Running Claude in Production

I track every penny of Claude API spend across my projects. Here's the breakdown of what costs what, and where the savings are.

I run Claude in production for several clients. Here's what costs add up.

Cost components

For a typical RAG-style chat product:

System prompt: 2-4K tokens, mostly cached
Retrieved context: 5-30K tokens, partially cached
User message: 50-300 tokens, never cached
Response: 200-2K tokens, never cached

Without caching, every request pays for the full input. With caching, only the new bits.

The caching multiplier

Anthropic's prompt caching cuts cached-token cost by ~10x. The trick: keep the prefix of your prompt stable. A single timestamp in the cached prefix invalidates the cache.

For a chat product with stable system prompt + stable retrieval scaffolding + variable user messages, caching saves 50-80% on token cost.

My standard cost model

For a customer-support chat agent on Claude Sonnet 4.6:

8K-token system prompt (cached)
12K-token retrieved context (cached for the session)
200-token user message
600-token response

Per turn (with cache): ~$0.004. Per 1000 turns: $4.

Compared to a naively-implemented version (no caching): $0.024 per turn. 6x more expensive.

Where the surprises are

Long context degrades cost. Filling 1M tokens is expensive even with caching. Use it only when it earns its keep.
Tool use multiplies turns. Each tool call is a round-trip. An agent that uses 5 tools per task is 5x the turns of a single-shot generation.
Streaming doesn't save cost. Same token count, different delivery.

What I tell clients

Budget for AI cost like you budget for AWS. Track it daily. Set alerts at 50%, 80%, 100% of budget. Make a person responsible for the bill.

Also: token cost will probably halve again in 12 months. Don't over-engineer for token efficiency at the cost of code quality. Sometimes "expensive" is the right answer for now.

The Real Cost of Running Claude in Production

Cost components

The caching multiplier

My standard cost model

Where the surprises are

What I tell clients

References

Related Articles

Building Production RAG Systems

An LLM Evaluation Framework That Works

Prompt Engineering for Production

Want to discuss this topic?

The Real Cost of Running Claude in Production

Cost components

The caching multiplier

My standard cost model

Where the surprises are

What I tell clients

References

Related Articles

Building Production RAG Systems

An LLM Evaluation Framework That Works

Prompt Engineering for Production

Want to discuss this topic?

Command Palette