Vercel Fluid Compute: Six Months of Real Use
A serverless runtime that doesn't penalize you for slow upstream calls.
Fluid compute changes the cost model for serverless functions that mostly wait on databases or AI APIs. I've been running production workloads on it. Here's what's true.
Vercel Fluid is the most interesting serverless runtime change I've seen in years.
The classic problem
Lambda-style serverless bills you for the wall-clock time your function holds memory. If your function spends 80% of its time waiting on Postgres or OpenAI, you're paying for that wait. The cost-per-request economics fall apart for AI workloads where 90% of latency is the upstream model.
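To make that concrete, here's a back-of-envelope sketch. The rate and durations are made up for illustration, not anyone's published pricing:

```ts
// Illustrative numbers only: a request that holds 1 GB for 1,000 ms of
// wall-clock time but only burns 100 ms of actual CPU.
const wallClockMs = 1_000;
const cpuMs = 100;
const gbSecondRate = 0.0000167; // hypothetical $/GB-second

const classicBill = (wallClockMs / 1000) * gbSecondRate; // pay for the wait
const cpuOnlyBill = (cpuMs / 1000) * gbSecondRate;       // pay for work done

console.log(classicBill / cpuOnlyBill); // 10x difference on this one request
```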
What Fluid does
It multiplexes requests onto the same instance. While your function is awaiting an external call, Vercel routes other requests into that runtime. You're billed for active CPU time, not wall-clock time.
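The handler code itself doesn't change; the shape is the usual wait-dominated proxy. A minimal sketch in Next.js route-handler style, with a placeholder upstream endpoint and payload:

```ts
// A typical wait-dominated handler. The upstream URL and payload are
// placeholders; nothing here is Fluid-specific.
export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();

  // Almost all of this request's wall-clock time is spent right here,
  // awaiting the upstream model. Under classic serverless billing that wait
  // is paid for; under Fluid the instance can serve other requests in the
  // meantime and you're billed for active CPU.
  const upstream = await fetch("https://api.example.com/v1/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  // The CPU-bound part (parsing and reshaping the response) is tiny.
  const data = await upstream.json();
  return Response.json({ text: data.text });
}
```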
Real numbers
For a chat backend I run:
- Pre-Fluid: $48 / 1M requests
- Post-Fluid: $11 / 1M requests
That's not a typo. The bill dropped by more than 4x because the workload is dominated by waiting: most of each request's wall-clock duration is upstream latency, so only a small fraction of it counts as billable active CPU.
Trade-offs
- Cold starts are slightly worse because the runtime is heavier. Mitigated by keeping a small warm pool.
- Memory pressure is real if your concurrent requests hold large buffers. I had a bug where a request held a 50 MB doc in memory while waiting on Claude. Three concurrent requests OOM'd the runtime. Fix: stream rather than buffer (see the streaming sketch after this list).
- Observability is different. Traditional "p95 function duration" becomes misleading; you want "p95 CPU time" instead (see the CPU-time sketch after this list).
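The memory-pressure fix looked roughly like this: pass the upstream body through as a stream instead of buffering it. The URL is a placeholder and this is a sketch, not my exact code:

```ts
// Before: buffering the whole document while waiting on the model.
// const doc = await fetch(docUrl).then(r => r.arrayBuffer()); // ~50 MB pinned per request

// After: stream the upstream body straight through so no request holds a
// large buffer while it waits.
export async function GET(): Promise<Response> {
  const upstream = await fetch("https://example.com/big-document.pdf");
  if (!upstream.ok || !upstream.body) {
    return new Response("upstream error", { status: 502 });
  }
  // Chunks flow to the client as they arrive instead of accumulating in
  // the function's memory.
  return new Response(upstream.body, {
    headers: {
      "content-type":
        upstream.headers.get("content-type") ?? "application/octet-stream",
    },
  });
}
```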
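On the observability point: a crude way to see the wall-vs-CPU gap per route is Node's process.cpuUsage(). Note the caveat in the comment; the logging call is a stand-in for whatever metrics sink you actually use:

```ts
// Wall clock vs process CPU for one unit of work. process.cpuUsage() is
// process-wide, so with multiplexed requests the delta is only an
// approximation; the platform's own CPU metrics are the source of truth.
export async function withCpuTiming<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  try {
    return await fn();
  } finally {
    const wallMs = performance.now() - wallStart;
    const cpu = process.cpuUsage(cpuStart); // microseconds since cpuStart
    const cpuMs = (cpu.user + cpu.system) / 1000;
    console.log(JSON.stringify({ name, wallMs, cpuMs })); // ship to your metrics sink
  }
}
```

Wrapping a handler body in withCpuTiming makes it obvious when p95 duration is all upstream wait.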
Where it shines
- AI / LLM endpoints (90% wait time)
- Webhook handlers that call third-party APIs
- ISR fetchers that hit slow upstreams
Where to be cautious
- CPU-bound work - no benefit
- Large-payload endpoints - memory pressure
- Anything that needs strict single-request isolation for compliance reasons
If you're on Vercel and have AI workloads, switching to Fluid is the highest-ROI move you can make this quarter.