Vercel Fluid Compute: Six Months of Real Use
A serverless runtime that doesn't penalize you for slow upstream calls.
Fluid compute changes the cost model for serverless functions that mostly wait on databases or AI APIs. I've been running production workloads on it. Here's what's true.
Vercel Fluid is the most interesting serverless runtime change I've seen in years.
The classic problem
Lambda-style serverless bills you for the wall-clock time your function holds memory. If your function spends 80% of its time waiting on Postgres or OpenAI, you're paying for that wait. The cost-per-request economics fall apart for AI workloads where 90% of latency is the upstream model.
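To make that concrete, here's a back-of-envelope sketch. The rate and durations are made up for illustration, not anyone's published pricing:

```ts
// Illustrative numbers only: a request that holds 1 GB for 1,000 ms of
// wall-clock time but only burns 100 ms of actual CPU.
const wallClockMs = 1_000;
const cpuMs = 100;
const gbSecondRate = 0.0000167; // hypothetical $/GB-second

const classicBill = (wallClockMs / 1000) * gbSecondRate; // pay for the wait
const cpuOnlyBill = (cpuMs / 1000) * gbSecondRate;       // pay for work done

console.log(classicBill / cpuOnlyBill); // 10x difference on this one request
```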
What Fluid does
It multiplexes requests onto the same instance. While your function is awaiting an external call, Vercel routes other requests into that runtime. You're billed for active CPU time, not wall-clock time.
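The handler code itself doesn't change; the shape is the usual wait-dominated proxy. A minimal sketch in Next.js route-handler style, with a placeholder upstream endpoint and payload:

```ts
// A typical wait-dominated handler. The upstream URL and payload are
// placeholders; nothing here is Fluid-specific.
export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();

  // Almost all of this request's wall-clock time is spent right here,
  // awaiting the upstream model. Under classic serverless billing that wait
  // is paid for; under Fluid the instance can serve other requests in the
  // meantime and you're billed for active CPU.
  const upstream = await fetch("https://api.example.com/v1/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  // The CPU-bound part (parsing and reshaping the response) is tiny.
  const data = await upstream.json();
  return Response.json({ text: data.text });
}
```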
Real numbers
For a chat backend I run:
- Pre-Fluid: $48 / 1M requests
- Post-Fluid: $11 / 1M requests
That's not a typo. The bill dropped by more than 4x because the workload is dominated by waiting: most of each request's wall-clock duration is upstream latency, so only a small fraction of it counts as billable active CPU.
Trade-offs
- Cold starts are slightly worse because the runtime is heavier. Mitigated by keeping a small warm pool.
- Memory pressure is real if your concurrent requests hold large buffers. I had a bug where a request held a 50 MB doc in memory while waiting on Claude. Three concurrent requests OOM'd the runtime. Fix: stream rather than buffer (see the streaming sketch after this list).
- Observability is different. Traditional "p95 function duration" becomes misleading; you want "p95 CPU time" instead (see the CPU-time sketch after this list).
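The memory-pressure fix looked roughly like this: pass the upstream body through as a stream instead of buffering it. The URL is a placeholder and this is a sketch, not my exact code:

```ts
// Before: buffering the whole document while waiting on the model.
// const doc = await fetch(docUrl).then(r => r.arrayBuffer()); // ~50 MB pinned per request

// After: stream the upstream body straight through so no request holds a
// large buffer while it waits.
export async function GET(): Promise<Response> {
  const upstream = await fetch("https://example.com/big-document.pdf");
  if (!upstream.ok || !upstream.body) {
    return new Response("upstream error", { status: 502 });
  }
  // Chunks flow to the client as they arrive instead of accumulating in
  // the function's memory.
  return new Response(upstream.body, {
    headers: {
      "content-type":
        upstream.headers.get("content-type") ?? "application/octet-stream",
    },
  });
}
```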
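On the observability point: a crude way to see the wall-vs-CPU gap per route is Node's process.cpuUsage(). Note the caveat in the comment; the logging call is a stand-in for whatever metrics sink you actually use:

```ts
// Wall clock vs process CPU for one unit of work. process.cpuUsage() is
// process-wide, so with multiplexed requests the delta is only an
// approximation; the platform's own CPU metrics are the source of truth.
export async function withCpuTiming<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  try {
    return await fn();
  } finally {
    const wallMs = performance.now() - wallStart;
    const cpu = process.cpuUsage(cpuStart); // microseconds since cpuStart
    const cpuMs = (cpu.user + cpu.system) / 1000;
    console.log(JSON.stringify({ name, wallMs, cpuMs })); // ship to your metrics sink
  }
}
```

Wrapping a handler body in withCpuTiming makes it obvious when p95 duration is all upstream wait.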
Where it shines
- AI / LLM endpoints (90% wait time)
- Webhook handlers that call third-party APIs
- ISR fetchers that hit slow upstreams
Where to be cautious
- CPU-bound work - no benefit
- Large-payload endpoints - memory pressure
- Anything that needs strict single-request isolation for compliance reasons
If you're on Vercel and have AI workloads, switching to Fluid is the highest-ROI move you can make this quarter.