Serverless at Scale: Patterns and Pitfalls

Serverless is powerful but has sharp edges at scale. Learn the patterns that work, the antipatterns to avoid, and when to choose different architectures.

What "at scale" actually means

Serverless gets sold as a magic switch: write a function, deploy, scale to infinity. That story is true for the first ten thousand requests a day. It starts cracking somewhere between five hundred and a few thousand requests per second, or when a single workflow needs more than a few seconds of compute, or when your traffic shape stops looking like neat little independent events.

I still reach for serverless first. It is the right default for most APIs, async jobs, and event handlers. But "at scale" is where the platform's defaults stop helping and start hurting. This is a tour of the patterns I keep using and the traps I keep watching out for.

Cold starts are a product decision

Cold starts are not a bug. They are a knob. Every choice I make about runtime, package size, memory allocation, and provisioned concurrency moves that knob.

What I actually do:

Pick a fast runtime. In my experience Node and Go cold-start meaningfully faster than the heaviest options
Trim the deployment artifact. Tree-shake, ship only what runs, prefer the AWS SDK v3 modular packages
Raise memory deliberately. More memory means more CPU, often a faster cold start, and a shorter wall-clock duration that can offset the higher per-millisecond price
Use provisioned concurrency only on hot paths. It is real money, so I scope it to the handful of functions facing users
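
To make the memory trade-off concrete, here is a rough sketch of the arithmetic on a platform that bills memory multiplied by duration (AWS Lambda bills in GB-seconds). The two duration figures are hypothetical measurements, not benchmarks; the point is only that a larger memory setting can cost less per invocation when the function finishes enough faster.

```python
def gb_seconds(memory_mb: int, duration_ms: int) -> float:
    """Billed compute for one invocation, in GB-seconds (memory x duration)."""
    return (memory_mb / 1024) * (duration_ms / 1000)


# Hypothetical measurements for the same function at two memory settings.
small = gb_seconds(memory_mb=512, duration_ms=800)
large = gb_seconds(memory_mb=1024, duration_ms=350)

print(f"512 MB:  {small:.3f} GB-s per invocation")
print(f"1024 MB: {large:.3f} GB-s per invocation")
```

Whether the larger setting wins depends entirely on how much the duration drops, so this is worth measuring per function rather than assuming.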

For predictable user-facing latency I tend to keep the truly latency-critical surface on a long-running container behind a load balancer and use functions for everything async around it.

Concurrency is your real budget

People talk about request rate. The platform talks about concurrency. They are not the same. A 200 ms function at 500 RPS uses about 100 concurrent executions. A 5 second function at the same rate uses 2,500. The second one will hit account limits long before the first.
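
The numbers above follow from Little's law: concurrency is roughly arrival rate multiplied by average duration. A minimal sketch, with a hypothetical helper name:

```python
import math


def required_concurrency(requests_per_second: int, avg_duration_ms: int) -> int:
    """Little's law: concurrent executions = arrival rate x time in system."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


print(required_concurrency(500, 200))   # 100
print(required_concurrency(500, 5000))  # 2500
```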

I plan capacity in concurrency, not RPS. I set per-function reserved concurrency on anything that talks to a fragile downstream so a runaway feature cannot starve the rest of the platform. I treat the regional concurrency limit as a shared resource and I monitor it.
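
One way to treat the regional limit as a shared budget is to track how much of it is already carved out. On AWS Lambda, reserved concurrency both caps a function and removes that amount from the pool everything else shares. The sketch below is only bookkeeping; the function names and the 1,000 limit are illustrative assumptions, and it does not call any cloud API.

```python
def unreserved_pool(regional_limit: int, reserved: dict[str, int]) -> int:
    """Concurrency left for every function without its own reservation."""
    remaining = regional_limit - sum(reserved.values())
    if remaining < 0:
        raise ValueError("reserved concurrency exceeds the regional limit")
    return remaining


# Hypothetical numbers: a 1,000-execution regional limit and two capped functions.
reserved = {"billing-writer": 50, "report-export": 100}
print(unreserved_pool(1000, reserved))  # 850
```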

The downstream is the bottleneck

A serverless function in front of a relational database is the classic foot-gun. The function will scale. The database will not. I have watched a single deploy spin up thousands of concurrent connections in seconds and brick a Postgres instance.

The patterns that work:

A connection proxy that pools and reuses connections. RDS Proxy, PgBouncer, or a similar layer
Aggressive read caching at the edge or in a fast key-value store
Queue-based smoothing for any write pattern that can tolerate seconds of delay
Idempotency keys on every write so retries do not double-charge or double-create
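
The idempotency-key pattern can be sketched as follows. `PaymentStore` is a hypothetical in-memory stand-in for a table with a unique constraint on the key; the class, method, and field names are assumptions made for illustration.

```python
class PaymentStore:
    """In-memory stand-in for a table with a unique constraint on the idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges_made = 0

    def charge_once(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a key we have already seen returns the stored result
        # instead of performing the side effect again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


store = PaymentStore()
first = store.charge_once("order-123", 4999)
retry = store.charge_once("order-123", 4999)
print(first == retry, store.charges_made)  # True 1
```

In a real system the check and the write must be atomic, for example through a unique constraint or a conditional write, because two concurrent retries can otherwise both pass the lookup.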

I cover the database side of this in more depth in PostgreSQL for everything.

Async is the secret weapon

The biggest wins I have seen on serverless platforms come from removing things from the request path. Email sends, search index updates, audit logs, analytics, webhooks to third parties: none of these need to block the user.

A queue plus a worker function is almost always the right shape. SQS, SNS, EventBridge, Kafka, whatever your platform offers. I make the queue the contract. Producers do one thing: enqueue. Consumers do one thing: process and ack. Failures land in a dead-letter queue with enough context to replay.
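
A minimal sketch of that contract, using an in-memory `deque` in place of a managed queue. `MAX_ATTEMPTS`, `drain`, and `send_email` are hypothetical names; a managed service would handle redelivery and dead-lettering for you, so this only illustrates the shape.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical retry budget before a message is dead-lettered


def drain(queue: deque, dead_letters: list, handler: Callable[[dict], None]) -> None:
    """Process every message; requeue on failure, dead-letter after MAX_ATTEMPTS."""
    while queue:
        message = queue.popleft()
        try:
            handler(message["body"])
            # Reaching this point is the "ack": the message is not requeued.
        except Exception as exc:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                # Keep the body, attempt count, and error so the message can be replayed.
                dead_letters.append({**message, "error": repr(exc)})
            else:
                queue.append(message)


def send_email(body: dict) -> None:
    if "to" not in body:
        raise ValueError("missing recipient")


queue = deque([
    {"body": {"to": "a@example.com"}, "attempts": 0},
    {"body": {"subject": "no recipient"}, "attempts": 0},
])
dead_letters: list = []
drain(queue, dead_letters, send_email)
print(len(queue), len(dead_letters), dead_letters[0]["attempts"])  # 0 1 3
```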

This pattern is half the reason serverless feels cheap. You stop paying for compute that is just waiting on someone else's network.

Observability is not optional

Stateless functions with no shared memory are a debugging nightmare without proper tooling. I will not ship a serverless system without:

Structured JSON logs with a correlation ID propagated across every hop
Distributed tracing on the critical paths
Custom metrics for business events, not just infrastructure
Alarms tied to user-visible symptoms, not internal noise

Earlier in my career, working on regulated systems, I learned the hard way that "the function ran" is not the same as "the work succeeded." I instrument outcomes, not invocations.
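
As a rough illustration of structured logs that carry a correlation ID and record the outcome rather than just the invocation, consider the sketch below. The event name, field names, and the `handler` signature are assumptions, not any particular platform's API.

```python
import json
import uuid


def log_line(event: str, outcome: str, correlation_id: str, **fields) -> str:
    """Build one JSON log line; outcome records whether the work succeeded."""
    record = {"event": event, "outcome": outcome, "correlation_id": correlation_id, **fields}
    return json.dumps(record, sort_keys=True)


def handler(request: dict) -> str:
    # Reuse the caller's correlation ID if present, otherwise start a new one.
    correlation_id = request.get("correlation_id") or str(uuid.uuid4())
    try:
        if request["amount_cents"] <= 0:
            raise ValueError("amount must be positive")
        outcome = "success"
        print(log_line("payment.charged", outcome, correlation_id,
                       amount_cents=request["amount_cents"]))
    except (KeyError, ValueError) as exc:
        outcome = "failure"
        print(log_line("payment.charged", outcome, correlation_id, error=str(exc)))
    return outcome


handler({"correlation_id": "req-42", "amount_cents": 1999})
handler({"correlation_id": "req-42", "amount_cents": 0})
```

Both lines share the same `correlation_id`, so a log query on that one field reconstructs the request across hops, and the `outcome` field is what alarms and business metrics should key on.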

When functions stop being enough

There is a point where the right answer is to leave functions behind for that specific workload. Signs I watch for:

Steady traffic with no real spikes (a container is cheaper)
Long-running streaming or websocket connections
Heavy startup work that cannot be amortized
Workloads needing GPU or large memory footprints
A handful of "hot" functions consuming most of the cost

The honest answer is usually a hybrid. Containers behind a load balancer for the steady core, functions for the spiky edges and async work. I help teams make this call as part of architecture work.

A short antipattern list

Calling Lambda from Lambda from Lambda. Use Step Functions or a queue
Storing state in /tmp and praying for warm starts
Hammering a single Postgres without a proxy
Treating timeouts as the only failure mode (think about throttling, partial failures, retries)
Skipping local emulation. Test the function logic the same way you test any other unit
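
Partial failures deserve a concrete example. When a Lambda function consumes an SQS batch and the event source mapping has `ReportBatchItemFailures` enabled, the handler can return only the failed message IDs so the successful ones are not redelivered. The sketch below assumes that setup; `process` and the message fields are hypothetical.

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic; raises when a record cannot be handled."""
    if "user_id" not in body:
        raise ValueError("missing user_id")


def handler(event: dict, context: object = None) -> dict:
    """Report only the failed messages so the successful ones are not redelivered."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


event = {"Records": [
    {"messageId": "m-1", "body": json.dumps({"user_id": 7})},
    {"messageId": "m-2", "body": json.dumps({"note": "malformed"})},
]}
print(handler(event))  # {'batchItemFailures': [{'itemIdentifier': 'm-2'}]}
```

Because the handler is a plain function taking a dictionary, it can be unit-tested exactly like this, with a hand-built event and no cloud resources.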

The takeaway

Serverless at scale is mostly about respecting boundaries: concurrency budgets, downstream limits, observability, and knowing when a different shape fits the workload. Done well, you get an architecture that absorbs spikes, costs less than you'd expect, and lets a small team operate a system that used to need a platform crew. Done poorly, you get the same incidents as before, just in a more expensive package.
