Real-Time Architecture Patterns

A practical guide to building real-time features, from simple presence indicators to complex multiplayer experiences.

Real-time is a spectrum, not a feature

When a founder asks for "real-time," I always ask back: real-time as in instant chat, or real-time as in the dashboard updates within a few seconds? The answer changes the architecture entirely. Treating those two as the same problem is how you end up with a multiplayer Google-Docs-style backend powering a stock ticker that could have been a polling endpoint.

Most "real-time" features I have shipped did not need millisecond latency. They needed fresh enough data and a UI that did not feel stale. Picking the right pattern starts with admitting that.

The four moves I keep reaching for

Across the projects I have worked on, almost every real-time feature uses one of four patterns:

Polling with a short interval
Server-Sent Events for one-way streams
WebSockets for bidirectional, conversational connections
A managed real-time platform (Pusher, Ably, Supabase Realtime, Liveblocks, Partykit) when I would rather buy than build

There is no shame in any of these. Polling, in particular, is dramatically underrated. A 5-second interval on a small JSON endpoint behind a CDN is cheap, simple, observable, and almost always good enough for "live status" UIs.
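A minimal sketch of such a poller, assuming a caller-supplied `load` function that fetches the JSON. The names `startPolling` and `nextPollDelay`, the 60-second cap, and the choice to back off after failures are illustrative assumptions, not requirements of the pattern.

```typescript
// Delay before the next poll: the base interval normally, doubled for each
// consecutive failure and capped at maxMs so an outage does not hammer the API.
function nextPollDelay(baseMs: number, failures: number, maxMs: number): number {
  return Math.min(baseMs * 2 ** failures, maxMs);
}

type Stop = () => void;

// Calls `load` repeatedly until the returned stop function is invoked.
// `load` and `onData` are hypothetical callbacks supplied by the caller.
function startPolling<T>(
  load: () => Promise<T>,
  onData: (data: T) => void,
  baseMs = 5000,
  maxMs = 60000,
): Stop {
  let stopped = false;
  let failures = 0;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      const data = await load();
      if (stopped) return; // stopped while the request was in flight
      onData(data);
      failures = 0;
    } catch {
      failures += 1; // a failed load (or a throwing onData) slows the loop down
    }
    if (!stopped) {
      timer = setTimeout(tick, nextPollDelay(baseMs, failures, maxMs));
    }
  };

  void tick();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next request only after the previous one settles keeps slow responses from piling up, which a fixed `setInterval` would not guarantee.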

Server-Sent Events: the underused middle ground

For one-way feeds (notifications, progress bars, live logs, AI streaming responses) Server-Sent Events are usually my pick. They run over plain HTTP, pass through most load balancers and proxies as long as response buffering is turned off, reconnect automatically through the browser's EventSource API, and skip the WebSocket upgrade handshake entirely.

I default to SSE for any case where the server is the only one that needs to push: streaming an LLM response token by token, surfacing background job progress, broadcasting alerts to a dashboard. The serverless platforms have caught up here too: Next.js streaming and Vercel Edge Functions both support SSE cleanly.
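On the wire, an SSE stream is a response with `Content-Type: text/event-stream` whose body is a series of events, each a few `field: value` lines followed by a blank line. The helper below is a small sketch of that framing. The name `formatSseEvent` and the choice to JSON-encode every payload are assumptions for illustration.

```typescript
interface SseEvent {
  event?: string; // optional event name; clients listen with addEventListener(name)
  id?: string;    // optional id; the browser sends it back as Last-Event-ID on reconnect
  data: unknown;  // payload, JSON-encoded here (an assumption of this sketch)
}

// Serializes one event into the text/event-stream wire format.
// Assumes `event` and `id` contain no newline characters.
function formatSseEvent({ event, id, data }: SseEvent): string {
  const lines: string[] = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  // JSON.stringify without an indent argument never emits raw newlines,
  // so a single data line is enough. It returns undefined for `undefined`.
  const json: string | undefined = JSON.stringify(data);
  lines.push(`data: ${json ?? "null"}`);
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

A server would write the returned string to the open response for each update. In the browser, `new EventSource(url)` plus `addEventListener("progress", handler)` receives events named `progress`, with the payload available as `handler`'s `event.data` string.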

WebSockets: when conversation is the point

I reach for WebSockets when both sides talk. Multiplayer cursors, collaborative documents, live chat, low-latency control panels, multiplayer games. The defining shape: each client maintains a stateful connection and sends messages frequently.

The hard parts of WebSockets are not the protocol. They are operational:

Stickiness. A client connection lives on a single server, so the load balancer must route reconnects appropriately
Backpressure. Slow consumers can fill server buffers if you are not careful
Auth. The handshake happens once, so token rotation needs explicit handling
Scale-out. Two users in the same room may land on different servers, so you need a pub/sub layer between them

In my experience, this last one is what eats teams alive. The first version uses an in-memory map of connections. It works on one process. The day you add a second instance, half the room goes silent. Plan for the pub/sub layer (Redis, NATS, Kafka) on day one even if you only have one server today.
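The shape of that fix can be sketched without any particular broker. In the sketch below, `Bus`, `InMemoryBus`, `RoomHub`, and `Connection` are hypothetical names. `InMemoryBus` only stands in for Redis, NATS, or Kafka so the example runs in one process. The key idea is that a broadcast always goes through the bus, and each instance delivers only to the sockets it holds.

```typescript
type Handler = (room: string, message: string) => void;

// Minimal pub/sub contract; a real implementation would wrap a broker client.
interface Bus {
  publish(room: string, message: string): void;
  subscribe(handler: Handler): void;
}

// In-process stand-in for the broker, for illustration only.
class InMemoryBus implements Bus {
  private readonly handlers: Handler[] = [];
  publish(room: string, message: string): void {
    this.handlers.forEach((handler) => handler(room, message));
  }
  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }
}

// Whatever object wraps a client socket on this server instance.
interface Connection {
  send(message: string): void;
}

// One RoomHub per server instance.
class RoomHub {
  private readonly bus: Bus;
  private readonly rooms = new Map<string, Set<Connection>>();

  constructor(bus: Bus) {
    this.bus = bus;
    // Every instance hears every published message and filters by local room.
    bus.subscribe((room, message) => this.deliverLocal(room, message));
  }

  join(room: string, conn: Connection): void {
    let members = this.rooms.get(room);
    if (members === undefined) {
      members = new Set<Connection>();
      this.rooms.set(room, members);
    }
    members.add(conn);
  }

  leave(room: string, conn: Connection): void {
    const members = this.rooms.get(room);
    if (members === undefined) return;
    members.delete(conn);
    if (members.size === 0) this.rooms.delete(room);
  }

  // Never write to local sockets directly: publish, and let the bus
  // subscription deliver to local members on every instance, including this one.
  broadcast(room: string, message: string): void {
    this.bus.publish(room, message);
  }

  private deliverLocal(room: string, message: string): void {
    this.rooms.get(room)?.forEach((conn) => conn.send(message));
  }
}
```

With this structure, going from one instance to two changes only which `Bus` implementation is injected.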

Buy versus build

I am increasingly happy to outsource the transport layer. Managed real-time platforms have matured. They handle presence, history, fanout, and reconnection logic, which is most of the work. The teams I see succeed with build-it-yourself usually have either a real differentiator (a CRDT engine, a game engine) or a constraint that forces it (data residency, cost at extreme scale).

For most products, paying a vendor a few hundred dollars a month to never debug a WebSocket reconnect storm is a great trade. I cover similar buy/build calls in choosing a tech stack.

Designing the message layer

Whichever transport I pick, I treat messages like an API. Versioned. Typed. Documented.

What I always do:

Define a small set of event types with clear payloads
Include a type discriminator so clients can switch on it cleanly
Use idempotency keys for any event that might be retried
Include enough context that a client can reconstruct state from a single late event, not just a delta

Without these habits the protocol turns into a tangle of edge cases within a quarter.
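As one possible shape for these habits, here is a sketch of a two-event protocol. The event names, the `v` and `id` fields, and the `applyEvent` reducer are all hypothetical. What matters is the discriminated union, the idempotency key, and the full record carried on create.

```typescript
interface BaseEvent {
  v: 1;       // protocol version, bumped on breaking changes
  id: string; // idempotency key, unique per logical event
}

interface MessageCreated extends BaseEvent {
  type: "message.created";
  // The full record rather than a delta, so a late event is still usable.
  message: { id: string; author: string; body: string };
}

interface MessageDeleted extends BaseEvent {
  type: "message.deleted";
  messageId: string;
}

type RoomEvent = MessageCreated | MessageDeleted;

interface RoomState {
  seen: Set<string>;             // event ids already applied (unbounded in this sketch)
  messages: Map<string, string>; // message id -> body
}

// Applies an event in place; replays of an already-seen id are ignored.
function applyEvent(state: RoomState, event: RoomEvent): RoomState {
  if (state.seen.has(event.id)) return state;
  state.seen.add(event.id);
  switch (event.type) {
    case "message.created":
      state.messages.set(event.message.id, event.message.body);
      return state;
    case "message.deleted":
      state.messages.delete(event.messageId);
      return state;
    default: {
      // Compile-time exhaustiveness check: adding a new event type to
      // RoomEvent without handling it here becomes a type error.
      const unhandled: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Switching on `type` lets the compiler narrow each payload, so a client cannot read `messageId` from a `message.created` event by mistake.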

State, presence, and the source of truth

Real-time UIs make it tempting to treat the WebSocket as the source of truth. That always ends badly. The database is the source of truth. The WebSocket is a notification channel.

The pattern I follow:

The client makes a regular write to the API
The API persists the change
The API publishes an event to the message bus
The bus fans out to all connected clients in the relevant room
Clients update local state from the event, falling back to a re-fetch when in doubt
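The steps above can be sketched as follows. Everything here is an illustrative assumption: `DocService` uses an in-memory map where a real API would use the database, calls are synchronous for brevity where real persistence would be async, and the per-document `version` counter is one way, not the only way, for a client to notice that it missed something.

```typescript
interface Doc {
  id: string;
  version: number; // incremented on every persisted write
  title: string;
}

interface DocEvent {
  type: "doc.updated";
  doc: Doc;
}

// Server side: persist first, publish second.
class DocService {
  private readonly docs = new Map<string, Doc>(); // stand-in for the database
  private readonly publish: (event: DocEvent) => void;

  constructor(publish: (event: DocEvent) => void) {
    this.publish = publish;
  }

  rename(id: string, title: string): Doc {
    const previous = this.docs.get(id);
    const next: Doc = { id, title, version: (previous?.version ?? 0) + 1 };
    this.docs.set(id, next); // the write is durable before anyone hears about it
    this.publish({ type: "doc.updated", doc: next });
    return next;
  }
}

type Decision = "apply" | "ignore" | "refetch";

// Client side: decide what to do with an incoming event given local state.
function decide(local: Doc | undefined, event: DocEvent): Decision {
  if (local === undefined) return "apply";
  if (event.doc.version <= local.version) return "ignore"; // stale or duplicate
  if (event.doc.version === local.version + 1) return "apply";
  return "refetch"; // a gap means at least one event was missed
}
```

If the publish step fails after the write succeeds, clients are merely late rather than wrong, because the next event or re-fetch brings them back in line with the database.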

For presence (who is online, where their cursor is) I store ephemeral state separately, often in Redis with short TTLs, and accept that it is approximate.
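A rough sketch of the heartbeat-plus-TTL idea, using an in-memory map in place of Redis so it runs on its own. `PresenceStore` and its methods are hypothetical names, and the clock is passed in explicitly to keep the behavior easy to test.

```typescript
// Tracks who is online from periodic heartbeats. A user drops out of the
// list once ttlMs has passed without a heartbeat, which mirrors what a
// Redis key with a short expiry would do.
class PresenceStore {
  private readonly ttlMs: number;
  private readonly lastSeen = new Map<string, number>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Called whenever a client pings, for example every 10 seconds.
  heartbeat(userId: string, nowMs: number): void {
    this.lastSeen.set(userId, nowMs);
  }

  // Returns currently online users, pruning expired entries as it goes.
  online(nowMs: number): string[] {
    const result: string[] = [];
    this.lastSeen.forEach((seenAt, userId) => {
      if (nowMs - seenAt < this.ttlMs) {
        result.push(userId);
      } else {
        this.lastSeen.delete(userId);
      }
    });
    return result.sort();
  }
}
```

Nothing here needs an explicit "went offline" message: a client that vanishes silently just stops refreshing its entry, which is why the result is approximate by up to one TTL.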

Failure modes worth rehearsing

Real-time systems fail in ways stateless ones do not. I always test:

Reconnection storms after a server restart
Client clock skew breaking ordering
Network changes (Wi-Fi to cellular) dropping connections silently
Backpressure when a single client is slow
Replays of duplicated events

If you have not deliberately broken your system in each of these ways before launch, your users will do it for you.
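For the first item, a common client-side mitigation is exponential backoff with random jitter, so that clients disconnected at the same moment do not all reconnect at the same moment. A minimal sketch, with `reconnectDelay`, the 500 ms base, and the 30-second cap as assumed values:

```typescript
// Returns how long to wait before reconnect attempt number `attempt`
// (0 for the first retry). Uses "full jitter": a random delay between
// 0 and an exponentially growing, capped ceiling.
function reconnectDelay(
  attempt: number,
  baseMs = 500,
  maxMs = 30000,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceilingMs = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceilingMs);
}
```

A client would call this after each failed attempt, for example `setTimeout(connect, reconnectDelay(attempt))` where `connect` is its own hypothetical reconnect function, and reset `attempt` to zero once a connection succeeds.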

The point

Real-time is more about discipline than novelty. Pick the simplest transport that fits the workload, treat the wire format as a versioned API, plan for multi-instance fanout from the start, and let a managed platform handle the parts that are not your differentiator. Done well, real-time features feel magical and stay boring to operate. Done poorly, they become the most expensive part of the system to keep alive.

If you want help working through these tradeoffs on a specific feature, that is exactly the kind of architecture work I take on.
