A practical guide to building real-time features, from simple presence indicators to complex multiplayer experiences.
Real-time is a spectrum, not a feature
When a founder asks for "real-time," I always ask back: real-time as in instant chat, or real-time as in the dashboard updates within a few seconds? The answer changes the architecture entirely. Treating those two as the same problem is how you end up with a multiplayer Google-Docs-style backend powering a stock ticker that could have been a polling endpoint.
Most "real-time" features I have shipped did not need millisecond latency. They needed fresh enough data and a UI that did not feel stale. Picking the right pattern starts with admitting that.
The four moves I keep reaching for
Across the projects I have worked on, almost every real-time feature uses one of four patterns:
- Polling with a short interval
- Server-Sent Events for one-way streams
- WebSockets for bidirectional, conversational connections
- A managed real-time platform (Pusher, Ably, Supabase Realtime, Liveblocks, Partykit) when I would rather buy than build
There is no shame in any of these. Polling, in particular, is dramatically underrated. A 5-second interval on a small JSON endpoint behind a CDN is cheap, simple, observable, and almost always good enough for "live status" UIs.
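A minimal sketch of what a polling client can look like, assuming the endpoint returns some version marker (an ETag or an updated-at stamp) alongside the data. The names `Snapshot`, `createPoller`, and `fetchSnapshot` are made up for illustration; the fetch function is injected so the sketch does not depend on any particular HTTP client.

```typescript
// Hypothetical shape of one poll response; `version` could be an ETag or an updated-at stamp.
interface Snapshot<T> {
  version: string;
  data: T;
}

function createPoller<T>(
  fetchSnapshot: () => Promise<Snapshot<T>>,
  onChange: (data: T) => void,
) {
  let lastVersion: string | null = null;

  // Split out from the network call so change detection is easy to test.
  function apply(snapshot: Snapshot<T>): boolean {
    if (snapshot.version === lastVersion) return false; // nothing new, skip the re-render
    lastVersion = snapshot.version;
    onChange(snapshot.data);
    return true;
  }

  // One polling round: fetch, then notify only if the version moved.
  async function tick(): Promise<void> {
    try {
      apply(await fetchSnapshot());
    } catch {
      // Keep showing the last good state; the next tick retries.
    }
  }

  return { apply, tick };
}
```

In a browser, calling `tick` from a `setInterval` with a 5000 ms period would be enough to drive it. The UI only re-renders when the version actually changes, which keeps a short interval cheap on the client as well as the server.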
Server-Sent Events: the underused middle ground
For one-way feeds (notifications, progress bars, live logs, AI streaming responses), Server-Sent Events are usually my pick. They run over plain HTTP, pass through most load balancers and proxies as long as response buffering is turned off, reconnect automatically through the EventSource API, and skip the WebSocket upgrade handshake entirely.
I default to SSE for any case where the server is the only one that needs to push. Streaming an LLM response token by token, surfacing background job progress, broadcasting alerts to a dashboard. The serverless platforms have caught up here too: Next.js streaming and Vercel Edge functions both support SSE cleanly.
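Part of the appeal is how small the wire format is. The sketch below formats one message as `text/event-stream` text: optional `id`, `event`, and `retry` fields, one `data:` line per line of payload, and a blank line to end the message. The `SseMessage` type and `formatSseMessage` function are illustrative names, not part of any library.

```typescript
// Illustrative message shape; only `data` is required by the format.
interface SseMessage {
  id?: string;      // echoed back by the browser as Last-Event-ID on reconnect
  event?: string;   // named event; clients listen with addEventListener(name, ...)
  retryMs?: number; // hint for the client's reconnection delay
  data: unknown;    // strings are sent as-is, everything else as JSON
}

function formatSseMessage(msg: SseMessage): string {
  const lines: string[] = [];
  if (msg.id !== undefined) lines.push(`id: ${msg.id}`);
  if (msg.event !== undefined) lines.push(`event: ${msg.event}`);
  if (msg.retryMs !== undefined) lines.push(`retry: ${msg.retryMs}`);

  const payload =
    typeof msg.data === "string" ? msg.data : JSON.stringify(msg.data ?? null);

  // A payload containing newlines must be split across several data: lines.
  for (const part of payload.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${part}`);
  }

  // A blank line terminates the message.
  return lines.join("\n") + "\n\n";
}
```

A server would write these strings to a response with `Content-Type: text/event-stream`. On the client, `new EventSource(url)` plus `addEventListener("progress", handler)` receives the named events and handles reconnection on its own.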
WebSockets: when conversation is the point
I reach for WebSockets when both sides talk. Multiplayer cursors, collaborative documents, live chat, low-latency control panels, multiplayer games. The defining shape: each client maintains a stateful connection and sends messages frequently.
The hard parts of WebSockets are not the protocol. They are operational:
- Stickiness. A client connection lives on a single server, so the load balancer must route reconnects appropriately
- Backpressure. Slow consumers can fill server buffers if you are not careful
- Auth. The handshake happens once, so token rotation needs explicit handling
- Scale-out. Two users in the same room may land on different servers, so you need a pub/sub layer between them
In my experience, this last one is what eats teams alive. The first version uses an in-memory map of connections. It works on one process. The day you add a second instance, half the room goes silent. Plan for the pub/sub layer (Redis, NATS, Kafka) on day one even if you only have one server today.
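The shape of that pub/sub layer can be sketched without any infrastructure. In the example below, `InMemoryBus` is a stand-in for Redis, NATS, or Kafka, and `ServerInstance` is a hypothetical WebSocket server process; neither name comes from a real library. The important rule is that an instance never writes straight to its own sockets: it publishes to the bus, and every instance (including itself) delivers to whatever connections it holds locally.

```typescript
type Deliver = (message: string) => void;
type BusHandler = (roomId: string, message: string) => void;

// Assumed minimal bus contract; a real one would be Redis pub/sub, NATS, etc.
interface Bus {
  publish(roomId: string, message: string): void;
  subscribe(handler: BusHandler): void;
}

// In-process stand-in so the sketch runs without infrastructure.
class InMemoryBus implements Bus {
  private handlers: BusHandler[] = [];

  publish(roomId: string, message: string): void {
    for (const handler of this.handlers) handler(roomId, message);
  }

  subscribe(handler: BusHandler): void {
    this.handlers.push(handler);
  }
}

// One server process holding its own set of client connections.
class ServerInstance {
  private rooms = new Map<string, Set<Deliver>>();
  private bus: Bus;

  constructor(bus: Bus) {
    this.bus = bus;
    bus.subscribe((roomId, message) => this.deliverLocally(roomId, message));
  }

  // `deliver` stands in for socket.send on a connected client.
  join(roomId: string, deliver: Deliver): void {
    const members = this.rooms.get(roomId) ?? new Set<Deliver>();
    members.add(deliver);
    this.rooms.set(roomId, members);
  }

  // Always go through the bus, so clients on other instances hear it too.
  broadcast(roomId: string, message: string): void {
    this.bus.publish(roomId, message);
  }

  private deliverLocally(roomId: string, message: string): void {
    for (const deliver of this.rooms.get(roomId) ?? []) deliver(message);
  }
}
```

With this structure, going from one instance to two is a deployment change rather than a rewrite, because the single-instance version already routes through the bus.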
Buy versus build
I am increasingly happy to outsource the transport layer. Managed real-time platforms have matured. They handle presence, history, fanout, and reconnection logic, which is most of the work. The teams I see succeed with build-it-yourself usually have either a real differentiator (a CRDT engine, a game engine) or a constraint that forces it (data residency, cost at extreme scale).
For most products, paying a vendor a few hundred dollars a month to never debug a WebSocket reconnect storm is a great trade. I cover similar buy/build calls in choosing a tech stack.
Designing the message layer
Whichever transport I pick, I treat messages like an API. Versioned. Typed. Documented.
What I always do:
- Define a small set of event types with clear payloads
- Include a type discriminator so clients can switch on it cleanly
- Use idempotency keys for any event that might be retried
- Include enough context that a client can reconstruct state from a single late event, not just a delta
Without these habits the protocol turns into a tangle of edge cases within a quarter.
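As an illustration of those habits, here is a sketch of a small event protocol. The task-list domain, the event names, and the `EventApplier` class are all invented for the example. Each event carries a version, a `type` discriminator, an idempotency key, and the full entity rather than a delta.

```typescript
// Shared envelope: every event is versioned, typed, and individually identifiable.
interface Envelope<T extends string, P> {
  v: 1;       // protocol version
  type: T;    // discriminator that clients switch on
  id: string; // idempotency key, stable across retries
  payload: P;
}

// Full task state in the payload, so a single late event is enough to catch up.
type TaskUpdated = Envelope<"task.updated", { taskId: string; title: string; done: boolean }>;
type TaskDeleted = Envelope<"task.deleted", { taskId: string }>;
type AppEvent = TaskUpdated | TaskDeleted;

class EventApplier {
  readonly tasks = new Map<string, { title: string; done: boolean }>();
  // Unbounded here for brevity; a real client would cap or expire this set.
  private seen = new Set<string>();

  // Returns false when the event was a duplicate and therefore skipped.
  apply(event: AppEvent): boolean {
    if (this.seen.has(event.id)) return false;

    switch (event.type) {
      case "task.updated": {
        const { taskId, title, done } = event.payload;
        this.tasks.set(taskId, { title, done });
        break;
      }
      case "task.deleted": {
        this.tasks.delete(event.payload.taskId);
        break;
      }
      default: {
        // Compile-time exhaustiveness check: adding an event type without a case fails here.
        const unhandled: never = event;
        throw new Error(`Unhandled event: ${JSON.stringify(unhandled)}`);
      }
    }

    this.seen.add(event.id);
    return true;
  }
}
```

Whether an unknown event type at runtime should throw, as it does here, or be logged and ignored for forward compatibility is a judgment call that depends on how clients are rolled out.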
State, presence, and the source of truth
Real-time UIs make it tempting to treat the WebSocket as the source of truth. That always ends badly. The database is the source of truth. The WebSocket is a notification channel.
The pattern I follow:
- The client makes a regular write to the API
- The API persists the change
- The API publishes an event to the message bus
- The bus fans out to all connected clients in the relevant room
- Clients update local state from the event, falling back to a re-fetch when in doubt
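The steps above can be sketched end to end with in-memory stand-ins. Everything here is an assumption made for illustration: the `Doc` shape, the per-document `version` counter, and the synchronous `refetch` callback (a real client would call the API asynchronously). The part worth noticing is the client rule: apply the next expected version, ignore stale events, and re-fetch on a gap.

```typescript
interface Doc {
  id: string;
  title: string;
  version: number;
}

interface DocEvent {
  type: "doc.updated";
  doc: Doc;
}

// In-memory stand-in for the real database: the source of truth.
class Database {
  private docs = new Map<string, Doc>();

  save(id: string, title: string): Doc {
    const version = (this.docs.get(id)?.version ?? 0) + 1;
    const doc: Doc = { id, title, version };
    this.docs.set(id, doc);
    return doc;
  }

  load(id: string): Doc | undefined {
    return this.docs.get(id);
  }
}

// API handler: persist first, publish second. The event is only a notification.
function handleWrite(
  db: Database,
  publish: (event: DocEvent) => void,
  id: string,
  title: string,
): Doc {
  const doc = db.save(id, title);
  publish({ type: "doc.updated", doc });
  return doc;
}

// Client-side view of one document.
class ClientDoc {
  current: Doc | undefined = undefined;
  private refetch: (id: string) => Doc | undefined;

  constructor(refetch: (id: string) => Doc | undefined) {
    this.refetch = refetch;
  }

  onEvent(event: DocEvent): "applied" | "ignored" | "refetched" {
    const localVersion = this.current?.version ?? 0;
    if (event.doc.version <= localVersion) return "ignored"; // stale or duplicate
    if (event.doc.version === localVersion + 1) {
      this.current = event.doc; // the next expected change
      return "applied";
    }
    // A gap means events were missed: go back to the source of truth.
    this.current = this.refetch(event.doc.id) ?? event.doc;
    return "refetched";
  }
}
```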
For presence (who is online, where their cursor is) I store ephemeral state separately, often in Redis with short TTLs, and accept that it is approximate.
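A sketch of the TTL idea, using an in-memory map in place of Redis so it runs anywhere. `PresenceStore` and its methods are hypothetical names. In Redis the heartbeat would be roughly a `SET` on a per-user key with an `EX` expiry, refreshed on each heartbeat; the clock is injected here only to make expiry easy to exercise.

```typescript
class PresenceStore {
  // userId -> timestamp (ms) after which the user counts as offline
  private expiresAt = new Map<string, number>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Called on connect and then periodically by each client.
  heartbeat(userId: string): void {
    this.expiresAt.set(userId, this.now() + this.ttlMs);
  }

  // Explicit leave is a nicety; expiry covers clients that vanish silently.
  leave(userId: string): void {
    this.expiresAt.delete(userId);
  }

  // Prunes expired entries, then returns who is (approximately) online.
  online(): string[] {
    const now = this.now();
    for (const [userId, expiry] of this.expiresAt) {
      if (expiry <= now) this.expiresAt.delete(userId);
    }
    return [...this.expiresAt.keys()].sort();
  }
}
```

The TTL sets how stale presence can get: a user who drops off the network still shows as online for up to one TTL after their last heartbeat, which is the approximation being accepted.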
Failure modes worth rehearsing
Real-time systems fail in ways stateless ones do not. I always test:
- Reconnection storms after a server restart
- Client clock skew breaking ordering
- Network changes (Wi-Fi to cellular) dropping connections silently
- Backpressure when a single client is slow
- Replays of duplicated events
If you have not deliberately broken your system in each of these ways before launch, your users will do it for you.
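For the first item on that list, one common mitigation is exponential backoff with jitter on the client, so that thousands of clients dropped by the same restart do not all reconnect in the same instant. The sketch below uses the "full jitter" variant; `reconnectDelayMs` and `BackoffOptions` are illustrative names, and the random source is injectable so the behavior can be pinned down in tests.

```typescript
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  maxMs: number;  // upper bound no matter how many attempts have failed
}

// Full jitter: pick a random delay between 0 and an exponentially growing ceiling.
function reconnectDelayMs(
  attempt: number, // 0 for the first retry, 1 for the second, and so on
  options: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(options.maxMs, options.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A client would wait `reconnectDelayMs(attempt, { baseMs: 1000, maxMs: 30000 })` milliseconds before each reconnect and reset `attempt` to zero once a connection succeeds. Restarting a server under load with and without the jitter is a cheap way to rehearse the storm.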
The point
Real-time is more about discipline than novelty. Pick the simplest transport that fits the workload, treat the wire format as a versioned API, plan for multi-instance fanout from the start, and let a managed platform handle the parts that are not your differentiator. Done well, real-time features feel magical and stay boring to operate. Done poorly, they become the most expensive part of the system to keep alive.
If you want help working through these tradeoffs on a specific feature, that is exactly the kind of architecture work I take on.
Sri Vardhan
Independent technology studio of one. I help founders and small teams ship serious software without the consultancy overhead.