Real-time Systems | Complexity: complex

Real-time Chat at Scale

Architecture for chat systems handling millions of concurrent users, covering connection management, fanout, persistence, and moderation.

7 components, 5 considerations, 4 alternatives. Complexity: complex.

Fit

When this blueprint fits

And when to walk away from it

When to use this

Chat is the core experience: customer support, community platforms, in-game chat, dating apps, social messengers. The right blueprint when message latency and reliability are competitive differentiators.

When NOT to use this

If chat is a side feature with a few thousand users, embed a managed service like Stream or Sendbird. The build-vs-buy break-even for chat sits somewhere around 100k monthly active users.

Architecture

System components

Key building blocks of this architecture, layered from infrastructure up.

01

Connection Layer

WebSocket gateway with sticky sessions, graceful failover, and connection migration during deploys. I separate the gateway from the application logic so I can scale capacity independently and roll out gateway changes without touching the rest of the system.
WebSocket, uWebSockets, Load Balancer, Sticky Sessions
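One common way to keep a user pinned to the same gateway node is a consistent-hash ring keyed on user ID. The sketch below is an assumption about how that routing could look, not a description of any particular load balancer; `HashRing`, the `gw-*` node names, and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping user IDs to gateway nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes   # virtual nodes per gateway, smooths the distribution
        self._keys = []         # sorted hash positions on the ring
        self._owners = {}       # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            self._owners[position] = node
            bisect.insort(self._keys, position)

    def remove(self, node):
        self._keys = [k for k in self._keys if self._owners[k] != node]
        self._owners = {k: n for k, n in self._owners.items() if n != node}

    def node_for(self, user_id):
        if not self._keys:
            raise LookupError("no gateway nodes registered")
        # First ring position clockwise from the user's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._keys)
        return self._owners[self._keys[idx]]
```

When a node is drained for a deploy, only the users that were on that node get a new assignment; everyone else keeps their existing gateway, which is the property that makes rolling gateway changes tolerable.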
02

Message Bus

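Pub/sub for fanning messages to subscribers across nodes and across regions. Redis Pub/Sub is fine for single-region, but delivery is at-most-once: a subscriber that is disconnected when a message is published never sees it. NATS with JetStream or Kafka are stronger when ordering or durability matter. See Redis vs Memcached.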
Redis Pub/Sub, NATS, Kafka
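The fanout pattern can be sketched independently of the broker. In the example below, `InMemoryBus` is a hypothetical stand-in for Redis Pub/Sub, NATS, or Kafka, and `GatewayNode` is an assumed shape for a gateway process. The point it illustrates is one bus subscription per room per node, with the node fanning out to its own local connections.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for a real broker: at-most-once delivery, no persistence."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> handlers

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, channel, payload):
        handlers = list(self._subscribers[channel])
        for handler in handlers:
            handler(payload)
        return len(handlers)  # number of subscribed nodes, not users


class GatewayNode:
    """Holds local connections and fans bus messages out to them."""

    def __init__(self, name, bus):
        self.name = name
        self._bus = bus
        self._local = defaultdict(dict)  # room -> {user_id: deliver callback}

    def join(self, room, user_id, deliver):
        first_local_member = room not in self._local
        self._local[room][user_id] = deliver
        if first_local_member:
            # Subscribe once per room per node, not once per connection.
            self._bus.subscribe(
                f"room:{room}",
                lambda payload, r=room: self._fan_out(r, payload),
            )

    def _fan_out(self, room, payload):
        for deliver in self._local[room].values():
            deliver(payload)
```

With this shape, broker load grows with the number of nodes that host a room rather than with the number of connected users.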
03

Presence Service

Online/offline indicators, typing notifications, and last-seen timestamps. Presence is a separate concern from messages: lossy, throttled, and ephemeral. Clients send heartbeats every 30 seconds, and presence updates are batched and rate-limited.
Redis, Heartbeats, Throttling, Bloom Filters
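A minimal sketch of heartbeat-based presence, assuming an in-process dictionary where a real deployment would more likely use expiring keys in Redis. `PresenceTracker`, the 2.5x grace multiplier, and `drain_changes` are hypothetical names and values chosen for illustration. Only state transitions are emitted, which is one simple way to batch and throttle updates.

```python
import time


class PresenceTracker:
    """Tracks last heartbeat per user and reports only online/offline transitions."""

    def __init__(self, heartbeat_s=30.0, grace=2.5, clock=time.monotonic):
        # A user is considered offline after missing roughly two heartbeats.
        self._ttl = heartbeat_s * grace
        self._clock = clock
        self._last_seen = {}   # user_id -> timestamp of last heartbeat
        self._announced = {}   # user_id -> last state sent to subscribers

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        seen = self._last_seen.get(user_id)
        return seen is not None and self._clock() - seen <= self._ttl

    def drain_changes(self):
        """Return {user_id: online} for users whose state changed since the last call."""
        changes = {}
        for user_id in self._last_seen:
            online = self.is_online(user_id)
            if self._announced.get(user_id) != online:
                changes[user_id] = online
                self._announced[user_id] = online
        return changes
```

Calling `drain_changes` on a fixed interval turns many heartbeats into at most one presence event per user per interval, and a dropped event is corrected by the next drain.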
04

Persistence

Durable message storage with pagination, search, and tiered retention. Hot messages in PostgreSQL, older messages in object storage with a search index for retrieval. Cassandra works well for very high write throughput.
PostgreSQL, Cassandra, Elasticsearch, S3
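Message history is usually paginated with a cursor rather than an offset, so deep scrollback stays cheap. The sketch below uses SQLite from the standard library as a stand-in for PostgreSQL; the `messages` table, its columns, and `fetch_page` are assumptions made for the example, and it assumes message IDs increase with time within a conversation.

```python
import sqlite3

# Hypothetical schema; a production table would carry sender, timestamps, and indexes.
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def fetch_page(conn, conversation_id, before_id=None, limit=50):
    """Return (rows, next_cursor): up to `limit` messages older than `before_id`, newest first."""
    if before_id is None:
        rows = conn.execute(
            "SELECT id, body FROM messages "
            "WHERE conversation_id = ? ORDER BY id DESC LIMIT ?",
            (conversation_id, limit),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, body FROM messages "
            "WHERE conversation_id = ? AND id < ? ORDER BY id DESC LIMIT ?",
            (conversation_id, before_id, limit),
        ).fetchall()
    # A short page means there is nothing older, so no cursor is returned.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor
```

The client passes `next_cursor` back as `before_id` to load the next older page; the query cost does not depend on how far back the user has scrolled.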
05

Push and Email Fallback

Notify offline users via push and email digests with smart deduplication. A user with the app backgrounded should get a push; the same user idle for an hour should get an email. The channel is chosen per user, per conversation, and per urgency.
FCM, APNs, Resend, Twilio
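The channel decision can be sketched as a pure function plus a small dedup window. Everything named here (`pick_channel`, `NotificationDeduper`, the `muted` and `urgent` flags, the five-minute window) is a hypothetical illustration; only the one-hour idle threshold for email comes from the text above.

```python
from enum import Enum


class Channel(Enum):
    NONE = "none"    # delivered in-app already, or suppressed
    PUSH = "push"
    EMAIL = "email"


def pick_channel(connected, idle_seconds, muted=False, urgent=False, email_after_s=3600):
    """Choose a fallback channel for one user in one conversation."""
    if connected:
        return Channel.NONE          # the live socket already delivered it
    if muted and not urgent:
        return Channel.NONE          # respect per-conversation mute
    if idle_seconds >= email_after_s:
        return Channel.EMAIL         # long idle: fold into an email
    return Channel.PUSH              # recently active, app backgrounded


class NotificationDeduper:
    """Suppress repeats for the same (user, conversation, channel) inside a time window."""

    def __init__(self, window_s=300.0):
        self._window_s = window_s
        self._last_sent = {}

    def should_send(self, user_id, conversation_id, channel, now):
        key = (user_id, conversation_id, channel)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._window_s:
            return False
        self._last_sent[key] = now
        return True
```

Keeping the decision free of I/O makes it easy to test the policy separately from the FCM, APNs, or email provider calls.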
06

Moderation

Automated and human moderation for user safety with appeal workflows. OpenAI Moderation API plus Perspective API for automated screening, human review for borderline cases. Audit decisions to train better classifiers over time.
OpenAI Moderation, Perspective API, Queue, Review UI
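The split between automated screening and human review can be expressed as two thresholds over a classifier score. This sketch does not call any real moderation API: the score is assumed to be a number between 0 and 1 produced upstream, and `ModerationRouter` and its threshold values are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass, field


@dataclass
class ModerationRouter:
    """Routes a message by classifier score and records every decision for audit."""

    allow_below: float = 0.3   # assumed threshold: clearly fine
    block_at: float = 0.9      # assumed threshold: clearly violating
    audit_log: list = field(default_factory=list)

    def route(self, message_id, score):
        if score >= self.block_at:
            decision = "block"
        elif score < self.allow_below:
            decision = "allow"
        else:
            decision = "review"   # borderline: send to the human review queue
        # Every decision is logged so later reviews and appeals can be compared against it.
        self.audit_log.append(
            {"message_id": message_id, "score": score, "decision": decision}
        )
        return decision
```

The audit log is what makes the feedback loop possible: human outcomes on "review" items can later be joined against the logged scores to adjust thresholds or retrain classifiers.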
07

Search and Indexing

Full-text search across messages with tenant isolation, time filtering, and attachment indexing. Elasticsearch or Typesense, indexed asynchronously from the message stream to avoid slowing down the hot path.
Elasticsearch, Typesense, Async Indexing
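To illustrate indexing off the hot path, here is a toy sketch in which the send path only appends to a queue and a separate flush step builds the index. `AsyncIndexer` and its in-memory inverted index are hypothetical stand-ins; a real system would consume from the message stream and write to Elasticsearch or Typesense in bulk.

```python
from collections import defaultdict, deque


class AsyncIndexer:
    """Queues messages on the hot path and indexes them later in batches."""

    def __init__(self, batch_size=100):
        self._pending = deque()
        self._index = defaultdict(set)  # (tenant_id, term) -> message IDs
        self._batch_size = batch_size

    def enqueue(self, tenant_id, message_id, body):
        # Hot path: constant-time append, no index write.
        self._pending.append((tenant_id, message_id, body))

    def flush(self):
        """Index up to one batch of pending messages; return how many were indexed."""
        indexed = 0
        while self._pending and indexed < self._batch_size:
            tenant_id, message_id, body = self._pending.popleft()
            for term in set(body.lower().split()):
                # The tenant ID is part of the key, so tenants never share postings.
                self._index[(tenant_id, term)].add(message_id)
            indexed += 1
        return indexed

    def search(self, tenant_id, term):
        return sorted(self._index.get((tenant_id, term.lower()), ()))
```

The trade-off is that a message is not searchable until the next flush, which is usually acceptable for chat search.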

Planning

Critical considerations

The things I have learned the hard way and would not skip on the next build.

Plan connection capacity per node and pool aggressively. A single node handles around 10k connections cleanly; beyond that, fan out across nodes with consistent hashing.
Design idempotent message delivery with client-side dedup. Every reconnect can cause replays. Server-assigned message IDs and client tracking prevent duplicates.
Implement graceful degradation when the bus or persistence layer is unreachable. Show a clear offline state, queue outbound messages, and reconcile on reconnect.
Build the abuse and moderation primitives before launch. Spam, harassment, and CSAM are inevitable on any chat platform. Reporting, blocking, and rate limits are launch-day features.
Contact me if you are scaling chat past 100k concurrent.
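The client-side dedup consideration above can be sketched as a bounded set of seen message IDs. `DedupInbox` and its capacity are hypothetical; the sketch assumes each message carries a stable server-assigned ID, as the text describes, and that replays arrive within the last `capacity` messages.

```python
from collections import OrderedDict


class DedupInbox:
    """Remembers recently seen message IDs so replays after a reconnect are dropped."""

    def __init__(self, capacity=10_000):
        self._seen = OrderedDict()  # message_id -> None, oldest first
        self._capacity = capacity

    def accept(self, message_id):
        """Return True if the message is new and should be rendered, False if it is a replay."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            # Evict the oldest ID to keep memory bounded.
            self._seen.popitem(last=False)
        return True
```

On reconnect, the client can ask the server to replay from the last ID it saw and pass every replayed message through `accept`, so an overlapping replay never renders twice.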

Options

Alternative approaches

Where I would consider a different shape entirely, with the trade-offs spelled out.

Alternative 01
Stream Chat or Sendbird for managed chat with rich client SDKs. The right call below 100k MAU.
Alternative 02
Liveblocks or Ably for real-time primitives when chat is one of several real-time features.
Alternative 03
Matrix for federated chat when interoperability and self-hosting are core requirements.
Alternative 04
PartyKit for edge-native chat with regional fanout and low global latency.