Data Pipelines | Complexity: complex

Event-Driven Architecture

Event-driven system architecture with message queues, event sourcing, CQRS, and sagas for complex workflows that need auditability and decoupling.

Fit

When this blueprint fits

And when to walk away from it

When to use this

You have multiple bounded contexts that need to react to each other without tight coupling, and you need a durable audit trail of every state change. Order processing, supply chain, financial systems, and large fintech back-offices live here.

When NOT to use this

If your domain is mostly CRUD with simple workflows and a single team, event sourcing adds complexity for no payoff. Start with a well-modelled relational schema and graduate to events only when the pain is real.

Architecture

System components

Key building blocks of this architecture, layered from infrastructure up.

Event Bus

Central event routing with at-least-once delivery, per-partition ordering, and dead-letter handling. Kafka is the workhorse for high-throughput environments. RabbitMQ or Redis Streams cover smaller scale with less operational burden. See the event-driven systems playbook.

Kafka, RabbitMQ, Redis Streams, NATS
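Per-partition ordering only helps if every event for one aggregate lands on the same partition. The following is a minimal sketch of key-based routing, assuming a hypothetical `partition_for` helper and a fixed partition count. Real brokers apply their own hash inside the client, so this illustrates the idea rather than any broker's actual algorithm.

```python
import hashlib

NUM_PARTITIONS = 6  # assumed partition count for this illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a routing key (e.g. an aggregate ID) to a stable partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical events: every event for order-42 shares one routing key.
events = [
    ("order-42", "OrderPlaced"),
    ("order-7", "OrderPlaced"),
    ("order-42", "OrderPaid"),
    ("order-42", "OrderShipped"),
]

# Simulate the broker: each partition is an append-only list.
partitions: dict[int, list[str]] = {}
for key, event_type in events:
    partitions.setdefault(partition_for(key), []).append(f"{key}:{event_type}")
```

Because the key is the order ID, the three `order-42` events stay in publish order within their partition, while events for other orders may interleave or land elsewhere.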

Event Store

Immutable append-only log of domain events with replay capability and snapshot support. I usually start with PostgreSQL as the event store because it is one less moving part, and graduate to EventStoreDB only when query patterns demand it.

EventStoreDB, PostgreSQL, Kafka, DynamoDB
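To make "append-only with replay" concrete, here is a small in-memory sketch of an event store with an optimistic concurrency check. The class and method names (`InMemoryEventStore`, `append`, `read`) are hypothetical. In a PostgreSQL-backed store the same check is commonly enforced with a unique constraint on the stream ID and version, which this sketch only imitates.

```python
from dataclasses import dataclass, field
from typing import Any


class ConcurrencyError(Exception):
    """Raised when the stream has moved on since the writer last read it."""


@dataclass(frozen=True)
class StoredEvent:
    stream_id: str
    version: int  # 1-based position within the stream
    event_type: str
    data: dict[str, Any]


@dataclass
class InMemoryEventStore:
    _streams: dict[str, list[StoredEvent]] = field(default_factory=dict)

    def append(self, stream_id: str, expected_version: int,
               event_type: str, data: dict[str, Any]) -> StoredEvent:
        """Append one event, but only if the stream is at expected_version."""
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, found {len(stream)}"
            )
        event = StoredEvent(stream_id, len(stream) + 1, event_type, dict(data))
        stream.append(event)  # existing events are never updated or deleted
        return event

    def read(self, stream_id: str, after_version: int = 0) -> list[StoredEvent]:
        """Replay a stream, optionally skipping events already covered by a snapshot."""
        return list(self._streams.get(stream_id, [])[after_version:])
```

The `after_version` parameter is where snapshots fit in: load the snapshot taken at version N, then replay only the events after it.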

Command Handlers

Process commands, validate invariants, and emit domain events. The write side of CQRS. Commands are imperatives (PlaceOrder), events are facts (OrderPlaced). Keep handlers small, single-purpose, and idempotent.

CQRS, Domain Events, Zod Validation, Aggregates
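The command/event distinction can be sketched as a pure decision function. This is one possible shape, not a prescribed one: `PlaceOrder` and `OrderPlaced` come from the text above, while `handle_place_order`, `InvariantViolation`, and the "at least one item" rule are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:  # command: an imperative that may be rejected
    order_id: str
    items: tuple[str, ...]


@dataclass(frozen=True)
class OrderPlaced:  # event: a fact that has already happened
    order_id: str
    items: tuple[str, ...]


class InvariantViolation(Exception):
    """Raised when a command would break a business rule."""


def handle_place_order(command: PlaceOrder,
                       history: list[OrderPlaced]) -> list[OrderPlaced]:
    """Decide which events a command produces, given the aggregate's past events."""
    if not command.items:
        raise InvariantViolation("an order needs at least one item")
    if any(event.order_id == command.order_id for event in history):
        return []  # already placed: retrying the command is a no-op
    return [OrderPlaced(command.order_id, command.items)]
```

Keeping the handler free of I/O makes it easy to test: pass in history, assert on the returned events, and let a thin outer layer do the loading and appending.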

Projections

Read models built by consuming the event stream. Multiple projections can exist for the same events, each optimised for a different query pattern: a search index, an analytics warehouse, a real-time dashboard. Rebuildable from the event log when the model changes.

Materialized Views, Elasticsearch, ClickHouse, Redis
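As a rough illustration of "several read models from one log", the sketch below folds the same hypothetical event list into two different projections. The event names and payload fields are invented for the example; rebuilding a projection is simply calling the function again over the full log.

```python
from collections import defaultdict

# Hypothetical event log: (event_type, payload) pairs in append order.
EVENT_LOG = [
    ("OrderPlaced", {"order_id": "o1", "customer": "alice", "total": 30}),
    ("OrderPlaced", {"order_id": "o2", "customer": "bob", "total": 15}),
    ("OrderCancelled", {"order_id": "o2"}),
    ("OrderPlaced", {"order_id": "o3", "customer": "alice", "total": 20}),
]


def project_order_status(events) -> dict[str, str]:
    """Read model 1: current status per order."""
    status: dict[str, str] = {}
    for event_type, data in events:
        if event_type == "OrderPlaced":
            status[data["order_id"]] = "placed"
        elif event_type == "OrderCancelled":
            status[data["order_id"]] = "cancelled"
    return status


def project_customer_revenue(events) -> dict[str, int]:
    """Read model 2: revenue per customer, net of cancelled orders."""
    orders: dict[str, tuple[str, int]] = {}
    revenue: dict[str, int] = defaultdict(int)
    for event_type, data in events:
        if event_type == "OrderPlaced":
            orders[data["order_id"]] = (data["customer"], data["total"])
            revenue[data["customer"]] += data["total"]
        elif event_type == "OrderCancelled":
            customer, total = orders[data["order_id"]]
            revenue[customer] -= total
    return dict(revenue)
```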

Sagas and Process Managers

Long-running business processes that span aggregates or services. Sagas listen for events, decide on next steps, and emit commands, with compensating actions that undo completed steps when a later step fails. Temporal makes this much easier than building from scratch.

Temporal, Inngest, Custom Orchestration, Compensation
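The compensation idea can be shown with a deliberately naive in-process orchestrator. The `run_saga` function and the step names are hypothetical, and the sketch has none of the durability a workflow engine provides: if the process dies mid-saga, the state is lost.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(step)
        log.append(f"{name}: done")
    return log


def noop() -> None:
    pass


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated downstream failure


result = run_saga([
    ("reserve_stock", noop, noop),
    ("charge_card", charge_card, noop),
    ("ship_order", noop, noop),
])
```

Here `result` records that stock was reserved, the charge failed, and the reservation was then compensated; `ship_order` never runs.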

Schema Registry

Versioned event schemas with backwards and forwards compatibility rules. The contract between producers and consumers. Avro or Protobuf for binary efficiency, JSON Schema for simpler tooling.

Confluent Schema Registry, Avro, Protobuf, JSON Schema
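A compatibility rule boils down to one question: can a reader on schema A safely consume data written with schema B? The sketch below answers it for a deliberately simplified schema format (field name mapped to a type and a required flag). It is a stand-in for illustration, not the resolution rules of Avro, Protobuf, or JSON Schema, and `read_problems` is a hypothetical helper.

```python
Schema = dict[str, dict]  # field name -> {"type": str, "required": bool}


def read_problems(reader: Schema, writer: Schema) -> list[str]:
    """List reasons why `reader` cannot safely read data written with `writer`."""
    problems: list[str] = []
    for name, spec in reader.items():
        if name not in writer:
            if spec["required"]:
                problems.append(f"required field '{name}' is missing from written data")
        elif writer[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
        elif spec["required"] and not writer[name]["required"]:
            problems.append(f"required field '{name}' may be absent in written data")
    return problems


v1: Schema = {"order_id": {"type": "string", "required": True}}
v2: Schema = {**v1, "coupon": {"type": "string", "required": False}}    # optional addition
v3: Schema = {**v1, "currency": {"type": "string", "required": True}}   # required addition

backward_ok = read_problems(reader=v2, writer=v1) == []      # new code reads old events
forward_ok = read_problems(reader=v1, writer=v2) == []       # old code reads new events
breaking = read_problems(reader=v3, writer=v1)               # non-empty: v3 cannot read v1
```

Running a check of this kind against the previously published schema is one way to turn the compatibility rules into a failing build instead of a production incident.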

Observability

Distributed tracing across the event flow, dead-letter monitoring, and replay tooling for incident response. The hardest part of operating event-driven systems is understanding what happened across a chain of async hops.

OpenTelemetry, Jaeger, Datadog APM, Replay UI
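One low-tech way to make a chain of async hops reconstructable is to carry a correlation ID and a causation ID in every message envelope. The sketch below assumes a hypothetical `Envelope` type and does not use any tracing library's API; a tracing system propagates similar context in message headers.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    event_type: str
    correlation_id: str           # shared by every message in one workflow
    causation_id: Optional[str]   # ID of the message that directly caused this one
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def start(event_type: str) -> Envelope:
    """Create the first message of a workflow with a fresh correlation ID."""
    return Envelope(event_type, correlation_id=str(uuid.uuid4()), causation_id=None)


def follow(parent: Envelope, event_type: str) -> Envelope:
    """Create a message caused by `parent`, inheriting its correlation ID."""
    return Envelope(event_type, parent.correlation_id, causation_id=parent.message_id)


def chain(messages: list[Envelope], leaf: Envelope) -> list[str]:
    """Walk causation links back to the root; return event types root-first."""
    by_id = {m.message_id: m for m in messages}
    path: list[str] = []
    current: Optional[Envelope] = leaf
    while current is not None:
        path.append(current.event_type)
        current = by_id.get(current.causation_id) if current.causation_id else None
    return list(reversed(path))


placed = start("OrderPlaced")
paid = follow(placed, "PaymentCaptured")
shipped = follow(paid, "OrderShipped")
```

Filtering logs by `correlation_id` gives everything that happened in one workflow; following `causation_id` links gives the order in which it happened.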

Planning

Critical considerations

The things I have learned the hard way and would not skip on the next build.

Event schema evolution requires a versioning strategy from day one. Backwards-compatible additions are cheap; breaking changes need a coordinated migration. Document the rules and enforce them in CI.

Eventual consistency affects user experience. Design the UI to acknowledge that a command was accepted, not that the resulting state is visible. Optimistic updates and explicit pending states keep users sane.

Implement idempotency for reliable event processing. Every consumer can receive the same event more than once. Idempotency keys, dedup tables, or naturally idempotent operations are the three patterns.
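The dedup-table pattern can be sketched with the standard-library `sqlite3` module. Table names, the `handle_deposit` function, and the deposit scenario are assumptions for illustration. The point is that the dedup record and the side effect are written in the same transaction, so a redelivered event cannot apply twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acc-1', 0)")
conn.commit()


def handle_deposit(event_id: str, account: str, amount: int) -> bool:
    """Apply the deposit once; return False when the event was already processed."""
    with conn:  # one transaction: dedup record and side effect commit together
        try:
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        except sqlite3.IntegrityError:
            return False  # primary key clash: duplicate delivery, skip the side effect
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (amount, account),
        )
    return True


handle_deposit("evt-1", "acc-1", 50)
handle_deposit("evt-1", "acc-1", 50)  # redelivery of the same event, ignored
balance = conn.execute(
    "SELECT amount FROM balances WHERE account = 'acc-1'"
).fetchone()[0]
```

After both calls the balance reflects a single deposit. The same shape works when the dedup key is a business-level idempotency key rather than the broker's message ID.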

Operating an event-driven system requires real on-call investment. Stuck consumers, partition lag, and poison messages will wake someone up. Build the runbooks and dashboards before launch.
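Poison messages are easier to reason about with a concrete retry budget. The sketch below is a simplified in-process consumer loop, with a hypothetical `consume` function and an assumed limit of three attempts. A production consumer would add backoff between attempts and publish to a real dead-letter queue rather than a list.

```python
from typing import Any, Callable

MAX_ATTEMPTS = 3  # assumed retry budget for this illustration


def consume(messages: list[Any], handler: Callable[[Any], None],
            max_attempts: int = MAX_ATTEMPTS) -> list[dict]:
    """Process messages in order; park messages that keep failing."""
    dead_letters: list[dict] = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break  # success: move on to the next message
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up so one bad message cannot block the partition.
                    dead_letters.append(
                        {"message": message, "error": str(exc), "attempts": attempt}
                    )
    return dead_letters


processed: list[str] = []


def handler(message: str) -> None:
    if message == "poison":
        raise ValueError("cannot parse payload")  # simulated permanent failure
    processed.append(message)


dead = consume(["a", "poison", "b"], handler)
```

The dead-letter entries carry the error and attempt count, which is the minimum an on-call engineer needs to decide between fixing and replaying or discarding.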

Options

Alternative approaches

Where I would consider a different shape entirely, with the trade-offs spelled out.

Alternative 01

Simple request/response with a relational database for lower-complexity domains. The cheapest correct architecture wins.

Alternative 02

Temporal or Inngest for managed workflow orchestration when you want durable execution without operating Kafka.

Alternative 03

DynamoDB Streams or Postgres logical replication as a lighter-weight event source when you already have the database.

Implementation

Related playbooks

Step-by-step guides for the harder parts of this architecture.

Designing Event-Driven Systems

Event-driven architectures unlock real autonomy between services, and they expose a whole new category of bugs if you do not respect their constraints. This playbook is the design discipline I use: model events as facts, version schemas carefully, choose the right broker, build idempotent consumers, handle ordering and failure, and add the observability that makes async systems debuggable in production.

Production Monitoring & Observability

Observability is not three pillars on a slide; it is the difference between knowing why your system is misbehaving and guessing. This playbook is the monitoring stack I deploy on every production system: error tracking, structured logging, performance metrics, distributed tracing, and the dashboards and alerts that turn raw data into actionable signal without paging everyone at 3 AM.

In practice