Guarantee message delivery by writing events to an outbox table in the same transaction.
If you are new here: The Outbox Pattern solves a deceptively common bug: you save a record to your database and try to publish an event to a message queue — but the publish fails silently. The record was saved; the event was never delivered. Downstream services never knew the record existed. The Outbox Pattern fixes this by writing the event into a special outbox table in your database as part of the same transaction that writes the business record. Then a separate relay process reads from the outbox and publishes to the message bus. If the relay fails, it can retry — the event is still in the outbox. If the database transaction fails, neither the record nor the event exists.
| Term | Plain meaning |
|---|---|
| Outbox table | A database table that stores events to be published, written in the same transaction as business data |
| Relay process | A background service that reads from the outbox and publishes events to a message bus |
| Dual-write | Attempting to write to two separate systems (DB and message bus) without atomicity — the anti-pattern this solves |
| At-least-once delivery | A guarantee that every event will be published — but possibly more than once; consumers must be idempotent |
| CDC (Change Data Capture) | Reading the database's write-ahead log to capture changes, rather than polling a table |
| Debezium | A popular open-source CDC tool that tails database WALs and publishes to Kafka |
| Idempotent consumer | A consumer that produces the same result regardless of how many times it processes the same event |
Your Order Service creates a new order and needs to notify the Inventory Service so it can reserve stock. The naive implementation:
```javascript
await db.query("INSERT INTO orders ...");          // Step 1
await messageQueue.publish("OrderCreated", order); // Step 2
```

This looks reasonable. But consider what happens if step 2 fails:
The order is in your database. The "OrderCreated" event was never published. The Inventory Service has no idea the order exists. Stock is never reserved. The customer sees a confirmed order, but fulfillment is stuck.
Now consider the reverse: if step 1 fails, the exception stops execution and step 2 never runs — no harm done. But flip the order (publish first, insert second) and a failed insert leaves a phantom event: downstream services react to an order that was never saved.
These are real production bugs. Every distributed system that tries to write to a database and publish a message atomically faces this problem — and the naive solution (just do both) fails in all the ways above.
In plain terms: you can't make a database write and a message queue publish atomic without a distributed transaction coordinator — but a distributed transaction coordinator is expensive and complex. The outbox pattern is the simpler, battle-tested alternative.
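The dual-write failure can be reproduced in a few lines. This is a toy model with in-memory stand-ins for the database and the queue (all names here are illustrative):

```javascript
// Toy reproduction of the dual-write bug: step 1 succeeds, step 2 throws,
// and the two systems silently diverge.
const orders = [];
const queue = [];

function flakyPublish(event) {
  // Simulates a transient broker failure at the worst possible moment.
  throw new Error("broker timeout");
}

try {
  orders.push({ id: "8812", status: "pending" }); // Step 1: DB write succeeds
  flakyPublish({ type: "OrderCreated", orderId: "8812" }); // Step 2: publish fails
  queue.push({ type: "OrderCreated", orderId: "8812" }); // never reached
} catch (err) {
  // Nothing here rolls the database write back.
}
// Result: the order row exists, but the queue never saw the event.
```

No amount of retry logic inside the `catch` fully closes this gap: the process can crash before the retry runs, which is exactly why the event needs to be durably stored first.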
Analogy: Think of postal mail. Instead of handing your letter directly to the mail carrier and hoping they don't drop it, you put it in your own outbox first. The letter is safe in your possession. When the mail carrier comes, they take it from your outbox and deliver it. If the carrier doesn't show up today, the letter is still in your outbox for tomorrow. You've guaranteed the letter will eventually be delivered without needing to hand it to the carrier atomically.
The Outbox Pattern works by treating the event as just another database row, written in the same transaction as your business data.
Your database transaction now does two things atomically:
- it writes the business record itself
- it writes a row to the outbox table describing the event to be published

```sql
BEGIN;

INSERT INTO orders (id, customer_id, status, ...)
VALUES ($1, $2, 'pending', ...);

INSERT INTO outbox (event_type, payload, created_at, published)
VALUES ('OrderCreated', '{"orderId": "8812", ...}', NOW(), false);

COMMIT;
```

If the transaction commits, both the order and the outbox row exist. If the transaction rolls back (for any reason), neither exists. The atomicity is guaranteed by the database — no distributed transaction needed.
In plain terms: by writing the event to the same database transaction as the business record, you delegate the atomicity problem to your database, which is already very good at it.
Tiny example: Order #8812 is created. In the same transaction:
- `orders` table: new row with `id=8812`, `status=pending`
- `outbox` table: new row with `event_type=OrderCreated`, `payload={"orderId":"8812","customerId":"42","total":129.99}`, `published=false`

Transaction commits. Both rows are permanent. The order exists. The event is ready to be published.
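The all-or-nothing behavior the pattern relies on can be modeled in a few lines. This is an in-memory sketch with illustrative names — a real implementation gets this guarantee from the database's BEGIN/COMMIT, not from application code:

```javascript
// In-memory model of the transactional write: either both rows land,
// or neither does.
const orders = [];
const outbox = [];

function createOrder(order, failMidway = false) {
  const ordersLen = orders.length;
  const outboxLen = outbox.length;
  try {
    orders.push(order); // "INSERT INTO orders ..."
    if (failMidway) throw new Error("constraint violation");
    outbox.push({
      event_type: "OrderCreated",
      payload: JSON.stringify(order),
      published: false,
    }); // "INSERT INTO outbox ..."
  } catch (err) {
    orders.length = ordersLen; // "ROLLBACK": discard both writes
    outbox.length = outboxLen;
    throw err;
  }
}

createOrder({ orderId: "8812", customerId: "42" }); // commits: both rows exist
try {
  createOrder({ orderId: "9001" }, true); // fails: neither new row survives
} catch (err) {}
```
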
A separate background process — the relay — periodically queries the outbox table for unpublished rows and publishes each one to the message bus.
```javascript
// Run every few seconds
const events = await db.query(
  "SELECT * FROM outbox WHERE published = false ORDER BY created_at LIMIT 100"
);
for (const event of events) {
  await messageBus.publish(event.event_type, event.payload);
  await db.query("UPDATE outbox SET published = true WHERE id = $1", [event.id]);
}
```

The relay might run as a dedicated microservice, a cron job, or a thread in the main service. The key properties it needs:
- Resilience: if the relay crashes after publishing the event but before marking it as published, it will publish the event again on restart. This is "at-least-once" delivery — consumers must be idempotent (running them twice gives the same result as running them once).
- Ordering: if ordering matters, the relay should publish events in `created_at` order and never process the next event until the previous one is confirmed.
- Latency: polling introduces lag. If you poll every 5 seconds, events can be up to 5 seconds late. For most workflows this is acceptable. For near-real-time requirements, use CDC instead of polling.
In plain terms: the relay is the mail carrier that empties your outbox every few seconds. It can be slow, it can retry, it can crash — the letter is always safe in the outbox until it's confirmed delivered.
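The mail-carrier loop can be condensed into a runnable toy. This uses an in-memory outbox and bus with illustrative names — a real relay queries the database and publishes to a real broker:

```javascript
// Minimal relay loop over an in-memory outbox: publish first, then mark.
const outbox = [
  { id: 1, event_type: "OrderCreated", payload: '{"orderId":"8812"}', published: false },
];
const bus = [];

function relayOnce() {
  for (const row of outbox.filter((r) => !r.published)) {
    bus.push({ type: row.event_type, payload: row.payload }); // publish first...
    row.published = true; // ...then mark as done
  }
}

relayOnce(); // delivers the pending event
relayOnce(); // nothing left to deliver; safe to run repeatedly
```
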
After successfully publishing an event to the message bus, the relay marks it as published (or deletes the row). This prevents re-publishing and keeps the outbox table from growing forever.
The order of operations is critical:

1. Publish the event to the message bus.
2. Only after the publish succeeds, mark the outbox row as published.

This sequence creates at-least-once delivery: you guarantee the event is published at least once, but potentially more. The alternative — marking published before publishing to the bus — creates at-most-once delivery: if the publish fails after the mark, the event is lost.
In plain terms: publish first, then mark done. A duplicate event is recoverable (idempotent consumer handles it). A lost event is a silent inconsistency.
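To see why the reversed order is dangerous, here is a toy run of "mark first, publish second" where the publish fails (in-memory stand-ins; names are illustrative):

```javascript
// WRONG ORDER: the row is marked done before the publish succeeds.
const outbox = [{ id: 1, event_type: "OrderCreated", published: false }];
const bus = [];

function flakyPublish(row) {
  throw new Error("broker unavailable"); // transient failure
}

for (const row of outbox.filter((r) => !r.published)) {
  row.published = true; // marked as published...
  try {
    flakyPublish(row);
    bus.push(row);
  } catch (err) {
    // ...but the publish failed. No retry will ever pick this row up again.
  }
}
// Result: the bus is empty, yet the outbox thinks the event was delivered.
```
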
Concrete sketch: The relay publishes "OrderCreated" to Kafka successfully. Before it can mark the row as published, the relay process crashes. On restart, the relay sees the row is still marked published=false and publishes it again. Kafka delivers it twice to the Inventory Service. The Inventory Service, being idempotent, checks: "Did I already process an OrderCreated for order #8812?" — yes, its reservation already exists. It does nothing. No duplicate reservation. No broken state.
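An idempotent consumer for this scenario can be sketched as follows. This uses an in-memory dedupe set for brevity — a real service would persist processed event IDs in its own database, ideally in the same transaction as the reservation:

```javascript
// Inventory-side handler that tolerates duplicate OrderCreated events.
const processedOrders = new Set();
const reservations = [];

function handleOrderCreated(event) {
  const { orderId } = JSON.parse(event.payload);
  if (processedOrders.has(orderId)) return false; // duplicate: do nothing
  processedOrders.add(orderId);
  reservations.push({ orderId }); // reserve stock exactly once
  return true;
}

const event = { event_type: "OrderCreated", payload: '{"orderId":"8812"}' };
handleOrderCreated(event); // first delivery: reservation created
handleOrderCreated(event); // redelivery after a relay crash: ignored
```
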
Polling the outbox table works well, but it has overhead: a SELECT query every few seconds plus an UPDATE on each row. For high-throughput systems, this can add noticeable database load.
Change Data Capture (CDC) eliminates polling. Instead of querying the outbox table, a CDC tool like Debezium tails the database's write-ahead log (WAL) — the same binary log your replication replicas use. Every INSERT into the outbox table appears in the WAL, and Debezium reads it and publishes to Kafka in near real-time.
Benefits of CDC:

- No polling queries: the database does no extra read work, because the WAL already exists for replication.
- Near-real-time latency: events reach the bus within tens of milliseconds of commit, instead of up to a full poll interval.
- Reliable resume: the CDC tool tracks its WAL position, so after a restart it continues exactly where it left off.
In plain terms: instead of asking "any new rows?" every 5 seconds, Debezium reads the database's own internal change log — it knows about new rows the moment they're committed.
Concrete sketch: Debezium connects to your PostgreSQL instance, reads the WAL using the pgoutput plugin, and streams every INSERT/UPDATE/DELETE to a Kafka topic. Your outbox INSERT appears in the WAL the moment it commits; Debezium publishes it to the outbox.events Kafka topic within 50–100ms. Downstream consumers process it. Debezium tracks the WAL position (LSN), so on restart it picks up exactly where it left off — no events missed, no events duplicated at the Kafka-publish level (though consumers still need idempotency).
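As a sketch of the CDC wiring, an illustrative Debezium PostgreSQL connector configuration might look like the following. The hostname, credentials, database name, and topic prefix are placeholders; check the Debezium documentation for your version's exact property names:

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "orders",
    "topic.prefix": "outbox",
    "table.include.list": "public.outbox"
  }
}
```

The `table.include.list` restricts capture to the outbox table itself, so Debezium ignores churn on the rest of the schema.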
The Outbox Pattern is one of the most broadly useful patterns in distributed systems. Its main cost is operational complexity:
| Aspect | Outbox + CDC | Outbox + Polling | Dual Write (no outbox) |
|---|---|---|---|
| Delivery guarantee | At-least-once | At-least-once | None (silent losses possible) |
| Publish latency | 50–100ms | Up to poll interval | Immediate (when it works) |
| DB overhead | Minimal (WAL already exists) | Polling SELECT + UPDATE | None |
| Infrastructure | Debezium + Kafka needed | Just the relay process | Nothing extra |
| Simplicity | Complex setup | Medium | Simple (until it breaks) |
The polling variant is often the right starting point. It requires no CDC infrastructure, just a background job and an extra table. Add CDC later if you need lower latency or higher throughput.
When to use the Outbox Pattern:

- You write to a database and publish events, and downstream services must not silently miss those events.
- You want delivery guarantees without the cost and complexity of distributed transactions.
- Your consumers are (or can be made) idempotent, so at-least-once delivery is acceptable.
When not to bother:

- A single service owns all the data, and there are no cross-service events to publish.
- Lost events are harmless — for example, best-effort analytics or metrics.
- The operational cost of a relay (or CDC infrastructure) outweighs the consistency risk for your workload.
The dual-write bug is one of the most common data consistency issues in production microservices. It's easy to miss in code review ("we write to the DB and then publish — what could go wrong?") and shows up as subtle, intermittent data divergence between services. The Outbox Pattern is the standard mitigation. Debezium + Kafka is the production-grade implementation you'll see at large scale. A simple polling relay is the practical starting point for most teams. Either way, the underlying principle — write the event atomically with the business data, then deliver it separately — is what you want to internalize.
Next section: MESSAGING & EVENTS — Message Queues, Pub/Sub, and Event-Driven Architecture.
Dual-write problem: writing to the DB and publishing an event are two separate operations — one can fail silently.