Long-running distributed transactions using compensating actions instead of locks.
If you are new here: the SAGA pattern is a way to manage distributed transactions without distributed locks. Instead of locking all resources upfront and then committing, a SAGA executes a sequence of smaller, independent steps. Each step commits locally as soon as it succeeds. If a later step fails, the SAGA runs compensating transactions (undo operations for each prior step) in reverse order. The result: no locks held across services, no 2PC coordinator that every participant must block on, and transactions that can span minutes or hours rather than milliseconds. The trade-off: intermediate states are visible, and you must implement every compensating transaction yourself.
| Term | Plain meaning |
|---|---|
| SAGA | A sequence of local transactions, each with a compensating transaction to undo it |
| Compensating transaction | An application-level "undo" for a step that already committed |
| Choreography | A SAGA style where services react to events independently — no central coordinator |
| Orchestration | A SAGA style where a central orchestrator calls each step and handles failures |
| Pivotal transaction | The go/no-go step: after it commits, compensation becomes impossible or impractical, so the remaining steps are retried until they succeed |
| Intermediate state | The partially-complete state visible between SAGA steps — not an error, but requires UX handling |
| Idempotency | The property that running a step or compensation multiple times gives the same result |
Order fulfillment takes 4 steps: create order, reserve inventory, charge payment, schedule shipping. Each step touches a different service. The whole thing might take 2–10 seconds from start to finish.
Solving this with 2PC means holding locks on order records, inventory rows, and payment tables for the full 2–10 seconds. Under high traffic, thousands of concurrent orders would create a massive lock contention problem — transactions queuing up behind each other, timeouts cascading.
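Little's law gives a rough sense of that scale. Assuming, purely for illustration, a steady load of 500 orders per second (a hypothetical figure, not one from the text), the average number of orders holding locks at any moment is the arrival rate $\lambda$ times the time $W$ each order holds its locks:

$$
L = \lambda W,\qquad \lambda = 500\ \text{orders/s},\quad W \in [2, 10]\ \text{s} \;\Rightarrow\; L \in [1000,\ 5000]\ \text{orders}
$$

Each of those in-flight orders keeps its rows locked, so any other order touching the same inventory or payment rows has to wait.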
Even setting aside performance: some steps involve external APIs. You can't hold a database lock while waiting for a payment provider to respond. The payment provider doesn't participate in your database's lock protocol. 2PC simply doesn't work across external service boundaries.
SAGA solves this by removing locks entirely. Each step commits locally and immediately when it succeeds. If a later step fails, you don't "roll back" — you run a separate compensating transaction that semantically undoes the prior committed step.
In plain terms: instead of locking everything first and asking permission, SAGA does each step as a committed action, and has a specific "undo" procedure ready for each one.
Analogy: a multi-stop international flight booking. You separately book the outbound flight (step 1: committed, seat assigned), book a connecting flight (step 2: committed), then try to book the return leg, and it's sold out (step 3: fails). You don't "lock" both flights during the process. Instead, you cancel the bookings that already succeeded, newest first: the connecting flight (compensating transaction 2), then the outbound flight (compensating transaction 1). Each cancellation is a real action with a real fee or process, not a magical database rollback.
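The core mechanism can be sketched as a small runner: each step is a pair of a forward action and its compensating action, steps run one at a time, and on failure only the steps that already committed are compensated, newest first. This is a minimal in-memory sketch; `run_saga`, `SagaFailed`, and the step names are hypothetical, and the actions are no-op stand-ins for real service calls.

```python
from typing import Callable, List, Tuple

# A step is (name, forward action, compensating action).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    """Raised after a failed SAGA has run its compensations."""


def run_saga(steps: List[Step], log: List[str]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()  # commits locally; no lock spans multiple steps
        except Exception as exc:
            log.append(f"failed: {name}")
            # Only steps that already committed get compensated, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            raise SagaFailed(name) from exc
        completed.append(step)
        log.append(f"committed: {name}")


def sold_out() -> None:
    raise RuntimeError("return leg sold out")


# Usage, mirroring the flight-booking analogy (all actions are stand-ins).
log: List[str] = []
try:
    run_saga(
        [
            ("book outbound", lambda: None, lambda: None),
            ("book connecting", lambda: None, lambda: None),
            ("book return", sold_out, lambda: None),
        ],
        log,
    )
except SagaFailed as err:
    print("SAGA failed at:", err)
print(log)
```

The failed step itself is never compensated, because it never committed. This sketch also assumes every compensation succeeds; what to do when one fails is a separate design question.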
In the successful case, a SAGA executes steps one at a time. Each step:
When the final step succeeds, the SAGA is complete. The "transaction" is done — not via a 2PC commit, but simply because all steps committed locally and we know they all succeeded.
Tiny example: E-commerce checkout SAGA:
1. POST /orders → Order Service creates order #8812 with status "pending". Returns 200.
2. POST /inventory/reserve → Inventory Service reserves 1× item #5512 for order #8812. Returns 200.
3. POST /payments/charge → Payments Service charges $129 to the card ending 4242. Returns 200.
4. POST /fulfillment/ship → Shipping Service schedules pickup for order #8812. Returns 200.

SAGA completes: order #8812's status changes to "confirmed".
No cross-service lock was held at any point. Each service processed its step independently.
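The happy path above can be sketched with plain in-memory dictionaries standing in for each service's own database. The `Services` class and `run_checkout` function are hypothetical; the sketch only shows that each step is an independent local write and that the order is confirmed once all four have committed. It also records the order status a reader would see after each step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Services:
    """Hypothetical in-memory stand-ins; each field plays one service's own database."""
    orders: Dict[int, str] = field(default_factory=dict)        # Order Service
    reservations: Dict[int, int] = field(default_factory=dict)  # Inventory Service
    charges: Dict[int, int] = field(default_factory=dict)       # Payments Service (cents)
    pickups: List[int] = field(default_factory=list)            # Shipping Service


def run_checkout(svc: Services, order_id: int, item_id: int, cents: int) -> List[str]:
    """Happy path only. Returns the order status observed after each step."""
    observed: List[str] = []

    svc.orders[order_id] = "pending"          # step 1: POST /orders
    observed.append(svc.orders[order_id])
    svc.reservations[order_id] = item_id      # step 2: POST /inventory/reserve
    observed.append(svc.orders[order_id])
    svc.charges[order_id] = cents             # step 3: POST /payments/charge
    observed.append(svc.orders[order_id])
    svc.pickups.append(order_id)              # step 4: POST /fulfillment/ship
    observed.append(svc.orders[order_id])

    # All four local commits succeeded, so the SAGA is complete.
    svc.orders[order_id] = "confirmed"
    observed.append(svc.orders[order_id])
    return observed


svc = Services()
print(run_checkout(svc, order_id=8812, item_id=5512, cents=12_900))
```

Note that the order reads "pending" through every step until the last one commits; that visible in-between status is the intermediate state SAGAs have to design around.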
If Step 3 (payment) fails — card declined, payment service timeout, fraud check — the SAGA must undo what has already succeeded.
The SAGA runs compensating transactions in reverse order. Step 3 never committed, so it needs no compensation:

1. DELETE /inventory/reserve/{order_id} → releases the reserved inventory (undoes step 2)
2. PATCH /orders/{id} with status "cancelled" → marks the order cancelled (undoes step 1)

These are real API calls and real database operations. They're not database rollbacks, because those steps already committed. Compensation is a new, forward-moving action that semantically reverses the effect.
In plain terms: compensation is a "please undo that" message, not a database undo. Each step must have a corresponding "please undo that" operation defined up front.
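The failure case can be sketched the same way. This is a hypothetical in-memory version (the `checkout` and `charge_card` functions and the `PaymentDeclined` exception are illustrative names); the shipping step is left out to keep the focus on compensation. The returned list shows which calls a real implementation would make.

```python
from typing import Dict, List


class PaymentDeclined(Exception):
    """Stands in for a declined card, a payment timeout, or a fraud rejection."""


def charge_card(card_ok: bool) -> None:
    if not card_ok:
        raise PaymentDeclined("card declined")


def checkout(orders: Dict[int, str], reservations: Dict[int, int],
             order_id: int, item_id: int, card_ok: bool) -> List[str]:
    """Runs steps 1-3 (shipping left out) and returns the calls made, in order."""
    calls: List[str] = []
    orders[order_id] = "pending"                        # step 1 commits
    calls.append("POST /orders")
    reservations[order_id] = item_id                    # step 2 commits
    calls.append("POST /inventory/reserve")
    try:
        charge_card(card_ok)                            # step 3
        calls.append("POST /payments/charge")
    except PaymentDeclined:
        # Step 3 never committed, so only steps 2 and 1 are undone, newest first.
        # Each undo is a new write, not a rollback of the earlier commit.
        reservations.pop(order_id, None)
        calls.append(f"DELETE /inventory/reserve/{order_id}")
        orders[order_id] = "cancelled"                  # kept, not deleted
        calls.append(f"PATCH /orders/{order_id}")
    return calls


orders: Dict[int, str] = {}
reservations: Dict[int, int] = {}
print(checkout(orders, reservations, order_id=8812, item_id=5512, card_ok=False))
print(orders, reservations)
```

The order row survives with status "cancelled" rather than being deleted, which matches the PATCH compensation above and keeps an audit trail of what happened.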
What makes a good compensating transaction?
Concrete sketch: Stripe's refund API is a compensating transaction for a charge. stripe.refunds.create({charge: 'ch_xxx'}) semantically reverses the charge. It's a new operation, not a rollback — the charge history shows both the charge and the refund. This is correct SAGA compensation behavior.
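The same idea can be sketched without any payment provider: an append-only ledger where a refund is a new, reversing entry, and where retrying the refund does nothing the second time (the idempotency property from the glossary). `PaymentLedger` and the charge id "ch_demo" are hypothetical; this is not Stripe's API. A real implementation would make the "already refunded?" check and the append atomic in its database.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class PaymentLedger:
    """Hypothetical append-only payment ledger (not Stripe's API)."""
    entries: List[Tuple[str, str, int]] = field(default_factory=list)  # (kind, charge_id, cents)
    refunded: Set[str] = field(default_factory=set)

    def charge(self, charge_id: str, cents: int) -> None:
        self.entries.append(("charge", charge_id, cents))

    def refund(self, charge_id: str) -> None:
        """Compensation for charge(): appends a reversing entry, never deletes."""
        if charge_id in self.refunded:
            return  # already compensated; a retried refund is a no-op
        amounts = [c for kind, cid, c in self.entries if kind == "charge" and cid == charge_id]
        if not amounts:
            raise ValueError(f"unknown charge {charge_id}")
        self.entries.append(("refund", charge_id, -amounts[0]))
        self.refunded.add(charge_id)

    def balance(self) -> int:
        return sum(cents for _, _, cents in self.entries)


ledger = PaymentLedger()
ledger.charge("ch_demo", 12_900)
ledger.refund("ch_demo")
ledger.refund("ch_demo")  # e.g. the orchestrator retried after a timeout
print(ledger.entries)
print(ledger.balance())
```

Both the charge and the refund stay in the history, and the net balance returns to zero.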
In a choreography-based SAGA, there is no central coordinator. Each service responds to events from an event bus. Service A publishes "Order Created." Service B listens, does its work, publishes "Inventory Reserved." Service C listens to that, does payment, publishes "Payment Charged." And so on.
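A choreographed flow can be sketched with an in-process event bus. The `EventBus` class, `wire_services`, and the failure-path event names "PaymentFailed" and "InventoryReleased" are hypothetical; a real system would deliver events asynchronously through a message broker. Shipping is omitted, so payment is the last step here. No service calls another directly; each only subscribes and publishes, and compensation also travels as events.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-process bus; a real system would use a message broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.published: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.published.append(event)
        for handler in self.handlers[event]:
            handler(payload)  # delivered synchronously to keep the sketch simple


def wire_services(bus: EventBus, card_ok: bool) -> Tuple[Dict[int, str], Callable[[int], None]]:
    """Each service only subscribes and publishes; none calls another directly."""
    orders: Dict[int, str] = {}
    reservations: Dict[int, bool] = {}

    # Order Service
    def create_order(order_id: int) -> None:
        orders[order_id] = "pending"
        bus.publish("OrderCreated", {"order_id": order_id})

    # Inventory Service: reserve on OrderCreated, release (compensate) on PaymentFailed
    def reserve(e: dict) -> None:
        reservations[e["order_id"]] = True
        bus.publish("InventoryReserved", e)

    def release(e: dict) -> None:
        reservations.pop(e["order_id"], None)
        bus.publish("InventoryReleased", e)

    # Payments Service: card_ok simulates the provider's answer
    def charge(e: dict) -> None:
        bus.publish("PaymentCharged" if card_ok else "PaymentFailed", e)

    # Order Service reacts to the outcome
    def confirm(e: dict) -> None:
        orders[e["order_id"]] = "confirmed"

    def cancel(e: dict) -> None:
        orders[e["order_id"]] = "cancelled"

    bus.subscribe("OrderCreated", reserve)
    bus.subscribe("InventoryReserved", charge)
    bus.subscribe("PaymentCharged", confirm)
    bus.subscribe("PaymentFailed", release)
    bus.subscribe("InventoryReleased", cancel)
    return orders, create_order


bus = EventBus()
orders, create_order = wire_services(bus, card_ok=False)
create_order(8812)
print(bus.published)  # the failure cascades back through events
print(orders[8812])
```

The flow only exists implicitly in the subscription list, which is what makes choreography hard to trace once more steps are added.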
Advantages of choreography:
Disadvantages of choreography:
Analogy: A jazz improvisation. Each musician listens to the others and responds organically, no conductor needed. Beautiful when it works. Hard to debug when something goes wrong — who played the wrong note first?
Choreography works well for simple, well-understood flows with 3–4 steps. It gets hard to maintain as complexity grows.
In an orchestration-based SAGA, a dedicated Saga Orchestrator service calls each step explicitly and manages the state machine of the SAGA. It knows: "I'm in step 2, waiting for inventory confirmation. If I get a success, proceed to step 3. If I get a failure, run compensation for step 1."
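The orchestrator's decision-making can be sketched as an explicit transition table: the current state plus the step's outcome determines the next state, and each failure routes into the compensation chain for whatever has already committed. The state names and the `drive` helper are hypothetical; compensation failures are not modeled (an orchestrator would typically retry them).

```python
from enum import Enum, auto
from typing import Dict, List, Set, Tuple


class State(Enum):
    CREATING_ORDER = auto()
    RESERVING_INVENTORY = auto()
    CHARGING_PAYMENT = auto()
    SCHEDULING_SHIPPING = auto()
    REFUNDING_PAYMENT = auto()      # compensates CHARGING_PAYMENT
    RELEASING_INVENTORY = auto()    # compensates RESERVING_INVENTORY
    CANCELLING_ORDER = auto()       # compensates CREATING_ORDER
    COMPLETED = auto()
    CANCELLED = auto()


# (current state, did the step succeed?) -> next state
TRANSITIONS: Dict[Tuple[State, bool], State] = {
    (State.CREATING_ORDER, True): State.RESERVING_INVENTORY,
    (State.CREATING_ORDER, False): State.CANCELLED,          # nothing committed yet
    (State.RESERVING_INVENTORY, True): State.CHARGING_PAYMENT,
    (State.RESERVING_INVENTORY, False): State.CANCELLING_ORDER,
    (State.CHARGING_PAYMENT, True): State.SCHEDULING_SHIPPING,
    (State.CHARGING_PAYMENT, False): State.RELEASING_INVENTORY,
    (State.SCHEDULING_SHIPPING, True): State.COMPLETED,
    (State.SCHEDULING_SHIPPING, False): State.REFUNDING_PAYMENT,
    # Compensations chain backwards; only their success path is modeled.
    (State.REFUNDING_PAYMENT, True): State.RELEASING_INVENTORY,
    (State.RELEASING_INVENTORY, True): State.CANCELLING_ORDER,
    (State.CANCELLING_ORDER, True): State.CANCELLED,
}

TERMINAL = {State.COMPLETED, State.CANCELLED}


def drive(failures: Set[State]) -> List[str]:
    """Walk the machine; steps in `failures` fail, every other step succeeds."""
    state = State.CREATING_ORDER
    path = [state.name]
    while state not in TERMINAL:
        state = TRANSITIONS[(state, state not in failures)]
        path.append(state.name)
    return path


# "In step 2 and inventory fails: run compensation for step 1."
print(drive({State.RESERVING_INVENTORY}))
```

Because the whole flow lives in one table, the orchestrator's current state is a single value that can be persisted and inspected, which is the main operational advantage over choreography.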
Advantages of orchestration:
Disadvantages of orchestration:
Concrete sketch: Temporal.io is a popular workflow engine used for orchestration-based SAGAs. Your checkout SAGA becomes a Temporal Workflow: a durable function that executes steps, handles retries, and survives process crashes. Temporal persists the workflow's execution history, so if the worker running it crashes, another worker replays that history and continues from the last recorded step.
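The resume-after-crash idea can be sketched without any workflow engine: completed steps are recorded in a history that outlives the process, and a restarted orchestrator skips whatever the history says is already done. This is a hypothetical illustration of the concept, not Temporal's API or its replay algorithm; `run_workflow`, `Crash`, and the in-memory `history` list (standing in for durable storage) are all assumptions.

```python
from typing import Callable, Dict, List, Tuple


class Crash(Exception):
    """Simulates the orchestrator process dying mid-workflow."""


def run_workflow(history: List[str], steps: List[Tuple[str, Callable[[], None]]]) -> None:
    """Resume-from-history sketch: steps already recorded in `history` are skipped.

    `history` stands in for durable storage that outlives the process.
    """
    for name, action in steps:
        if name in history:
            continue              # finished before the crash; don't redo it
        action()
        # A crash right here (after action, before recording) would make the step
        # run again on resume, which is why steps must be idempotent.
        history.append(name)


calls: Dict[str, int] = {}
armed = [True]  # crash the first attempt at charging payment


def make_step(name: str) -> Callable[[], None]:
    def action() -> None:
        if name == "charge payment" and armed[0]:
            armed[0] = False
            raise Crash(name)
        calls[name] = calls.get(name, 0) + 1
    return action


steps = [(n, make_step(n)) for n in
         ("create order", "reserve inventory", "charge payment", "schedule shipping")]
history: List[str] = []
try:
    run_workflow(history, steps)
except Crash:
    print("crashed; history so far:", history)
run_workflow(history, steps)  # a fresh instance resumes from the same history
print(calls)
```

Each step ends up executed once even though the workflow was interrupted, because the second run starts from the recorded history instead of from scratch.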
Intermediate state is SAGA's most important operational challenge. Between step 1 (order created) and step 3 (payment charged), the system is in a partially complete state: the order exists in the Orders database, but payment hasn't been confirmed yet.
During this window:
In plain terms: SAGA trades "invisible intermediate state behind a lock" for "visible intermediate state that requires careful UI and query design."
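One common way to make the intermediate state explicit is an order status that readers must interpret. In this hypothetical sketch (the `Order` class, the label texts, and orders #8813 and #8814 are illustrative), the UI presents "pending" as in progress, and a reporting query counts only confirmed orders so SAGAs still in flight don't inflate revenue.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    order_id: int
    status: str  # "pending" while the SAGA runs, then "confirmed" or "cancelled"
    cents: int


# What the UI shows for each status; "pending" is presented as in progress,
# neither as success nor as failure.
STATUS_LABELS: Dict[str, str] = {
    "pending": "Processing your order",
    "confirmed": "Order confirmed",
    "cancelled": "Order cancelled",
}


def confirmed_revenue(orders: List[Order]) -> int:
    """Reporting query that excludes SAGAs still in flight and cancelled ones."""
    return sum(o.cents for o in orders if o.status == "confirmed")


orders = [
    Order(8812, "pending", 12_900),    # payment not yet confirmed
    Order(8813, "confirmed", 5_000),
    Order(8814, "cancelled", 7_000),
]
print(confirmed_revenue(orders))       # counts only order 8813
print(STATUS_LABELS[orders[0].status])
```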
Managing this requires:
SAGA and 2PC solve the same problem — distributed atomicity — in fundamentally different ways:
| Property | SAGA | 2PC |
|---|---|---|
| Lock holding | No locks across services | Locks held during protocol |
| Operation duration | Minutes or hours | Seconds at most, because locks are held for the whole protocol |
| External services | Works (no lock protocol needed) | Doesn't work (external APIs can't participate) |
| Intermediate visibility | Yes — you must handle this in UX | No — hidden behind locks |
| Failure recovery | Application-written compensations | Protocol-managed rollback |
| Throughput | High — parallel SAGAs don't contend | Lower — lock contention limits parallelism |
| Code complexity | High — every step needs a compensation | Lower — protocol handles rollback |
SAGA is a widely used pattern for distributed transactions in microservice architectures. Order fulfillment, subscription management, loan applications, multi-step onboarding: any workflow that touches multiple services and runs longer than you could reasonably hold locks is a strong candidate for a SAGA. Before implementing, answer three questions for every step: "What is this step's compensating transaction? What happens if the compensation fails? How does the UI/API behave during intermediate states?" If you can't answer all three, you're not ready to implement the SAGA yet.
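The three questions can be turned into a design-time check: record one plan entry per step and list every question still left open. `StepPlan`, `unanswered`, and the sample answers are hypothetical; the plan below is deliberately partial, with one gap left in to show the output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepPlan:
    """Hypothetical design record: one per SAGA step, answering the three questions."""
    name: str
    compensation: Optional[str] = None             # What is this step's compensating transaction?
    if_compensation_fails: Optional[str] = None    # What happens if the compensation fails?
    intermediate_behavior: Optional[str] = None    # How do the UI/API behave meanwhile?


def unanswered(plan: List[StepPlan]) -> List[str]:
    """Returns each step/question pair that still lacks an answer."""
    gaps: List[str] = []
    for step in plan:
        if not step.compensation:
            gaps.append(f"{step.name}: compensating transaction")
        if not step.if_compensation_fails:
            gaps.append(f"{step.name}: compensation failure")
        if not step.intermediate_behavior:
            gaps.append(f"{step.name}: intermediate state")
    return gaps


plan = [
    StepPlan("create order", "PATCH status to cancelled", "retry with backoff", "show 'Processing'"),
    StepPlan("reserve inventory", "DELETE reservation", "retry with backoff", "show 'Processing'"),
    StepPlan("charge payment", "refund the charge", None, "show 'Processing'"),
]
print(unanswered(plan) or "ready to implement")
```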
Next: Outbox Pattern — how to reliably publish events as part of a transaction without a distributed coordinator.
Happy path: SAGA executes steps sequentially — each step commits locally when it succeeds.