Why We Don't Use 2PC — And What We Actually Do Instead
Two-Phase Commit is the textbook answer to distributed transactions. It's also almost always wrong for microservices. Here's why, and what the practical alternative looks like.
What 2PC Actually Is
Two-Phase Commit (2PC) is a protocol that coordinates a distributed transaction across multiple participants so that either all of them commit or all of them roll back, atomically.
Phase 1 — Prepare: The coordinator sends a "can you commit?" message to every participant. Each participant locks its resources, writes to its transaction log, and replies "yes" or "no."
Phase 2 — Commit or Rollback: If all participants said yes, the coordinator sends "commit." If any said no (or failed to respond), the coordinator sends "rollback." Each participant releases its locks and applies the outcome.
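The two phases above can be sketched in a few lines of Java. This is a minimal, in-memory illustration only: the `Participant` interface, `InMemoryParticipant`, and `run` are hypothetical names invented for this example, and the sketch has no durable log, no timeouts, and no networking, all of which a real coordinator needs.

```java
import java.util.List;

final class TwoPhaseCommitSketch {

    /** Hypothetical participant contract (an assumption, not a real XA interface). */
    interface Participant {
        boolean prepare();   // Phase 1: lock resources, log intent, vote yes/no
        void commit();       // Phase 2: apply the change and release locks
        void rollback();     // Phase 2: discard the change and release locks
    }

    /** In-memory participant that simply records which state it ended up in. */
    static final class InMemoryParticipant implements Participant {
        private final boolean voteYes;
        String state = "IDLE";

        InMemoryParticipant(boolean voteYes) { this.voteYes = voteYes; }

        @Override public boolean prepare() {
            state = voteYes ? "PREPARED" : "REFUSED";
            return voteYes;
        }
        @Override public void commit()   { state = "COMMITTED"; }
        @Override public void rollback() { state = "ROLLED_BACK"; }
    }

    /** Runs both phases and returns true if the transaction committed. */
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: collect votes. A participant that throws counts as a "no".
        boolean allYes = true;
        for (Participant p : participants) {
            boolean vote;
            try {
                vote = p.prepare();
            } catch (RuntimeException e) {
                vote = false;
            }
            allYes &= vote;
        }
        // If the coordinator crashed right here, every prepared participant
        // would keep its locks with no way of learning the outcome.

        // Phase 2: broadcast the single decision to everyone.
        for (Participant p : participants) {
            if (allYes) p.commit(); else p.rollback();
        }
        return allYes;
    }

    public static void main(String[] args) {
        InMemoryParticipant orders = new InMemoryParticipant(true);
        InMemoryParticipant inventory = new InMemoryParticipant(false);
        boolean committed = run(List.of(orders, inventory));
        System.out.println("committed=" + committed + ", orders=" + orders.state);
        // prints: committed=false, orders=ROLLED_BACK
    }
}
```

A single "no" vote rolls back every participant, including the ones that were ready to commit.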
In theory, this gives you atomic commits across multiple databases. In practice, it gives you a distributed system that fails in exciting new ways.
Why 2PC Fails in Microservices
It's a blocking protocol. Once a participant votes "yes" in Phase 1, it holds locks on its resources until it learns the outcome. If the coordinator crashes between Phase 1 and Phase 2, those locks stay held, potentially indefinitely. Every participant is stuck waiting for a coordinator that may never come back.
The coordinator is a single point of failure (SPOF). In a distributed system, we work hard to eliminate SPOFs. 2PC introduces one by design. If the coordinator node fails at the wrong moment, the entire transaction is stuck.
Network partitions kill it. If a participant can't reach the coordinator during Phase 2, it has already voted "yes" in Phase 1 but doesn't know whether to commit or roll back. It must wait. This is the exact scenario distributed systems must handle gracefully, and 2PC handles it by blocking.
Heterogeneous systems don't implement XA. The XA standard for distributed transactions requires every participant to implement a specific interface. Your PostgreSQL database supports XA. Your third-party payment provider's REST API does not. The moment you need to coordinate a transaction that crosses an HTTP boundary, 2PC is off the table.
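The blocking and partition behavior described above comes down to one rule: a participant that has not voted yet may abort on its own, but a participant that has voted "yes" may not decide anything alone. The sketch below illustrates that rule; the `Participant` class and its method names are hypothetical and exist only for this example.

```java
final class InDoubtSketch {

    enum State { ACTIVE, PREPARED, COMMITTED, ROLLED_BACK }

    /** Hypothetical participant; tracks only protocol state and whether locks are held. */
    static final class Participant {
        State state = State.ACTIVE;
        boolean locksHeld = false;

        /** Phase 1: acquire locks and vote yes. */
        void prepare() {
            locksHeld = true;
            state = State.PREPARED;
        }

        /** Phase 2: the coordinator's decision arrives and locks are released. */
        void onDecision(boolean commit) {
            state = commit ? State.COMMITTED : State.ROLLED_BACK;
            locksHeld = false;
        }

        /**
         * The coordinator has gone silent. Returns true if this participant
         * could resolve the transaction without the coordinator.
         */
        boolean onCoordinatorTimeout() {
            if (state == State.PREPARED) {
                // Already voted yes: other participants may have committed,
                // so neither commit nor rollback is safe. Keep the locks and wait.
                return false;
            }
            if (state == State.ACTIVE) {
                // No vote cast yet, so a unilateral abort is safe.
                state = State.ROLLED_BACK;
                locksHeld = false;
            }
            return true; // either aborted just now or already decided
        }
    }

    public static void main(String[] args) {
        Participant p = new Participant();
        p.prepare();
        boolean resolved = p.onCoordinatorTimeout();
        System.out.println("resolved=" + resolved + ", locksHeld=" + p.locksHeld);
        // prints: resolved=false, locksHeld=true
    }
}
```

The prepared participant has no safe move until `onDecision` is called, which is exactly the window in which a crashed or unreachable coordinator leaves locks held.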
The Phantom Success Problem
Even if you could use 2PC everywhere, it wouldn't solve the real problem with modern microservices.
Consider this: your order service calls Stripe's API to charge a credit card. Stripe processes the charge successfully. Then your service crashes before it can write the result to its own database. You have no 2PC coordinator between you and Stripe — there is no XA session, no shared transaction log.
The money was charged. Your database doesn't know. That gap — between an external side effect and your local state — is where the real problem lives. 2PC can't reach across that boundary.
This is not an edge case. Every call to an external REST API that has side effects is exposed to this problem.
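The gap is easy to reproduce in a toy simulation. Everything below is a stand-in: `FakePaymentProvider` is not Stripe's client, `LocalOrderStore` is a map rather than a database, and the `crashAfterCharge` flag simulates the process dying at the worst possible moment.

```java
import java.util.HashMap;
import java.util.Map;

final class PhantomSuccessSketch {

    /** Stand-in for an external payment API (hypothetical, not a real client). */
    static final class FakePaymentProvider {
        final Map<String, Long> charges = new HashMap<>();

        void charge(String orderId, long amountCents) {
            charges.put(orderId, amountCents);
        }
    }

    /** Stand-in for the order service's own database. */
    static final class LocalOrderStore {
        final Map<String, String> status = new HashMap<>();
    }

    /** Charges first, then records the result locally. */
    static void placeOrder(String orderId, long amountCents,
                           FakePaymentProvider provider, LocalOrderStore store,
                           boolean crashAfterCharge) {
        provider.charge(orderId, amountCents);   // external side effect takes hold here
        if (crashAfterCharge) {
            throw new IllegalStateException("simulated crash before local write");
        }
        store.status.put(orderId, "PAID");       // never reached when the crash happens
    }

    public static void main(String[] args) {
        FakePaymentProvider provider = new FakePaymentProvider();
        LocalOrderStore store = new LocalOrderStore();
        try {
            placeOrder("order-1", 4200L, provider, store, true);
        } catch (IllegalStateException e) {
            // The process "died"; nothing local was written.
        }
        System.out.println("charged=" + provider.charges.containsKey("order-1")
                + ", recordedLocally=" + store.status.containsKey("order-1"));
        // prints: charged=true, recordedLocally=false
    }
}
```

No transaction spans the two statements in `placeOrder`, so no rollback can bring the two systems back into agreement.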
What Sagaweaw Does Instead
Sagaweaw uses the Saga pattern with compensation — the practical alternative that actually works across HTTP boundaries, heterogeneous systems, and network failures.
The key insight is this: instead of one big atomic transaction, you decompose the business flow into a sequence of local transactions. Each step is an independent, locally atomic operation. If a step succeeds, you move to the next. If a step fails, you execute compensating transactions to undo the steps that already succeeded.
No locks are held across services. No coordinator blocks during a network partition. Each participant can fail and restart independently.
 reserve-inventory   →   charge-payment   →   schedule-shipping
         ↓                     ↓                      ↓
release-reservation  ←   refund-payment   ←    cancel-shipment
   (compensation)        (compensation)        (compensation)
The compensations run in reverse order. If schedule-shipping fails, you refund the payment and release the reservation. Each compensation is itself a local ACID transaction.
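The flow above can be sketched as a tiny executor that remembers completed steps and unwinds them in reverse when a later step fails. This is an illustration of the pattern only, not Sagaweaw's implementation or API: `Step`, `run`, and `demo` are names invented here, and the sketch keeps state in memory, whereas a real saga engine would persist progress so it can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SagaSketch {

    /** One saga step: a local action plus the compensation that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /** Runs steps in order; on failure, compensates completed steps in reverse order. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();   // used as a stack
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException failure) {
                // The failed step did not complete, so only earlier steps are undone.
                // A failing compensation is not handled in this sketch.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }

    /** Runs the order flow and returns the sequence of operations that executed. */
    static List<String> demo(boolean shippingFails) {
        List<String> log = new ArrayList<>();
        List<Step> steps = List.of(
            new Step("reserve-inventory",
                () -> log.add("reserve-inventory"),
                () -> log.add("release-reservation")),
            new Step("charge-payment",
                () -> log.add("charge-payment"),
                () -> log.add("refund-payment")),
            new Step("schedule-shipping",
                () -> {
                    if (shippingFails) throw new IllegalStateException("carrier unavailable");
                    log.add("schedule-shipping");
                },
                () -> log.add("cancel-shipment")));
        run(steps);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        // prints: [reserve-inventory, charge-payment, refund-payment, release-reservation]
    }
}
```

Because `schedule-shipping` never completed, its own compensation (`cancel-shipment`) is not invoked; only the two steps that succeeded are undone.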
The Trade-off: Eventual Consistency
This is not free. Sagas are eventually consistent, not immediately consistent.
During execution, your system passes through intermediate states. Between charge-payment completing and schedule-shipping starting, there is a brief period where money has been taken but no shipment exists yet. This window is real, and other requests can observe it. For many business flows, that is acceptable.
It's not acceptable for all systems. Financial ledgers, regulatory reporting, double-entry bookkeeping — these domains require immediate consistency and are genuinely hard to model as sagas. Know your use case before choosing.
The Pivot Step
Not all saga steps are equal. There is usually a point of no return — a step after which compensation is no longer possible in a practical sense.
In a payment flow, that step might be "funds transmitted to BACEN" (the Brazilian Central Bank). Once that happens, you cannot simply call a compensating transaction. The money is in the financial system. Recovery requires a different process entirely — a manual reversal, a reconciliation ticket, a human decision.
Sagaweaw models this explicitly. You can mark a step as a PIVOT:
.step("transmit-to-bacen")
.type(StepType.PIVOT)
.action(bacenGateway::transmit)
Warning: Steps after a PIVOT are non-compensable. If they fail, Sagaweaw records the failure but cannot automatically undo the PIVOT. Your operations team must be notified.
This isn't a limitation — it's an explicit acknowledgment of business reality. Some things, once done, cannot be undone by software alone.
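To make the pivot semantics concrete, here is a generic sketch of an executor that stops compensating once a pivot has succeeded. It does not reproduce Sagaweaw's internals: `Kind`, `Outcome`, `Step`, `run`, and the step names other than transmit-to-bacen are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class PivotSketch {

    enum Kind { COMPENSABLE, PIVOT }
    enum Outcome { COMPLETED, COMPENSATED, NEEDS_MANUAL_INTERVENTION }

    record Step(String name, Kind kind, Runnable action, Runnable compensation) {}

    static Outcome run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        boolean pivotPassed = false;
        for (Step step : steps) {
            try {
                step.action().run();
            } catch (RuntimeException failure) {
                if (pivotPassed) {
                    // Past the point of no return: record and escalate, do not undo.
                    return Outcome.NEEDS_MANUAL_INTERVENTION;
                }
                // Before (or at) the pivot: unwind completed steps in reverse.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return Outcome.COMPENSATED;
            }
            if (step.kind() == Kind.PIVOT) {
                pivotPassed = true;
            } else {
                completed.push(step);
            }
        }
        return Outcome.COMPLETED;
    }

    /** Builds a three-step payment flow and appends executed operations to log. */
    static Outcome demo(boolean pivotFails, boolean afterPivotFails, List<String> log) {
        List<Step> steps = List.of(
            new Step("debit-account", Kind.COMPENSABLE,
                () -> log.add("debit-account"),
                () -> log.add("credit-account-back")),
            new Step("transmit-to-bacen", Kind.PIVOT,
                () -> {
                    if (pivotFails) throw new IllegalStateException("gateway down");
                    log.add("transmit-to-bacen");
                },
                () -> { }),   // a pivot has no compensation
            new Step("notify-customer", Kind.COMPENSABLE,
                () -> {
                    if (afterPivotFails) throw new IllegalStateException("mail server down");
                    log.add("notify-customer");
                },
                () -> { }));
        return run(steps);
    }

    public static void main(String[] args) {
        List<String> log = new java.util.ArrayList<>();
        System.out.println(demo(false, true, log) + " " + log);
        // prints: NEEDS_MANUAL_INTERVENTION [debit-account, transmit-to-bacen]
    }
}
```

If the pivot itself fails, nothing irreversible has happened yet, so the sketch compensates as usual. Only a failure after a successful pivot is escalated instead of undone.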
The Bottom Line
2PC is the right answer for a world where all participants share the same transactional infrastructure. That world doesn't describe modern microservices.
The saga pattern with compensation is the right answer for the world we actually live in: services communicating over HTTP, third-party APIs with no XA support, databases that belong to different teams, and failure modes that are the rule rather than the exception.
Join the discussion!
Architecture is all about trade-offs. What did you think of the decisions made in "Why We Don't Use 2PC — And What We Actually Do Instead"? Share your scenarios, ask questions, and debate with other engineers in the Sagaweaw community.
Comment on GitHub Discussions