Handling Partial Rollbacks

Written by Amos, Sagaweaw Creator

One of the greatest challenges in microservice orchestration is not the happy path, nor even the sad path where everything fails. The real architectural nightmare is the partial rollback, where the compensation itself fails halfway through.

Imagine the scenario:

1. You process a payment (Success)
2. You reserve inventory (Success)
3. You issue the invoice (Unrecoverable failure)

Sagaweaw will start the compensation flow. First, it cancels the inventory reservation. But what if cancelling the payment (refund) at the external provider fails due to a network timeout?
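To make the scenario concrete, here is a minimal sketch of a naive saga runner that compensates completed steps in reverse order. It is not Sagaweaw's API: `Step`, `run_saga`, and the step names are hypothetical, chosen only to mirror the three steps above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """Hypothetical saga step: a forward action plus its compensation."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Run steps in order. On failure, compensate completed steps in reverse.

    Returns (compensated, stuck): names of steps that were rolled back and
    names of steps whose compensation itself failed.
    """
    completed: List[Step] = []
    try:
        for step in steps:
            step.action()
            completed.append(step)
    except Exception:
        compensated, stuck = [], []
        for step in reversed(completed):
            try:
                step.compensate()
                compensated.append(step.name)
            except Exception:
                # A naive runner gives up here: this is the partial rollback.
                stuck.append(step.name)
        return compensated, stuck
    return [], []


def ok() -> None:
    pass


def invoice_fails() -> None:
    raise RuntimeError("invoice service rejected the request")


def refund_times_out() -> None:
    raise TimeoutError("payment provider did not respond")


steps = [
    Step("process_payment", ok, refund_times_out),
    Step("reserve_inventory", ok, ok),
    Step("issue_invoice", invoice_fails, ok),
]
compensated, stuck = run_saga(steps)
print("compensated:", compensated)
print("stuck:", stuck)
```

The inventory reservation is released, but the payment stays in `stuck`: the customer has been charged and nothing in this runner will ever retry the refund.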

The Intermediate State Trap

Pure event-based choreography systems (such as loosely coupled services communicating over Kafka or RabbitMQ) generally leave the system in an inconsistent intermediate state: money charged, order not delivered. To fix this, you would need daily reconciliation cron jobs that scour the database for such orders.
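As a rough illustration of what such a reconciliation job ends up doing, the sketch below scans for orders that were charged but never fulfilled. The `orders` table, its columns, and the status values are assumptions made up for this example, not part of any real schema.

```python
import sqlite3


def find_inconsistent_orders(conn: sqlite3.Connection) -> list:
    """Return ids of orders that were charged but whose fulfilment failed."""
    rows = conn.execute(
        "SELECT id FROM orders "
        "WHERE payment_status = 'CHARGED' AND fulfilment_status = 'FAILED' "
        "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical data standing in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, payment_status TEXT, fulfilment_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "CHARGED", "DELIVERED"),  # healthy order
        (2, "CHARGED", "FAILED"),     # money taken, nothing delivered
        (3, "REFUNDED", "FAILED"),    # already compensated
    ],
)

inconsistent = find_inconsistent_orders(conn)
print(inconsistent)
```

A job like this only detects the problem after the fact, and it has to be kept in sync with every new failure mode the services can produce.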

How Sagaweaw Handles This

With Sagaweaw's orchestration, compensation steps that fail enter a RETRIABLE COMPENSATING state. The central orchestrator never "forgets" that money needs to be returned.

It keeps calling the step's .compensate() method according to the retry policy (for example, unbounded retries with exponential backoff) until the payment gateway API returns HTTP 200 OK, at which point the saga transitions to COMPENSATED.
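The retry behaviour can be sketched in a few lines. This is an illustration under stated assumptions, not Sagaweaw's implementation: `SagaState`, `backoff_delays`, and `compensate_until_done` are hypothetical names, the enum member name `RETRIABLE_COMPENSATING` is an assumed spelling of the state mentioned above, and the compensation callable is assumed to return an HTTP status code.

```python
import time
from enum import Enum
from typing import Callable, Iterator, Optional, Tuple


class SagaState(Enum):
    RETRIABLE_COMPENSATING = "RETRIABLE COMPENSATING"
    COMPENSATED = "COMPENSATED"


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 60.0) -> Iterator[float]:
    """Yield an endless series of delays: base, base*factor, ... up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def compensate_until_done(
    compensate: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
    delays: Optional[Iterator[float]] = None,
) -> Tuple[SagaState, int]:
    """Call compensate() until it returns 200. Returns (state, attempts)."""
    delays = delays if delays is not None else backoff_delays()
    state = SagaState.RETRIABLE_COMPENSATING
    attempts = 0
    while state is SagaState.RETRIABLE_COMPENSATING:
        attempts += 1
        try:
            status = compensate()
        except (TimeoutError, ConnectionError):
            status = None  # treat network failures as retriable
        if status == 200:
            state = SagaState.COMPENSATED
        else:
            sleep(next(delays))
    return state, attempts


# Fake refund call: times out twice, then the gateway answers 200.
outcomes = iter([TimeoutError("timeout"), TimeoutError("timeout"), 200])


def refund() -> int:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waited = []  # record delays instead of actually sleeping
state, attempts = compensate_until_done(refund, sleep=waited.append)
print(state, attempts, waited)
```

Passing `sleep` as a parameter keeps the example fast to run. A real orchestrator would schedule the next attempt rather than block a thread, and would persist the attempt count between tries.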

This is only possible because we store the state of each step transition atomically in the relational database, so a pending compensation survives even if the entire application is restarted during rollback.
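A minimal sketch of that idea, using SQLite as a stand-in for the relational database. The table names, columns, and helper functions here are hypothetical and do not reflect Sagaweaw's actual schema. Each transition updates the current state and appends to a history table inside a single transaction, and a restarted process finds unfinished compensations by querying the store.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS saga_steps (
    saga_id TEXT NOT NULL,
    step    TEXT NOT NULL,
    state   TEXT NOT NULL,
    PRIMARY KEY (saga_id, step)
);
CREATE TABLE IF NOT EXISTS saga_transitions (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    saga_id TEXT NOT NULL,
    step    TEXT NOT NULL,
    state   TEXT NOT NULL
);
"""


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_transition(conn, saga_id: str, step: str, new_state: str) -> None:
    """Persist a step transition; both writes commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO saga_steps (saga_id, step, state) "
            "VALUES (?, ?, ?)",
            (saga_id, step, new_state),
        )
        conn.execute(
            "INSERT INTO saga_transitions (saga_id, step, state) "
            "VALUES (?, ?, ?)",
            (saga_id, step, new_state),
        )


def pending_compensations(conn) -> list:
    """Steps a restarted orchestrator still has to compensate."""
    return conn.execute(
        "SELECT saga_id, step FROM saga_steps "
        "WHERE state = 'RETRIABLE COMPENSATING' ORDER BY saga_id, step"
    ).fetchall()


path = os.path.join(tempfile.mkdtemp(), "sagas.db")

conn = open_store(path)
record_transition(conn, "order-42", "reserve_inventory", "COMPENSATED")
record_transition(conn, "order-42", "process_payment", "RETRIABLE COMPENSATING")
conn.close()  # simulate the application going down mid-rollback

conn = open_store(path)  # simulate the restart
pending = pending_compensations(conn)
print(pending)
```

After the simulated restart, the pending refund is still visible in the store, so the retry loop can pick it up where it left off.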

