
The Outbox Pattern Without Kafka

Written by Sagaweaw Team, Platform Engineering

Many developers assume the Outbox Pattern requires Kafka. It doesn't. A polling publisher on PostgreSQL adds a few seconds of latency and needs no extra infrastructure, which is the right trade-off for the vast majority of business sagas.

The Problem: Dual-Write

Every system that writes to a database and publishes to a message broker faces the same fundamental problem: you cannot do both atomically.

Consider the sequence:

  1. Write the order to PostgreSQL
  2. Publish OrderCreated to Kafka

If your application crashes between steps 1 and 2, the order exists in the database but the event was never published. Downstream services never hear about it. Your system is silently inconsistent.

If you reverse the order — publish first, then write — you get the mirror problem: the event was published but the database write failed. You've announced something that doesn't exist.

This is the dual-write problem. There is no way to make two independent systems (a database and a message broker) participate in the same atomic transaction without special infrastructure.
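The failure window can be reproduced with a toy model. The sketch below is illustrative only: the class `DualWriteSketch`, its two lists, and the `crashBeforePublish` flag are hypothetical stand-ins for a real database, a real broker, and a real process crash.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the dual write; both "systems" are plain lists and all names are hypothetical. */
final class DualWriteSketch {
    final List<String> database = new ArrayList<>();
    final List<String> broker = new ArrayList<>();

    /** Writes the order, then publishes the event, with an optional crash in between. */
    void createOrder(String orderId, boolean crashBeforePublish) {
        database.add(orderId); // step 1: the order is committed
        if (crashBeforePublish) {
            throw new IllegalStateException("simulated crash between write and publish");
        }
        broker.add("OrderCreated:" + orderId); // step 2: skipped when the crash happens
    }
}
```

After a call with `crashBeforePublish = true`, `database` contains the order and `broker` is empty, which is exactly the silent inconsistency described above.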

The Outbox Pattern

The solution is to convert the dual-write into a single-write. Instead of writing to both the database and the broker, you write only to the database — but you write to two tables: your domain table, and an outbox table. Both writes happen in the same local ACID transaction.

BEGIN;
INSERT INTO orders (id, status, ...) VALUES (...);
INSERT INTO sagaweaw_outbox_messages (id, topic, payload, created_at)
VALUES (...);
COMMIT;

A separate process — the relay — polls the outbox table, publishes the messages to the broker (if configured), and marks them as delivered. The relay is the only thing that touches the broker. The application never does.

This turns an impossible two-system atomicity problem into a tractable single-system durability problem: if the relay fails, the outbox record is still there. The relay picks up where it left off when it restarts.
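The recovery behavior can be sketched with an in-memory stand-in for the outbox table. Everything here is an assumption made for illustration: `OutboxRelaySketch`, `FakeBroker`, and the `failAfter` switch are hypothetical names, and a real relay would read rows from PostgreSQL rather than a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the outbox table and relay; all names are hypothetical. */
final class OutboxRelaySketch {

    /** Stand-in for the broker: records every message id it receives. */
    static final class FakeBroker {
        final List<String> received = new ArrayList<>();
        int failAfter = Integer.MAX_VALUE; // simulate a relay crash after N publishes

        void publish(String messageId) {
            if (received.size() >= failAfter) {
                throw new IllegalStateException("simulated crash");
            }
            received.add(messageId);
        }
    }

    // message id -> delivered flag, in insertion order (stands in for the outbox table)
    private final Map<String, Boolean> outbox = new LinkedHashMap<>();

    /** In a real system this insert shares a transaction with the domain write. */
    void enqueue(String messageId) {
        outbox.put(messageId, false);
    }

    /** One relay pass: publish each pending row, then mark it delivered. */
    void relayOnce(FakeBroker broker) {
        for (Map.Entry<String, Boolean> row : outbox.entrySet()) {
            if (!row.getValue()) {
                broker.publish(row.getKey()); // if this throws, the row stays pending
                row.setValue(true);
            }
        }
    }

    long pendingCount() {
        return outbox.values().stream().filter(delivered -> !delivered).count();
    }
}
```

If the relay dies partway through a pass, the undelivered rows are still pending and the next pass publishes them. A crash between publishing and marking would send the same message twice, so this style of relay gives at-least-once delivery and consumers should tolerate duplicates.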

Why People Reach for Kafka + Debezium

When teams discover the Outbox Pattern, they often reach immediately for CDC (Change Data Capture) — specifically Kafka + Debezium, which captures changes at the PostgreSQL WAL (Write-Ahead Log) level.

The appeal is real:

  • Sub-second latency: WAL events are captured as they happen, not on a polling interval
  • No polling overhead: The relay doesn't need to query the database repeatedly
  • Scalable to millions of events per second: CDC is designed for high-throughput pipelines

If you need those properties, CDC is the right choice.

Why That's Overkill for Most Sagas

Here's the thing: a saga step typically takes between 50ms and 500ms to execute. An HTTP call to an inventory service. A database write. A card charge via Stripe.

If there's a 2-second lag between the outbox write and the relay publishing the message — because the polling interval is 2 seconds — does that matter? No. The saga step that triggered the message already took 200ms. The next step won't even start until the current one finishes. You're already operating at the seconds timescale.

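The math: at the level of a single step, the polling interval (2000ms) is larger than the step itself (200ms), so polling is not free. Measured against the whole saga, though, polling_interval (2s) / total_saga_latency (30s) ≈ 6.7%, which is negligible.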
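The same arithmetic as a small helper, in case you want to plug in your own numbers. This is a rough sketch under stated assumptions: `PollingOverhead` and its parameters are hypothetical names, each relayed message is assumed to wait at most one full polling interval, and relay passes are assumed to be short compared with the interval.

```java
import java.util.Locale;

/** Back-of-the-envelope polling overhead; names and model are illustrative assumptions. */
final class PollingOverhead {

    /**
     * Worst-case share of total saga latency spent waiting for the relay,
     * assuming each relayed message waits at most one polling interval.
     */
    static double worstCaseOverheadPercent(long pollIntervalMs, long totalSagaLatencyMs, int relayedMessages) {
        return 100.0 * pollIntervalMs * relayedMessages / totalSagaLatencyMs;
    }

    public static void main(String[] args) {
        // One relayed message, 2s interval, 30s saga: prints 6.7%
        System.out.println(String.format(Locale.ROOT, "%.1f%%", worstCaseOverheadPercent(2_000, 30_000, 1)));
        // Three relayed messages in the same saga: prints 20.0%
        System.out.println(String.format(Locale.ROOT, "%.1f%%", worstCaseOverheadPercent(2_000, 30_000, 3)));
    }
}
```

The second line is worth noting: if several steps in one saga each wait on the relay, the waits add up, so count how many messages sit on the critical path before calling the overhead negligible.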

You don't need sub-second CDC for "reserve inventory → charge payment → ship order."

What Sagaweaw Does: Polling Publisher

Sagaweaw uses a Polling Publisher — the simplest relay implementation that actually works.

A @Scheduled job polls sagaweaw_outbox_messages every N seconds (configurable), publishes messages to Kafka if Kafka is enabled, and marks them as delivered:

@Scheduled(fixedDelayString = "${sagaweaw.outbox.poll-interval:2000}")
public void pollAndRelay() {
    // Fetch up to batchSize undelivered messages.
    List<OutboxMessage> pending = outboxRepository.findPending(batchSize);
    for (OutboxMessage msg : pending) {
        // Publish first, mark second: a crash in between means the message is sent again.
        publisher.publish(msg);
        outboxRepository.markDelivered(msg.getId());
    }
}

No Debezium. No Kafka Connect. No separate connector cluster to operate, monitor, and upgrade. The polling publisher is a few hundred lines of Java that lives inside your application.

Kafka Is Opt-In

This is the part that surprises most developers: Kafka is optional in Sagaweaw.

sagaweaw:
  kafka:
    enabled: false # the default

With sagaweaw.kafka.enabled=false, the outbox table is still written and the polling publisher still runs. Because no broker is configured, publishing is a no-op and messages are marked as delivered immediately. You get the durability of the outbox pattern (the record of what happened) without needing a Kafka cluster running in your development or staging environment.
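One way to picture the switch is a publisher interface with two implementations. This is a sketch, not Sagaweaw's actual code: `MessagePublisher`, `NoOpPublisher`, `RecordingPublisher`, and `forConfig` are hypothetical names, and the recording class only stands in for a Kafka-backed publisher.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative publisher selection; every name here is a hypothetical stand-in. */
final class PublisherSelectionSketch {

    /** Relay-side abstraction; the real interface may look different. */
    interface MessagePublisher {
        void publish(String topic, String payload);
    }

    /** Used when Kafka is disabled: accepts the message and does nothing. */
    static final class NoOpPublisher implements MessagePublisher {
        @Override
        public void publish(String topic, String payload) {
            // intentionally empty
        }
    }

    /** Stand-in for a Kafka-backed publisher; records sends instead of calling a client. */
    static final class RecordingPublisher implements MessagePublisher {
        final List<String> sent = new ArrayList<>();

        @Override
        public void publish(String topic, String payload) {
            sent.add(topic + ":" + payload);
        }
    }

    /** Picks the publisher from the flag, mirroring the enabled property. */
    static MessagePublisher forConfig(boolean kafkaEnabled, MessagePublisher kafkaPublisher) {
        return kafkaEnabled ? kafkaPublisher : new NoOpPublisher();
    }
}
```

The relay loop stays identical in both modes; only the object behind the interface changes. In a Spring Boot application this kind of selection is commonly expressed with `@ConditionalOnProperty` on the bean definitions rather than an explicit factory method.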

When you're ready to add Kafka — because you actually need async fan-out, or you're integrating with other systems — you flip the flag and configure the broker. The outbox records flow through unchanged.

This means you can adopt the outbox pattern on day one, validate your saga logic, go to production, and add Kafka later when you have a concrete reason to need it. Not because a blog post told you distributed systems require Kafka.

When CDC Is Worth It

To be direct: there are cases where you want CDC and Debezium.

  • You're processing more than 10,000 messages per second
  • You have a sub-second latency SLA between a write and its downstream effect
  • You already run Kafka in production and have the operational expertise to run Kafka Connect
  • You need to fan out to many independent consumers without the relay becoming a bottleneck

None of these typically apply to a Spring Boot service doing saga orchestration for an order management or payment system. They apply to event streaming platforms, real-time analytics, and high-throughput data pipelines.
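To check the throughput criterion against your own settings, a rough ceiling for a single polling relay can be estimated as below. The model is an assumption: `RelayThroughput` and its parameters are hypothetical, the relay is assumed to move one batch per poll, and the fixed delay is counted from the end of one pass to the start of the next, which is how Spring's `fixedDelay` scheduling behaves.

```java
/** Rough throughput ceiling for one polling relay; names and model are illustrative. */
final class RelayThroughput {

    /**
     * Each cycle moves at most batchSize messages and lasts the polling delay
     * plus the time spent publishing that batch.
     */
    static double maxMessagesPerSecond(int batchSize, long pollIntervalMs, long batchPublishMs) {
        return batchSize * 1000.0 / (pollIntervalMs + batchPublishMs);
    }

    public static void main(String[] args) {
        // 100 messages per batch, 2s delay, 500ms to publish a batch: prints 40.0
        System.out.println(maxMessagesPerSecond(100, 2_000, 500));
    }
}
```

With those example numbers the relay tops out around 40 messages per second, far below the 10,000 per second mark in the list above and typically plenty for order or payment sagas. Raising the batch size or shortening the interval lifts the ceiling, at the cost of more load on the database.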

Know your SLA before you choose your infrastructure. If your saga latency is already measured in seconds, polling with 2-second intervals is invisible.

The Honest Constraint

Polling has latency. If you need guaranteed sub-second delivery across services — not just within a saga, but as an SLA to downstream consumers — you want CDC.

For saga orchestration, that constraint rarely applies. The saga coordinator is the one sequencing steps. Downstream consumers don't drive the saga's execution path. The outbox is a record and a delivery mechanism, not a low-latency event stream.

Start with the polling publisher. Add CDC when you can measure the latency problem in production, not before.
