Saga Pattern Deep Dive
Managing distributed transactions across microservices without distributed locks
Quick Summary (30 seconds):
The Saga pattern manages distributed transactions by breaking them into local transactions, each with a compensating action. Two implementations: Choreography (decentralized) and Orchestration (centralized).
The Problem
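In a monolith, a single ACID database transaction keeps data consistent. In a microservices architecture each service owns its own database, so one database transaction cannot span multiple services. The Saga pattern addresses this by coordinating a sequence of local transactions instead.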
Core Challenge:
- Distributed transactions such as two-phase commit (2PC) require all participants to be available, which reduces availability
- They hold locks until every participant has committed, which reduces scalability
- Many NoSQL databases and message brokers do not support distributed transactions at all
What is a Saga?
A saga is a sequence of local transactions where each transaction updates its local database and publishes an event or sends a message. If a local transaction fails, the saga executes compensating transactions to undo previous changes.
Example Order Saga:
Step 1: Order Service -> Create Order (PENDING)
Step 2: Payment Service -> Process Payment
Step 3: Inventory Service -> Reserve Stock
Step 4: Shipping Service -> Create Shipment
If Step 3 fails:
Compensating Step 2: Payment Service -> Refund Payment
Compensating Step 1: Order Service -> Cancel Order
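The order saga above can be sketched as a list of steps, each pairing a forward action with its compensation. This is a minimal, synchronous sketch under stated assumptions: `runSaga`, the step names, and the in-memory `log` are hypothetical stand-ins for real service calls, and the third step is forced to fail so the compensation path is visible.

```javascript
// Minimal saga runner (illustrative sketch, not a production implementation).
// Each step pairs a forward action with the compensation that undoes it.
function runSaga(steps) {
  const completed = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch (error) {
      // Undo the steps that already succeeded, most recent first.
      for (const done of completed.reverse()) {
        done.compensate();
      }
      return { status: "COMPENSATED", failedStep: step.name };
    }
  }
  return { status: "COMPLETED" };
}

// Hypothetical stand-ins for the four services; each just records what happened.
const log = [];
const orderSaga = [
  {
    name: "createOrder",
    action: () => log.push("order created (PENDING)"),
    compensate: () => log.push("order cancelled"),
  },
  {
    name: "processPayment",
    action: () => log.push("payment processed"),
    compensate: () => log.push("payment refunded"),
  },
  {
    name: "reserveStock",
    action: () => { throw new Error("out of stock"); }, // simulated failure
    compensate: () => log.push("stock released"),
  },
  {
    name: "createShipment",
    action: () => log.push("shipment created"),
    compensate: () => log.push("shipment cancelled"),
  },
];

const result = runSaga(orderSaga);
console.log(result.status, result.failedStep); // COMPENSATED reserveStock
console.log(log);
```

Only the two steps that completed are compensated, in reverse order; the failed step and the step that never ran are left alone. The sketch does not handle a compensation that itself fails, which a real saga must plan for.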
Choreography vs Orchestration
Choreography (Decentralized)
Services communicate by publishing and reacting to events; there is no central coordinator
Order Service --event--> Payment Service
Payment Service --event--> Inventory Service
Inventory Service --event--> Shipping Service
Pros: No central point of failure, simple for basic flows, good scalability
Cons: Hard to debug, risk of cyclic dependencies, no central visibility
Orchestration (Centralized)
A central coordinator service manages the entire flow
Orchestrator --calls--> Order Service
Orchestrator --calls--> Payment Service
Orchestrator --calls--> Inventory Service
Orchestrator --calls--> Shipping Service
Pros: Central visibility and control, easier for complex flows, clear audit trail
Cons: Orchestrator can become a bottleneck, risk of a single point of failure, one more service to build and operate
Implementation 1: Choreography Saga
// Assumes saveOrder, findOrder, publishEvent and processPayment are provided by each service.

// Order Service - publishes ORDER_CREATED and reacts to PAYMENT_FAILED
class OrderService {
  createOrder(request) {
    const order = saveOrder({ ...request, status: "PENDING" });
    // Publish a payload object so handlers can read event.orderId and event.amount
    publishEvent("ORDER_CREATED", { orderId: order.id, amount: order.amount });
    return order;
  }

  // Compensation: cancel the order when payment fails
  handlePaymentFailed(event) {
    const order = findOrder(event.orderId);
    order.status = "CANCELLED";
    saveOrder(order);
  }
}

// Payment Service - listens for ORDER_CREATED
class PaymentService {
  handleOrderCreated(event) {
    try {
      processPayment(event.orderId, event.amount);
      publishEvent("PAYMENT_SUCCEEDED", { orderId: event.orderId });
    } catch (error) {
      publishEvent("PAYMENT_FAILED", { orderId: event.orderId });
    }
  }
}
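To see the event flow end to end, the choreography can be wired to an in-memory event bus. This is a sketch under stated assumptions: `EventBus`, `DemoOrderService`, `DemoPaymentService`, and the payment `limit` rule are hypothetical, and the bus delivers events synchronously, whereas a real message broker delivers asynchronously and usually at least once.

```javascript
// In-memory event bus standing in for a real message broker (assumption).
class EventBus {
  constructor() {
    this.handlers = new Map(); // event type -> list of handlers
    this.published = [];       // event types in publish order, for inspection
  }
  subscribe(type, handler) {
    if (!this.handlers.has(type)) this.handlers.set(type, []);
    this.handlers.get(type).push(handler);
  }
  publish(type, payload) {
    this.published.push(type);
    for (const handler of this.handlers.get(type) ?? []) handler(payload);
  }
}

class DemoOrderService {
  constructor(bus) {
    this.bus = bus;
    this.orders = new Map();
    bus.subscribe("PAYMENT_FAILED", (event) => this.handlePaymentFailed(event));
  }
  createOrder(id, amount) {
    const order = { id, amount, status: "PENDING" };
    this.orders.set(id, order);
    this.bus.publish("ORDER_CREATED", { orderId: id, amount });
    return order;
  }
  // Compensation: cancel the order when payment fails.
  handlePaymentFailed(event) {
    this.orders.get(event.orderId).status = "CANCELLED";
  }
}

class DemoPaymentService {
  constructor(bus, limit) {
    this.bus = bus;
    this.limit = limit; // hypothetical rule: amounts above the limit are declined
    bus.subscribe("ORDER_CREATED", (event) => this.handleOrderCreated(event));
  }
  handleOrderCreated(event) {
    const type = event.amount <= this.limit ? "PAYMENT_SUCCEEDED" : "PAYMENT_FAILED";
    this.bus.publish(type, { orderId: event.orderId });
  }
}

const bus = new EventBus();
const orderService = new DemoOrderService(bus);
const paymentService = new DemoPaymentService(bus, 100);

const smallOrder = orderService.createOrder("o-1", 50);
const largeOrder = orderService.createOrder("o-2", 500);
console.log(smallOrder.status); // PENDING (waiting for the next service in the chain)
console.log(largeOrder.status); // CANCELLED (payment declined, order compensated)
console.log(bus.published);
```

Neither service calls the other directly; each one only knows the event types it publishes and subscribes to, which is what keeps the coupling loose and also what makes the overall flow hard to see in one place.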
Implementation 2: Orchestration Saga
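// Saga Orchestrator - central coordinator
// Assumes the service clients, generateId, markSagaComplete, and the compensation
// helpers (cancelOrder, refundPayment, releaseInventory) are provided elsewhere.
class OrderSagaOrchestrator {
  executeOrderSaga(request) {
    const sagaId = generateId();
    const compensations = []; // undo actions for the steps that have completed
    try {
      // Step 1
      const order = orderClient.createOrder(request);
      compensations.push(() => cancelOrder(order.id));
      // Step 2 (payment.id is an assumed field on the payment result)
      const payment = paymentClient.processPayment(order.id, request.amount);
      compensations.push(() => refundPayment(payment.id));
      // Step 3
      inventoryClient.reserveStock(order.id, request.items);
      compensations.push(() => releaseInventory(order.id));
      // Step 4
      shippingClient.createShipment(order.id, request.address);
      markSagaComplete(sagaId);
    } catch (error) {
      this.compensate(sagaId, compensations);
    }
  }

  compensate(sagaId, compensations) {
    // Undo only the steps that completed, in reverse order
    for (const undo of compensations.reverse()) {
      undo();
    }
  }
}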
Comparison Table
| Aspect | Choreography | Orchestration |
|---|---|---|
| Complexity | Simple for basic flows | Better for complex workflows |
| Coupling | Loose (event-based) | Tighter (depends on orchestrator) |
| Visibility | Low - need distributed logging | High - central coordinator tracks state |
| Scalability | Excellent - fully distributed | Limited by orchestrator capacity |
| Single Point of Failure | No | Yes (orchestrator) |
| Debugging | Difficult | Easier |
| Best For | Simple, linear workflows | Complex, branching workflows |
Compensating Transactions Reference
| Transaction | Compensation |
|---|---|
| Create Order | Cancel Order |
| Process Payment | Refund Payment |
| Reserve Inventory | Release Inventory |
| Book Hotel | Cancel Booking |
| Send Email | Send Reversal Email |
Important: All operations and compensations MUST be idempotent. The same operation can be called multiple times due to retries or network issues.
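One common way to get idempotency is to key each operation by an idempotency key and store its result. The sketch below is illustrative only: `PaymentLedger`, its methods, and the key format are hypothetical, and the in-memory `Map` stands in for a database table where the key would carry a unique constraint.

```javascript
// Idempotent payment and refund keyed by an idempotency key (illustrative sketch).
class PaymentLedger {
  constructor() {
    this.payments = new Map(); // idempotency key -> { amount, status }
    this.totalCharged = 0;
  }

  processPayment(key, amount) {
    const existing = this.payments.get(key);
    if (existing) return existing; // retry: return the stored result, charge nothing
    const payment = { amount, status: "CHARGED" };
    this.totalCharged += amount;
    this.payments.set(key, payment);
    return payment;
  }

  refundPayment(key) {
    const payment = this.payments.get(key);
    // Unknown key or already refunded: nothing to do.
    if (!payment || payment.status === "REFUNDED") return payment;
    this.totalCharged -= payment.amount;
    payment.status = "REFUNDED";
    return payment;
  }
}

const ledger = new PaymentLedger();
ledger.processPayment("order-42:payment", 30);
ledger.processPayment("order-42:payment", 30); // duplicate delivery
console.log(ledger.totalCharged); // 30

ledger.refundPayment("order-42:payment");
ledger.refundPayment("order-42:payment"); // duplicate compensation
console.log(ledger.totalCharged); // 0
```

Both the forward operation and its compensation are safe to repeat. In a real service the lookup and the write would need to be atomic, otherwise two concurrent retries could both pass the check.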
Real-World Example: Travel Booking Saga
Travel Booking Flow:
1. Book Flight -> Compensation: Cancel Flight
2. Book Hotel -> Compensation: Cancel Hotel
3. Book Car -> Compensation: Cancel Car
4. Process Payment -> Compensation: Refund Payment
If Car booking fails (payment has not been processed yet, so there is nothing to refund):
Step 1: Cancel Hotel
Step 2: Cancel Flight
Step 3: Send Cancellation Notice
Common Pitfalls and Solutions
| Pitfall | Solution |
|---|---|
| Non-idempotent operations | Use idempotency keys + unique constraints |
| Missing compensations | Design compensation for every step |
| Long-running sagas | Add timeout and checkpoint mechanisms |
| No visibility | Use central saga state repository |
| Compensation failure | Use dead letter queue + manual alerts |
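The last row of the table can be sketched as a bounded retry that parks the compensation in a dead letter queue once the attempts run out. Everything here is an assumption for illustration: `runCompensation`, the `maxAttempts` default, and the array used as a queue are hypothetical, and a real system would wait between attempts and raise an alert when an entry is dead-lettered.

```javascript
// Retry a compensation a bounded number of times, then dead-letter it (illustrative sketch).
function runCompensation(name, compensate, deadLetterQueue, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      compensate();
      return { name, status: "COMPENSATED", attempts: attempt };
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  // Out of attempts: record enough detail for a human to follow up.
  deadLetterQueue.push({ name, error: lastError.message, attempts: maxAttempts });
  return { name, status: "DEAD_LETTERED", attempts: maxAttempts };
}

const deadLetters = [];

// Simulated compensation that fails twice and then succeeds.
let refundCalls = 0;
const flakyRefund = () => {
  refundCalls += 1;
  if (refundCalls < 3) throw new Error("payment gateway timeout");
};

// Simulated compensation that never succeeds.
const failingRelease = () => {
  throw new Error("inventory service down");
};

const refundResult = runCompensation("refundPayment", flakyRefund, deadLetters);
const releaseResult = runCompensation("releaseInventory", failingRelease, deadLetters);

console.log(refundResult.status, refundResult.attempts); // COMPENSATED 3
console.log(releaseResult.status, deadLetters.length);   // DEAD_LETTERED 1
```

Retrying like this is only safe when the compensation is idempotent, since an attempt that appeared to fail may have taken effect on the remote side.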
Best Practices
- Design compensations first - before implementing the forward transaction
- Make all operations idempotent - use idempotency keys for all APIs
- Store saga state persistently - allows recovery after crashes
- Implement timeouts - don't wait forever for responses
- Monitor sagas actively - detect stuck or long-running sagas
- Test compensation paths - they will execute in production
- Keep sagas short - long sagas increase failure probability
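The practices on persistent state, timeouts, and monitoring can be combined in a small saga state store. This is a sketch under stated assumptions: `SagaStateStore` and its methods are hypothetical, the `Map` stands in for a durable database table, and timestamps are passed in explicitly (in milliseconds) so the example is deterministic.

```javascript
// In-memory saga state store (stand-in for a durable database table).
class SagaStateStore {
  constructor() {
    this.sagas = new Map(); // sagaId -> saga record
  }

  start(sagaId, now) {
    this.sagas.set(sagaId, { sagaId, status: "RUNNING", completedSteps: [], updatedAt: now });
  }

  // Checkpoint: remember which step finished so a restart knows where to resume.
  recordStep(sagaId, stepName, now) {
    const saga = this.sagas.get(sagaId);
    saga.completedSteps.push(stepName);
    saga.updatedAt = now;
  }

  finish(sagaId, status, now) {
    const saga = this.sagas.get(sagaId);
    saga.status = status;
    saga.updatedAt = now;
  }

  // Sagas that are still RUNNING but have made no progress for longer than timeoutMs.
  findStuck(now, timeoutMs) {
    return [...this.sagas.values()].filter(
      (saga) => saga.status === "RUNNING" && now - saga.updatedAt > timeoutMs
    );
  }
}

const store = new SagaStateStore();

store.start("saga-1", 0);
store.recordStep("saga-1", "createOrder", 1000);
store.finish("saga-1", "COMPLETED", 2000);

store.start("saga-2", 0);
store.recordStep("saga-2", "createOrder", 1000);
store.recordStep("saga-2", "processPayment", 2000);

// One minute in, with a 30 second timeout, only saga-2 is stuck.
const stuck = store.findStuck(60000, 30000);
console.log(stuck.map((saga) => saga.sagaId)); // [ 'saga-2' ]
console.log(stuck[0].completedSteps);          // [ 'createOrder', 'processPayment' ]
```

The recorded steps tell a recovery job exactly what to resume or compensate for a stuck saga, and a periodic `findStuck` query is a simple basis for monitoring and alerts.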
Selection Guide
Choose Choreography when:
- The workflow is simple and linear (2-4 steps)
- Teams want autonomy
- You have good event-driven infrastructure
Choose Orchestration when:
- The workflow is complex with branching logic
- You need central visibility and audit trail
- You have multiple sagas that share steps
- You need to resume sagas after failures
Expert Insight:
"Sagas sacrifice ACID for BASE (Basically Available, Soft state, Eventual consistency). The choice between choreography and orchestration is organizational, not technical. Start with choreography for simplicity; add orchestration when you feel the pain of distributed debugging."
Key Takeaways
- Sagas replace distributed transactions in microservices
- Two patterns: Choreography (decentralized) vs Orchestration (centralized)
- Every transaction needs a compensating action
- Idempotency is mandatory for all operations
- Store saga state for recovery and monitoring
- Test compensation paths as thoroughly as success paths