Saga Pattern

Saga Pattern Deep Dive

Managing distributed transactions across microservices without distributed locks

Quick Summary (30 seconds):
The Saga pattern manages distributed transactions by breaking them into local transactions, each with a compensating action. Two implementations: Choreography (decentralized) and Orchestration (centralized).

The Problem

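In a monolith, a single ACID database transaction guarantees consistency. In microservices, each service owns its own database, so one database transaction cannot span multiple services. The Saga pattern addresses this by replacing the single atomic commit with a sequence of local transactions that converge on a consistent state.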

Core Challenge:
  • Distributed transactions (for example, two-phase commit) require all participants to be available at commit time (reduces availability)
  • They hold locks until every participant has voted and committed (reduces scalability)
  • Many NoSQL databases and message brokers do not support distributed transactions
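
The availability cost in the first point can be made concrete with a little arithmetic. If a commit needs every participant to be up at the same moment, and failures are assumed to be independent, the availability of the whole transaction is the product of the individual availabilities. The four services and the 99.9% figure below are illustrative assumptions, not measurements.

```python
from math import prod
from typing import Sequence


def combined_availability(availabilities: Sequence[float]) -> float:
    """Probability that all participants are up at once, assuming independent failures."""
    return prod(availabilities)


# Illustrative assumption: four services, each available 99.9% of the time.
services = [0.999, 0.999, 0.999, 0.999]
print(f"{combined_availability(services):.4%}")
```

Under these assumptions the transaction as a whole is available roughly 99.6% of the time, lower than any single participant, and the gap widens as more services join the transaction.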

What is a Saga?

A saga is a sequence of local transactions where each transaction updates its local database and publishes an event or sends a message. If a local transaction fails, the saga executes compensating transactions to undo previous changes.

Example Order Saga:

Step 1: Order Service -> Create Order (PENDING)
Step 2: Payment Service -> Process Payment
Step 3: Inventory Service -> Reserve Stock
Step 4: Shipping Service -> Create Shipment

If Step 3 fails:
  Compensating Step 2: Payment Service -> Refund Payment
  Compensating Step 1: Order Service -> Cancel Order
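
The flow above can be sketched as a small executor that pairs each step with its compensation and, on failure, undoes only the steps that have already completed, in reverse order. This is a minimal in-process sketch: `SagaStep`, `run_saga`, and the no-op actions are hypothetical names for illustration, and a real saga would call remote services and persist its progress.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # undoes the action after it has committed


def run_saga(steps: List[SagaStep], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
            log.append(f"done: {step.name}")
            completed.append(step)
        except Exception as exc:
            log.append(f"failed: {step.name} ({exc})")
            for done in reversed(completed):
                done.compensation()
                log.append(f"compensated: {done.name}")
            return False
    return True


def reserve_stock() -> None:
    # Simulated failure in Step 3 for the example.
    raise RuntimeError("out of stock")


def noop() -> None:
    pass


log: List[str] = []
order_saga = [
    SagaStep("create order", noop, noop),
    SagaStep("process payment", noop, noop),
    SagaStep("reserve stock", reserve_stock, noop),
    SagaStep("create shipment", noop, noop),
]
succeeded = run_saga(order_saga, log)
print(succeeded)
print("\n".join(log))
```

The failed step itself is not compensated, because its local transaction never committed; only "process payment" and "create order" are undone, and "create shipment" never runs.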

Choreography vs Orchestration

Choreography (Decentralized)

Services react to each other's events - no central coordinator

Order Service --event--> Payment Service
Payment Service --event--> Inventory Service
Inventory Service --event--> Shipping Service

Pros: No central point of failure, simple for basic flows, good scalability
Cons: Hard to debug, risk of cyclic dependencies, no central visibility

Orchestration (Centralized)

A central coordinator service manages the entire flow

Orchestrator --calls--> Order Service
Orchestrator --calls--> Payment Service
Orchestrator --calls--> Inventory Service
Orchestrator --calls--> Shipping Service

Pros: Central visibility and control, easier for complex flows, clear audit trail
Cons: Orchestrator can become a bottleneck and a single point of failure, one more service to build and operate

Implementation 1: Choreography Saga

// Order Service - Publishes ORDER_CREATED event
class OrderService {
    function createOrder(request) {
        order = saveOrder(request, status=PENDING);
        publishEvent("ORDER_CREATED", order.id, order.amount);
        return order;
    }
    
    // Compensating transaction: cancel the order when payment fails
    function handlePaymentFailed(event) {
        order = findOrder(event.orderId);
        order.status = CANCELLED;
        saveOrder(order);
    }
}

// Payment Service - Listens for ORDER_CREATED
class PaymentService {
    function handleOrderCreated(event) {
        try {
            processPayment(event.orderId, event.amount);
            publishEvent("PAYMENT_SUCCEEDED", event.orderId);
        } catch(error) {
            publishEvent("PAYMENT_FAILED", event.orderId);
        }
    }
}
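
A runnable version of the same exchange, as a sketch only: the in-memory `EventBus` stands in for a real message broker, and the `credit_limit` rule and the `PAID` status are hypothetical additions used to exercise both outcomes.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventBus:
    """Toy synchronous bus; a real system would use a message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        for handler in list(self._handlers[event_type]):
            handler(event)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: Dict[str, str] = {}  # order id -> status
        bus.subscribe("PAYMENT_SUCCEEDED", self.handle_payment_succeeded)
        bus.subscribe("PAYMENT_FAILED", self.handle_payment_failed)

    def create_order(self, order_id: str, amount: int) -> None:
        self.orders[order_id] = "PENDING"
        self.bus.publish("ORDER_CREATED", {"order_id": order_id, "amount": amount})

    def handle_payment_succeeded(self, event: Event) -> None:
        self.orders[event["order_id"]] = "PAID"  # hypothetical status

    def handle_payment_failed(self, event: Event) -> None:
        # Compensating transaction for create_order.
        self.orders[event["order_id"]] = "CANCELLED"


class PaymentService:
    def __init__(self, bus: EventBus, credit_limit: int) -> None:
        self.bus = bus
        self.credit_limit = credit_limit  # hypothetical rule to force a failure
        bus.subscribe("ORDER_CREATED", self.handle_order_created)

    def handle_order_created(self, event: Event) -> None:
        ok = event["amount"] <= self.credit_limit
        result = "PAYMENT_SUCCEEDED" if ok else "PAYMENT_FAILED"
        self.bus.publish(result, {"order_id": event["order_id"]})


bus = EventBus()
orders = OrderService(bus)
payments = PaymentService(bus, credit_limit=100)
orders.create_order("o-1", amount=50)
orders.create_order("o-2", amount=500)
print(orders.orders)
```

Neither service calls the other directly; each knows only the event names it publishes and subscribes to, which is where both the loose coupling and the debugging difficulty come from.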

Implementation 2: Orchestration Saga

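// Saga Orchestrator - Central coordinator
class OrderSagaOrchestrator {
    function executeOrderSaga(request) {
        sagaId = generateId();
        // Compensations for the steps that have completed so far
        compensations = [];

        try {
            // Step 1
            order = orderClient.createOrder(request);
            compensations.push(() => orderClient.cancelOrder(order.id));

            // Step 2
            payment = paymentClient.processPayment(order.id, request.amount);
            compensations.push(() => paymentClient.refundPayment(payment.id));

            // Step 3
            inventory = inventoryClient.reserveStock(order.id, request.items);
            compensations.push(() => inventoryClient.releaseInventory(order.id));

            // Step 4
            shipping = shippingClient.createShipment(order.id, request.address);

            markSagaComplete(sagaId);

        } catch(error) {
            compensate(sagaId, compensations);
        }
    }

    // sagaId identifies the saga in logs and persisted state
    function compensate(sagaId, compensations) {
        // Undo only the steps that completed, in reverse order
        for (undo of reversed(compensations)) {
            undo();
        }
    }
}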

Comparison Table

Aspect                  | Choreography                   | Orchestration
Complexity              | Simple for basic flows         | Better for complex workflows
Coupling                | Loose (event-based)            | Tighter (depends on orchestrator)
Visibility              | Low - need distributed logging | High - central coordinator tracks state
Scalability             | Excellent - fully distributed  | Limited by orchestrator capacity
Single Point of Failure | No                             | Yes (orchestrator)
Debugging               | Difficult                      | Easier
Best For                | Simple, linear workflows       | Complex, branching workflows

Compensating Transactions Reference

Transaction       | Compensation
Create Order      | Cancel Order
Process Payment   | Refund Payment
Reserve Inventory | Release Inventory
Book Hotel        | Cancel Booking
Send Email        | Send Reversal Email

Important: All operations and compensations MUST be idempotent. The same operation can be called multiple times due to retries or network issues.
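
One common way to get idempotency is an idempotency key: the caller sends the same key on every retry, and the service returns the stored result instead of repeating the side effect. The sketch below keeps the keys in memory for brevity; `IdempotentPaymentService` and the key format are hypothetical, and a real service would store keys in its database under a unique constraint, in the same local transaction as the charge.

```python
from typing import Dict


class IdempotentPaymentService:
    """Toy in-memory service that charges at most once per idempotency key."""

    def __init__(self) -> None:
        self.total_charged = 0
        self._results: Dict[str, str] = {}  # idempotency key -> stored result

    def process_payment(self, idempotency_key: str, amount: int) -> str:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged += amount
        result = f"charged {amount}"
        self._results[idempotency_key] = result
        return result


payments = IdempotentPaymentService()
first = payments.process_payment("order-42-payment", 100)
retry = payments.process_payment("order-42-payment", 100)  # e.g. a redelivered message
print(first, retry, payments.total_charged)
```

The retry returns the same result as the first call and the customer is charged once. The same technique applies to compensations such as refunds.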

Real-World Example: Travel Booking Saga

Travel Booking Flow:

1. Book Flight -> Compensation: Cancel Flight
2. Book Hotel -> Compensation: Cancel Hotel
3. Book Car -> Compensation: Cancel Car
4. Process Payment -> Compensation: Refund Payment

If Car booking fails (payment has not been processed yet, so there is nothing to refund):
  Step 1: Cancel Hotel
  Step 2: Cancel Flight
  Step 3: Send Cancellation Notice

Common Pitfalls and Solutions

Pitfall                   | Solution
Non-idempotent operations | Use idempotency keys + unique constraints
Missing compensations     | Design compensation for every step
Long-running sagas        | Add timeout and checkpoint mechanisms
No visibility             | Use central saga state repository
Compensation failure      | Use dead letter queue + manual alerts
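
The last row deserves a sketch, because a compensation that fails leaves the system inconsistent until someone intervenes. One possible approach, assuming compensations are idempotent and therefore safe to retry: retry with exponential backoff, then park the failure for an operator. The `dead_letters` list is a stand-in for a real dead letter queue plus alerting, and `run_compensation` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_compensation(
    name: str,
    compensation: Callable[[], None],
    dead_letters: List[Tuple[str, str]],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.0,
) -> bool:
    """Retry a compensation with exponential backoff; park it if it keeps failing."""
    last_error = ""
    for attempt in range(max_attempts):
        try:
            compensation()
            return True
        except Exception as exc:
            last_error = str(exc)
            if attempt < max_attempts - 1:
                time.sleep(base_delay_seconds * (2 ** attempt))
    # Stand-in for a dead letter queue plus an alert to an operator.
    dead_letters.append((name, last_error))
    return False


attempts: Dict[str, int] = {"refund": 0}


def flaky_refund() -> None:
    # Fails twice, then succeeds (simulated transient fault).
    attempts["refund"] += 1
    if attempts["refund"] < 3:
        raise ConnectionError("payment gateway timeout")


def broken_release() -> None:
    # Always fails (simulated persistent fault).
    raise RuntimeError("inventory service down")


dead_letters: List[Tuple[str, str]] = []
refund_ok = run_compensation("refund payment", flaky_refund, dead_letters)
release_ok = run_compensation("release inventory", broken_release, dead_letters)
print(refund_ok, release_ok, dead_letters)
```

The transient failure is absorbed by the retries, while the persistent one ends up in `dead_letters` with its last error message for manual follow-up.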

Best Practices

  • Design compensations first - before implementing the forward transaction
  • Make all operations idempotent - use idempotency keys for all APIs
  • Store saga state persistently - allows recovery after crashes
  • Implement timeouts - don't wait forever for responses
  • Monitor sagas actively - detect stuck or long-running sagas
  • Test compensation paths - they will execute in production
  • Keep sagas short - long sagas increase failure probability
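
Persistent state, timeouts, and monitoring can share one mechanism: checkpoint the saga after every step, then query for sagas that have not progressed within a deadline. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever store the orchestrator uses; the `saga_state` schema, the status values, and the function names are assumptions for illustration.

```python
import json
import sqlite3
import time
from typing import List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga_state ("
        "saga_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
        "completed_steps TEXT NOT NULL, updated_at REAL NOT NULL)"
    )
    return conn


def checkpoint(conn: sqlite3.Connection, saga_id: str, status: str,
               completed_steps: List[str], now: Optional[float] = None) -> None:
    """Record the saga's progress so it can be resumed after a crash."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO saga_state VALUES (?, ?, ?, ?)",
            (saga_id, status, json.dumps(completed_steps), now),
        )


def find_stuck(conn: sqlite3.Connection, timeout_seconds: float,
               now: Optional[float] = None) -> List[str]:
    """Return running sagas that have not checkpointed within the timeout."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT saga_id FROM saga_state "
        "WHERE status = 'RUNNING' AND updated_at < ? ORDER BY saga_id",
        (now - timeout_seconds,),
    ).fetchall()
    return [row[0] for row in rows]


def load_completed_steps(conn: sqlite3.Connection, saga_id: str) -> List[str]:
    row = conn.execute(
        "SELECT completed_steps FROM saga_state WHERE saga_id = ?", (saga_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = open_store()
# Fixed timestamps keep the example deterministic.
checkpoint(conn, "saga-1", "RUNNING", ["create_order", "process_payment"], now=1000.0)
checkpoint(conn, "saga-2", "COMPLETED", ["create_order"], now=1000.0)
checkpoint(conn, "saga-3", "RUNNING", ["create_order"], now=1290.0)
stuck = find_stuck(conn, timeout_seconds=60, now=1300.0)
resume_from = load_completed_steps(conn, "saga-1")
print(stuck, resume_from)
```

Only `saga-1` is reported as stuck: `saga-2` has finished and `saga-3` checkpointed recently. The stored step list tells a recovery process which steps to resume from or compensate.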

Selection Guide

Choose Choreography when:
  • The workflow is simple and linear (2-4 steps)
  • Teams want autonomy
  • You have good event-driven infrastructure

Choose Orchestration when:
  • The workflow is complex with branching logic
  • You need central visibility and audit trail
  • You have multiple sagas that share steps
  • You need to resume sagas after failures

Expert Insight:
"Sagas sacrifice ACID for BASE (Basically Available, Soft state, Eventual consistency). The choice between choreography and orchestration is organizational, not technical. Start with choreography for simplicity; add orchestration when you feel the pain of distributed debugging."

Key Takeaways

  • Sagas replace distributed transactions in microservices with a sequence of local transactions and compensations
  • Two patterns: Choreography (decentralized) vs Orchestration (centralized)
  • Every transaction needs a compensating action
  • Idempotency is mandatory for all operations
  • Store saga state for recovery and monitoring
  • Test compensation paths as thoroughly as success paths