Microservices Interview Questions 2026 — Top 40 with Expert Answers
Senior backend engineers with microservices expertise earn ₹30-90 LPA at product companies. Staff/Principal architects at Flipkart, Swiggy, and Razorpay command ₹80 LPA-1.5 Cr. The difference between a ₹15 LPA "backend developer" and a ₹50 LPA "distributed systems engineer" is whether you can answer these questions with production depth — not just textbook definitions.
Every company running at scale — Flipkart, Swiggy, Razorpay, Amazon, Netflix, PhonePe, CRED — expects you to design, build, and debug microservices architectures. This guide covers 40 battle-tested questions compiled from real interviews at these exact companies, from fundamentals to the system design scenarios that decide Senior and Staff offers.
Related: Kubernetes Interview Questions 2026 | Docker Interview Questions 2026 | Golang Interview Questions 2026
Beginner-Level Microservices Questions (Q1–Q12)
Q1. What are microservices? How do they differ from a monolith?
Monolith: Single deployable unit containing all application features. The entire application — user management, orders, payments, notifications — is one codebase, one build, one deployment.
Microservices: Independent, small services each owning a specific business capability. Each service has its own codebase, deployment pipeline, and database.
Comparison:
| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | Entire app per release | Independent per service |
| Scaling | Scale entire app | Scale individual bottlenecks |
| Technology | Single stack | Polyglot (right tool per job) |
| Team ownership | Shared codebase | Service ownership per team |
| Failure blast radius | Full app goes down | Isolated to one service |
| Development speed | Fast initially, slows with size | Slower initially, scales better |
| Testing | Simpler (one process) | Complex (distributed) |
| Operational complexity | Low | High |
| Data consistency | Easy (one DB transaction) | Hard (distributed transactions) |
When NOT to use microservices: Small teams (<10 engineers), early-stage startups, low traffic, simple domains. A well-designed monolith is often the right choice until scale demands otherwise. Start with a modular monolith and extract services when a specific domain needs independent scaling.
Every senior backend interview starts here
Q2. What is service discovery? How does it work?
Service discovery is how services find each other's network locations when instances constantly appear and disappear (autoscaling, deployments, crashes) — a registry replaces hardcoded hosts. Two patterns:
Client-side discovery:
- Service A queries a Service Registry (Eureka, Consul, etcd) for Service B's instances
- Service A applies load balancing logic and calls one instance directly
- Example: Netflix Ribbon + Eureka
- Advantage: No proxy overhead
- Disadvantage: Every client needs a discovery library (see the sketch below)
Server-side discovery:
- Service A sends request to a load balancer/proxy
- Proxy queries the Service Registry and routes to an available instance
- Example: AWS ALB + ECS, Kubernetes Services + kube-proxy, Nginx + Consul-template
- Advantage: Language-agnostic
- Disadvantage: Extra network hop, proxy is a potential bottleneck
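A minimal client-side discovery sketch against Consul's HTTP health API (the agent address and service name are illustrative; passing=true filters to healthy instances):
import random
import requests
def discover(service_name: str) -> str:
    # Ask the local Consul agent for healthy instances of the service
    resp = requests.get(
        f"http://localhost:8500/v1/health/service/{service_name}",
        params={"passing": "true"},
        timeout=1.0,
    )
    resp.raise_for_status()
    instances = resp.json()
    # Client-side load balancing: pick a random healthy instance
    svc = random.choice(instances)["Service"]
    return f"http://{svc['Address']}:{svc['Port']}"
payment_url = discover("payment-service")  # e.g. http://10.0.3.17:8080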
Kubernetes service discovery:
K8s uses its own service registry (etcd) + CoreDNS. Services are discoverable at <service-name>.<namespace>.svc.cluster.local. Kubernetes kube-proxy programs iptables/IPVS rules for traffic routing. This is server-side discovery built into the platform.
Asked at Flipkart, Swiggy, Amazon SDE-2/3
Q3. What is an API Gateway? What are its responsibilities?
An API Gateway is the single entry point that sits between clients and backend services. Responsibilities:
| Function | Description |
|---|---|
| Routing | Route requests to appropriate backend service |
| Authentication | Validate JWT tokens, API keys before forwarding |
| Rate limiting | Throttle per client/endpoint |
| SSL termination | Handle HTTPS at gateway, services use HTTP internally |
| Request transformation | Transform between client format and service format |
| Response aggregation | Call multiple services, combine into one response (BFF pattern) |
| Caching | Cache responses for read-heavy endpoints |
| Observability | Centralized logging, tracing, metrics |
| Circuit breaking | Fail fast when backend is unhealthy |
Popular API Gateways:
| Gateway | Type | Best For |
|---|---|---|
| Kong | Open-source, plugin-based | General purpose, high customization |
| AWS API Gateway | Managed, serverless | AWS-native, Lambda integration |
| Nginx/NGINX Plus | Reverse proxy + gateway | Performance, simple routing |
| Envoy | L7 proxy | Service mesh foundation (Istio, AWS App Mesh) |
| Traefik | Cloud-native, K8s-native | Kubernetes deployments |
| Apigee | Enterprise, Google Cloud | API management, developer portal |
Architecture:
Mobile App    ──┐                  ┌──→ User Service
Web App       ──┤                  ├──→ Order Service
Third-party   ──┼──→ API Gateway ──┼──→ Payment Service
API consumers ──┘                  └──→ Notification Service
Q4. What is the circuit breaker pattern? How does it work?
A circuit breaker wraps calls to a dependency and trips once failures cross a threshold, rejecting further calls immediately instead of letting threads pile up behind a dying service. States:
CLOSED (normal operation)
│ failures exceed threshold (e.g., 5 failures in 10 seconds)
▼
OPEN (circuit tripped — reject all requests immediately)
│ after reset timeout (e.g., 30 seconds)
▼
HALF-OPEN (let one test request through)
│ success → back to CLOSED
│ failure → back to OPEN
Why it matters: Without circuit breakers, a slow/failed Payment Service causes all Order Service threads to block waiting for timeout → thread pool exhaustion → Order Service also fails → cascading failure across the system.
Implementation (Python with pybreaker):
import pybreaker
import requests
# Create circuit breaker: open after 5 failures, reset after 60s
payment_breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=60)
@payment_breaker
def call_payment_service(order_id: str, amount: float):
response = requests.post(
"http://payment-service/charge",
json={"order_id": order_id, "amount": amount},
timeout=2.0
)
response.raise_for_status()
return response.json()
# In order service
try:
result = call_payment_service(order_id, amount)
except pybreaker.CircuitBreakerError:
# Circuit is OPEN — fail fast, return cached/degraded response
return {"status": "payment_deferred", "message": "Payment system temporarily unavailable"}
Resilience4j (Java), Polly (.NET), and Istio (infrastructure level) also implement circuit breakers.
Critical pattern at Razorpay, Flipkart, and any payment/ordering system interview
Q5. What is the Saga pattern? When do you use it?
A saga breaks a distributed transaction into a sequence of local transactions, one per service; failures are undone by compensating transactions instead of a global rollback. Two implementations:
1. Choreography-based Saga: Each service publishes events; other services react and execute their local transaction.
OrderService → publishes: OrderCreated
↓ (async)
PaymentService → listens: OrderCreated → charges card → publishes: PaymentProcessed
↓ (async)
InventoryService → listens: PaymentProcessed → reserves stock → publishes: StockReserved
↓ (async)
ShippingService → listens: StockReserved → creates shipment → publishes: ShipmentCreated
Compensating transactions on failure:
PaymentService fails → publishes: PaymentFailed
OrderService listens → cancels order → publishes: OrderCancelled
2. Orchestration-based Saga: A central Saga Orchestrator (often AWS Step Functions, Temporal, or Conductor) coordinates the workflow.
SagaOrchestrator:
1. Call PaymentService.charge() → success
2. Call InventoryService.reserve() → fails
3. Call PaymentService.refund() [compensating transaction]
4. Update order status to failed
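A stripped-down orchestrator illustrating the compensation logic (the service clients are hypothetical; real orchestrators like Temporal add durable state, retries, and timeouts on top of this idea):
def run_order_saga(order: dict) -> None:
    # Each step pairs a forward action with its compensating transaction
    steps = [
        (payment_service.charge,    payment_service.refund),
        (inventory_service.reserve, inventory_service.release),
        (shipping_service.create,   shipping_service.cancel),
    ]
    completed = []
    try:
        for action, compensate in steps:
            action(order)
            completed.append(compensate)
    except Exception:
        # Undo finished steps in reverse order, then surface the failure
        for compensate in reversed(completed):
            compensate(order)
        raise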
Choreography vs. Orchestration:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose (event-driven) | Tighter (orchestrator knows all steps) |
| Visibility | Hard to see overall flow | Clear (orchestrator owns flow) |
| Error handling | Complex (distributed) | Centralized in orchestrator |
| Best for | Simple, few services | Complex workflows, many services |
Most frequently discussed pattern at system design interviews for e-commerce and fintech
Q6. What is database-per-service pattern? What are the trade-offs?
Each service owns its database outright; other services may reach that data only through the owner's API or events. Benefits:
- Services are truly independently deployable (schema changes don't break other services)
- Polyglot persistence (user service → PostgreSQL, product catalog → MongoDB, sessions → Redis)
- Independent scaling (scale the database with the service that needs it)
- Fault isolation (payment DB failure doesn't affect user DB)
Drawbacks:
- No cross-service joins (you can't JOIN users u ON o.user_id = u.id across services)
- Data consistency is eventual, not immediate (no multi-DB transactions)
- Duplicated data across services (denormalization)
- Reporting/analytics requires aggregating from multiple sources
Solutions for cross-service data needs:
- API composition: Query multiple services, join in memory (client or API Gateway)
- CQRS read models: Maintain a denormalized read database updated via events
- Event-driven data replication: Services publish events → other services maintain local copies
Orders Service                     Users Service
      │                                  │
      └─ publishes OrderCreated          └─ publishes UserUpdated
                  │                            │
                  └─────────────┬──────────────┘
                                │
                   Read Model (Elasticsearch)
            Combines order + user data for search
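A sketch of the replication consumer feeding such a read model, using confluent-kafka (the topic name, event shape, and local_db helper are assumptions):
import json
from confluent_kafka import Consumer
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "orders-user-replica",
})
consumer.subscribe(["user-events"])
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event["type"] == "UserUpdated":
        # Keep only the fields this context cares about — a local, eventually
        # consistent copy, so no cross-service call is needed at query time
        local_db.upsert("users_replica", {"id": event["user_id"], "email": event["email"]})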
Asked at every microservices architecture interview
Q7. What is an event-driven architecture? How is it different from request-response?
Request-Response (synchronous):
- Service A calls Service B directly, waits for response
- Tight coupling (A must know B's endpoint)
- If B is slow, A is slow
- If B is down, A fails
Event-Driven (asynchronous):
- Service A publishes an event to a message broker
- Service B subscribes and processes at its own pace
- A doesn't know about B (loose coupling)
- A continues processing regardless of B's state
Comparison:
| Aspect | Request-Response | Event-Driven |
|---|---|---|
| Coupling | Tight (caller knows callee) | Loose (producer doesn't know consumers) |
| Availability | Dependent on downstream | Independent |
| Latency | Synchronous wait | Asynchronous (no wait) |
| Complexity | Simpler to reason about | More complex (eventual consistency) |
| Observability | Easy (call chain visible) | Harder (event flows across brokers) |
| Use case | Real-time user-facing reads | Background processing, notifications, integration |
Message brokers:
- Apache Kafka: High-throughput, ordered, durable, replay-able. Best for event streaming, audit logs.
- RabbitMQ: Feature-rich queuing, routing (AMQP), dead-letter exchanges. Best for task queues.
- AWS SQS/SNS: Managed, serverless, deep AWS integration.
- Apache Pulsar: Multi-tenancy, geo-replication, Kafka alternative.
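A minimal producer sketch with confluent-kafka showing the loose coupling — the producer names a topic, never a consumer (topic and event shape are illustrative):
import json
from confluent_kafka import Producer
producer = Producer({"bootstrap.servers": "kafka:9092"})
def publish_order_created(order: dict) -> None:
    producer.produce(
        "order-events",
        key=order["order_id"],  # same key → same partition → per-order ordering
        value=json.dumps({"type": "OrderCreated", **order}),
    )
    producer.flush()  # block until delivered — fine for a sketch; batch in production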
Q8. What is eventual consistency? How do you handle it in microservices?
Eventual consistency: after a write, all services and replicas converge to the new value, but not instantly — reads may briefly see stale data. Example: User changes their email. User Service updates immediately. Order Service still has the old email for 500ms (replication lag).
CAP Theorem: In a distributed system, you can have at most 2 of: Consistency, Availability, Partition tolerance. Since network partitions happen in any distributed system, you choose CP (consistent but may be unavailable during partition — e.g., Zookeeper) or AP (available but may show stale data — e.g., Cassandra, DynamoDB).
Patterns for handling eventual consistency:
- Idempotency: Handle duplicate events gracefully. Process an event with the same ID twice = same result.
- Optimistic locking: Include version numbers in updates; reject stale updates (sketch after this list).
- Read-your-writes consistency: After a write, always read from the same service (not a replica) for that user's next request.
- Compensating actions: If you detect inconsistency, correct it (Saga compensating transactions).
- UI design: Show "processing..." for async operations instead of immediate final state.
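A minimal optimistic-locking sketch (the db helper returning an affected-row count is an assumption):
class ConflictError(Exception):
    pass
def update_email(user_id: str, new_email: str, expected_version: int) -> None:
    # The UPDATE matches zero rows if someone else bumped the version first
    rows_affected = db.execute(
        "UPDATE users SET email = %s, version = version + 1 "
        "WHERE id = %s AND version = %s",
        (new_email, user_id, expected_version),
    )
    if rows_affected == 0:
        raise ConflictError("stale version — re-read and retry")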
Q9. What is the difference between REST, gRPC, and GraphQL for microservices communication?
| Feature | REST | gRPC | GraphQL |
|---|---|---|---|
| Protocol | HTTP/1.1 or HTTP/2 | HTTP/2 (always) | HTTP/1.1 or HTTP/2 |
| Format | JSON/XML | Protocol Buffers (binary) | JSON |
| Performance | Good | Excellent (binary, multiplexed) | Good |
| Type safety | Optional (OpenAPI) | Strong (protobuf IDL) | Strong (schema) |
| Streaming | Limited (SSE, WebSocket) | Native (bidirectional) | Subscriptions |
| Browser support | Excellent | Limited (needs grpc-web) | Excellent |
| Learning curve | Low | Medium | Medium |
| Best for | Public APIs, external clients | Internal service-to-service | Flexible client queries |
gRPC example:
// payment.proto
service PaymentService {
rpc ProcessPayment(PaymentRequest) returns (PaymentResponse);
rpc StreamTransactions(UserRequest) returns (stream Transaction); // Server streaming
}
message PaymentRequest {
string order_id = 1;
double amount = 2;
string currency = 3;
}
Common architecture: External clients → REST/GraphQL API (via API Gateway). Internal service-to-service → gRPC (performance, strong typing).
Asked at Razorpay, Flipkart, Zerodha architecture discussions
Q10. What is the bulkhead pattern? Why is it important?
Named after a ship's watertight compartments: the bulkhead pattern partitions resources (threads, connections, memory) per dependency so one flooding compartment can't sink the whole service. Implementation — Thread pool isolation:
// Without bulkhead: one slow service consumes all 200 threads
// Result: entire application unresponsive
// With bulkhead: dedicated thread pools per service
@HystrixCommand(
commandKey = "PaymentService",
threadPoolKey = "PaymentServicePool",
threadPoolProperties = {
@HystrixProperty(name="coreSize", value="10"), // Max 10 threads for payment
@HystrixProperty(name="maxQueueSize", value="5") // Queue 5 more
}
)
public PaymentResult processPayment(Order order) { ... }
// RecommendationService gets its own pool — payment slowness doesn't affect it
@HystrixCommand(threadPoolKey = "RecommendationPool")
public List<Product> getRecommendations(String userId) { ... }
(Hystrix is in maintenance mode; Resilience4j's Bulkhead module is its modern JVM successor.)
Implementation — Connection pool isolation: Separate database connection pools per downstream service. If one pool is exhausted (slow DB), other services still have their connections.
In Kubernetes: Resource limits + requests per deployment act as bulkheads — one misbehaving service can't consume all cluster CPU/memory.
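In Python services, the same thread-pool idea reduces to a semaphore per dependency — a minimal asyncio sketch (endpoints and pool sizes are illustrative):
import asyncio
import httpx
# One semaphore per downstream dependency: payment slowness can exhaust
# only its own 10 slots, never the recommendation capacity
payment_bulkhead = asyncio.Semaphore(10)
recommendation_bulkhead = asyncio.Semaphore(50)
async def charge(client: httpx.AsyncClient, order: dict) -> dict:
    async with payment_bulkhead:  # waits here instead of draining a shared pool
        resp = await client.post("http://payment-service/charge", json=order, timeout=2.0)
        return resp.json()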
Q11. What is the strangler fig pattern for migrating from monolith to microservices?
Route traffic through a facade and peel functionality off the monolith slice by slice, the way a strangler fig gradually envelops its host tree. Migration steps:
- Deploy a routing layer (API Gateway or reverse proxy) in front of the monolith
- Identify a bounded context to extract (e.g., User Service)
- Build the new microservice independently
- Configure the router to send user-related traffic to the new service
- Monolith still handles remaining domains
- Repeat until monolith is replaced (or retired)
Phase 1:          Phase 2:                    Phase 3:
Monolith          API Gateway                 API Gateway
├── Users         ├── /users ──→ UserSvc      ├── /users ──→ UserSvc
├── Orders        └── /*     ──→ Monolith     ├── /orders ──→ OrderSvc
└── Payments                                  └── /payments ──→ Monolith
                                                  (payments still migrating)
Key considerations:
- Keep the monolith working the entire time — zero downtime migration
- Start with the domain that causes the most pain (slowest deployments, most merge conflicts)
- Don't try to extract tightly coupled domains first
- Watch for dual writes: during the transition, the monolith and the new service may both need to write the same data — plan how to keep them in sync
Q12. What is a bounded context in Domain-Driven Design (DDD)?
A bounded context is an explicit boundary within which a domain model and its ubiquitous language stay consistent; the same term can legitimately mean different things in different contexts. Example — E-commerce:
Bounded Context: Order Management
├── Order (has items, status, shipping address)
├── OrderItem (product, quantity, price at purchase time)
└── Customer (just: id, name, email — what order management cares about)
Bounded Context: Product Catalog
├── Product (detailed specs, images, variants, inventory)
├── Category
└── Pricing rules
Bounded Context: Payment
├── Transaction
├── PaymentMethod
└── Customer (just: id, billing address — what payment cares about)
Note: "Customer" exists in multiple bounded contexts with different attributes. That's correct — each context only models what it needs, using its own language.
Microservice ↔ Bounded Context: Ideally, one microservice per bounded context. Splitting a bounded context across services creates tight coupling. Combining multiple bounded contexts in one service loses independence.
Context mapping: Define how bounded contexts relate — Shared Kernel, Customer/Supplier, Conformist, Anti-Corruption Layer (ACL).
DDD is the foundation of good microservices decomposition — asked at principal/architect-level interviews
Solid on Q1-Q12? You've cleared the screening bar at most companies. The intermediate section below covers the patterns that ₹25 LPA+ backend engineers are expected to know cold — saga, CQRS, event sourcing, and distributed tracing.
Essential Intermediate Microservices Questions (Q13–Q28)
Q13. What is event sourcing? How does it differ from traditional CRUD?
Traditional CRUD: Store current state. When something changes, overwrite. History is lost.
-- Current state only
UPDATE orders SET status = 'shipped', updated_at = NOW() WHERE id = 123;
-- What was the previous status? When did it change? Unknown.
Event Sourcing: Store all events that led to the current state. Current state is derived by replaying events.
# Event store (append-only)
events = [
{"type": "OrderCreated", "order_id": 123, "items": [...], "ts": "2026-03-30T10:00"},
{"type": "PaymentProcessed", "order_id": 123, "amount": 599, "ts": "2026-03-30T10:01"},
{"type": "ItemShipped", "order_id": 123, "tracking": "EX123", "ts": "2026-03-30T11:30"},
]
# Derive current state by replaying events
def get_order_state(order_id):
order = {}
for event in get_events(order_id):
if event["type"] == "OrderCreated":
order = {"id": order_id, "status": "created", **event}
elif event["type"] == "PaymentProcessed":
order["status"] = "paid"
elif event["type"] == "ItemShipped":
order["status"] = "shipped"
order["tracking"] = event["tracking"]
return order
Benefits:
- Complete audit trail (compliance, debugging)
- Time travel — reconstruct state at any point in time
- Events are the source of truth for other services (publish to Kafka)
- Can replay events to rebuild read models or populate new services
Drawbacks:
- Query complexity (you can't SELECT * FROM orders WHERE status = 'paid' — you need a read model)
- Event schema evolution is tricky
- Storage grows indefinitely (snapshots mitigate this)
Q14. What is CQRS (Command Query Responsibility Segregation)?
CQRS splits the write path (commands) from the read path (queries), so each side gets its own model and store:
Command Side (writes) Query Side (reads)
│ │
Client → API → Command Handler Client → API → Query Handler
│ │
└─→ Write Store ──→ Event ──→ Read Store
(PostgreSQL) (Kafka) (Elasticsearch / Redis)
normalized async denormalized, optimized
ACID update for specific queries
Why separate reads and writes?
- Write models are optimized for correctness (normalized, ACID)
- Read models are optimized for query patterns (denormalized, pre-computed)
- Scale independently: most apps read 10x more than they write
- Different databases per side: PostgreSQL for writes, Elasticsearch for full-text search reads
CQRS without event sourcing:
Order created (write DB: PostgreSQL)
│
▼ (async, Debezium CDC or Kafka)
Order search index updated (read DB: Elasticsearch)
│
▼
User searches for "my orders" — queries Elasticsearch (fast, full-text capable)
CQRS + Event Sourcing is the full pattern — events are the commands, event store is the write side, projections build the read models.
System design question at Flipkart, Amazon, and Swiggy (order search systems)
Q15. Design a notification service for a large-scale application.
Architecture:
Order Service ──→ Kafka (topic: order-events)
Payment Service ──→ Kafka (topic: payment-events)
Delivery Service ──→ Kafka (topic: delivery-events)
│
▼
Notification Orchestrator
├── Subscribe to relevant topics
├── Determine notification type (email, SMS, push)
├── Check user preferences (do-not-disturb, channels)
├── Deduplicate (prevent duplicate notifications)
└── Route to appropriate channel worker
│
┌───────────┼───────────┐
▼ ▼ ▼
Email Worker SMS Worker Push Worker
(Amazon SES) (Twilio/MSG91) (FCM/APNs)
│ │ │
└───────────┴───────────┘
│
Notification Log DB
(track sent, failed, retries)
Key design decisions:
- Idempotency: Use notification ID (hash of event + user + type) to prevent duplicates — see the sketch after this list
- Priority queues: Separate Kafka topics/consumers for critical (OTP, payment) vs. marketing notifications
- User preferences: Cache in Redis — check before sending
- Retry with backoff: Exponential backoff for failed sends; dead-letter queue after N retries
- Rate limiting: Don't flood users — max 3 marketing emails per day
- Unsubscribe handling: Immediate propagation to prevent sending after unsubscribe
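The deduplication check from the first bullet can be a single atomic Redis operation — a sketch (key naming and TTL are illustrative; redis is a client instance as in the other snippets):
import hashlib
def notification_key(event_id: str, user_id: str, channel: str) -> str:
    # Deterministic ID: same event + user + channel always hashes the same
    digest = hashlib.sha256(f"{event_id}:{user_id}:{channel}".encode()).hexdigest()
    return f"notif:{digest}"
def claim_notification(event_id: str, user_id: str, channel: str) -> bool:
    # SET NX is atomic: only the first worker to claim this key may send
    return bool(redis.set(notification_key(event_id, user_id, channel),
                          "sent", nx=True, ex=86400))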
Q16. What is the outbox pattern? How does it solve the dual-write problem?
The dual-write problem: saving to your database and publishing to Kafka are two systems with no shared transaction — if one succeeds and the other fails, they silently diverge.
# BAD — dual-write, not atomic
def create_order(order_data):
order = db.save_order(order_data) # DB update succeeds
kafka.publish("order-created", order) # What if this fails? DB has order, no event
return order
Outbox Pattern solution: Write the event to an outbox table in the SAME database transaction. A separate process (transactional outbox publisher) reads from the outbox and publishes to Kafka.
# GOOD — single database transaction
def create_order(order_data):
with db.transaction():
order = db.save_order(order_data)
# Same transaction — either both succeed or both fail
db.save_to_outbox({
"event_type": "OrderCreated",
"payload": order.to_dict(),
"status": "pending"
})
return order
# Separate outbox publisher process (or Debezium CDC)
def outbox_publisher():
while True:
events = db.get_pending_outbox_events(limit=100)
for event in events:
kafka.publish(event.topic, event.payload)
db.mark_outbox_event_published(event.id)
time.sleep(0.1)
Better implementation — Debezium (Change Data Capture): Debezium reads PostgreSQL/MySQL WAL (write-ahead log) and publishes changes to Kafka automatically. No polling process needed.
Asked at Razorpay, Flipkart, Zerodha — any system with event-driven microservices
Q17. What is the sidecar pattern? Give real examples.
A sidecar is a helper container that runs alongside the main application container in the same pod, sharing its network namespace and lifecycle:
Pod (Kubernetes)
├── Main Container (application)
└── Sidecar Container (e.g., Envoy proxy)
- Same network namespace
- Same lifecycle
- Shared volumes
Real sidecar examples:
| Sidecar | Purpose | Example |
|---|---|---|
| Istio Envoy Proxy | mTLS, traffic management, telemetry | All traffic in/out of pod goes through Envoy |
| Log shipper | Collect and forward logs | Fluent Bit reads app log file, ships to OpenSearch |
| Secret injector | Inject secrets at startup | Vault Agent writes secrets to shared volume |
| Config sync | Keep config file up-to-date | Consul Template watches Consul, regenerates nginx config |
| Network proxy | Ambassador pattern for external calls | Proxy to legacy system that needs auth transformation |
| Metrics exporter | Expose metrics for services that can't | JMX Exporter for JVM metrics → Prometheus |
Init Container (runs before app, not alongside): Used for setup tasks — wait for DB to be ready, run migrations, download config files.
Q18. Explain the API Composition pattern vs. CQRS for cross-service queries.
API Composition: The API Gateway or a dedicated Composer service calls multiple services, joins the data in memory, and returns a combined response.
# API Gateway composer
async def get_order_details(order_id: str) -> OrderDetails:
    # Fetch the order first — the other calls need its fields
    order = await order_service.get_order(order_id)
    # The remaining services can be called concurrently
    user, payment, items = await asyncio.gather(
        user_service.get_user(order.user_id),
        payment_service.get_payment(order_id),
        inventory_service.get_items(order.item_ids),
    )
    # Compose response
    return OrderDetails(order=order, user=user, payment=payment, items=items)
Pros: Simple, always consistent (reads from authoritative sources). Cons: Higher latency (network calls per service), availability depends on all services, can't do complex joins/filtering.
CQRS Read Model: Maintain a pre-joined, denormalized view in a separate store.
OrderCreated event → update order_summary table/index
PaymentProcessed event → update payment_status in order_summary
ItemShipped event → update shipping_status in order_summary
Query: SELECT * FROM order_summary WHERE user_id = 123
-- All data pre-joined, single fast query, no service calls
Pros: Fast queries, no runtime service calls, can handle complex filtering/sorting. Cons: Eventual consistency (slight lag), complexity of maintaining projections.
When to use which:
- API Composition: Real-time consistency required, few services, low query volume
- CQRS Read Model: High query volume, complex filtering, acceptable eventual consistency
Q19. How do you implement retry logic with exponential backoff in microservices?
Naive retry (BAD — thundering herd):
for attempt in range(3):
try:
return call_payment_service()
except Exception:
time.sleep(1) # All failed requests retry at exactly the same time
Exponential backoff with jitter (CORRECT):
import random
import time
def call_with_retry(func, max_attempts=4, base_delay=0.5, max_delay=32):
for attempt in range(max_attempts):
try:
return func()
except (ConnectionError, TimeoutError) as e:
if attempt == max_attempts - 1:
raise # Last attempt — re-raise
# Exponential backoff: 0.5s, 1s, 2s, 4s...
delay = min(base_delay * (2 ** attempt), max_delay)
# Add jitter: randomize within [delay/2, delay*1.5]
jitter = delay * (0.5 + random.random())
time.sleep(jitter)
except (ValueError, KeyError) as e:
raise # Don't retry on non-transient errors
Tenacity library (Python):
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
@retry(
stop=stop_after_attempt(4),
wait=wait_exponential(multiplier=0.5, min=0.5, max=32),
retry=retry_if_exception_type((ConnectionError, TimeoutError))
)
def call_payment_service():
return requests.post("http://payment-service/charge", timeout=2.0)
Key principle: Only retry idempotent operations. Never auto-retry payment charges — you may double-charge. For non-idempotent operations, use idempotency keys.
Q20. What is an idempotency key? How do you implement idempotent APIs?
An idempotency key is a unique, client-chosen identifier sent with a request so the server can recognize retries and return the original result instead of executing twice.
Client implementation:
import requests
import time
def process_payment(order_id: str, amount: float, retries: int = 3):
idempotency_key = f"payment-{order_id}" # Deterministic key per order
for attempt in range(retries):
response = requests.post(
"http://payment-service/charge",
headers={"X-Idempotency-Key": idempotency_key},
json={"amount": amount}
)
if response.status_code in (200, 201):
return response.json()
elif response.status_code == 409: # Conflict = already processed
return response.json() # Return cached result
time.sleep(2 ** attempt)
Server implementation:
@app.post("/charge")
async def charge(request: ChargeRequest, idempotency_key: str = Header(...)):
# Check if we've seen this key before
cached = await redis.get(f"idempotency:{idempotency_key}")
if cached:
return json.loads(cached) # Return previous result
# Process payment
result = await payment_processor.charge(request.amount)
# Cache result (TTL = 24 hours)
await redis.setex(f"idempotency:{idempotency_key}", 86400, json.dumps(result))
return result
Razorpay, Stripe, and Paytm all implement idempotency keys for payment APIs for exactly this reason.
Q21. What is the two-phase commit (2PC) and why is it problematic in microservices?
2PC: a coordinator drives all participants through a voting phase and a commit phase so a distributed transaction commits atomically everywhere or nowhere. Phase 1 (Prepare):
- Coordinator asks all participants: "Can you commit?"
- Each participant locks resources, writes to WAL, responds "YES" or "NO"
Phase 2 (Commit/Abort):
- If all said YES → Coordinator sends COMMIT to all
- If any said NO → Coordinator sends ABORT to all
Why 2PC is problematic in microservices:
- Blocking protocol: Resources are locked during both phases. If coordinator crashes after Phase 1, participants are stuck with locks indefinitely.
- Single point of failure: Coordinator failure blocks the entire transaction.
- Network partitions: If a participant receives the Phase 2 COMMIT but another participant doesn't, you have inconsistency.
- Low throughput: Long-held locks → contention → poor performance at scale.
- Not cloud-native: Cloud services (AWS RDS, DynamoDB) don't participate in external 2PC.
Alternative: Saga pattern with compensating transactions — eventual consistency instead of ACID, but resilient and scalable.
Q22. How do you implement distributed tracing across microservices?
How trace context propagates:
User Request
│ Trace-ID: abc123, Span-ID: 0001
▼
API Gateway
│ (adds Span-ID: 0001 → parent-span)
│ new Span-ID: 0002
▼
Order Service
│ new Span-ID: 0003
▼
Payment Service
│ (trace headers forwarded automatically by OpenTelemetry instrumentation)
new Span-ID: 0004
▼
Database
OpenTelemetry context propagation:
# Service uses OTel — trace context auto-propagates via HTTP headers (W3C traceparent)
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject, extract
tracer = trace.get_tracer(__name__)
# Outgoing HTTP call — inject trace context into headers
headers = {}
inject(headers) # Adds traceparent header
response = requests.post("http://payment-service/charge",
headers=headers,
json=payload)
# Incoming request — extract trace context
context = extract(request.headers)
with tracer.start_as_current_span("process-order", context=context):
process_order(request.data)
Trace analysis in Jaeger:
- See full call tree with each service's contribution
- Identify which service caused the P99 spike
- Find N+1 query patterns (many short DB calls instead of one batch)
- Correlate errors across services
Q23. What is the Backends for Frontends (BFF) pattern?
Mobile App ──→ Mobile BFF ──→ User Service
(compact JSON, Order Service
limited fields, Payment Service
offline support)
Web App ──→ Web BFF ──→ User Service
(rich data, Order Service
full features) Analytics Service
Third-party ──→ Public API ──→ User Service
(versioned, Order Service
rate-limited)
Why BFF instead of one universal API Gateway?
- Mobile apps need lightweight responses (battery, bandwidth)
- Web apps need richer data (full user profile, analytics)
- Different authentication mechanisms per client
- Different rate limits
- Client-specific aggregation logic doesn't pollute the universal gateway
Who implements the BFF? Typically the frontend team — they own the BFF along with the client. This gives frontend teams control over their data fetching without negotiating with a platform team.
Q24. How do you handle service-to-service authentication in microservices?
Option 1 — JWT (JSON Web Tokens):
# Auth service issues order-service a short-lived token, signed with the auth service's private key
import jwt  # PyJWT
from datetime import datetime, timedelta
token = jwt.encode({
"sub": "order-service",
"iss": "auth-service",
"aud": "payment-service",
"iat": datetime.utcnow(),
"exp": datetime.utcnow() + timedelta(minutes=5)
}, private_key, algorithm="RS256")
# Service B verifies JWT using auth-service's public key
decoded = jwt.decode(token, public_key, algorithms=["RS256"],
audience="payment-service")
Option 2 — mTLS (mutual TLS) via Istio: Both services present certificates. Istio's control plane issues and rotates certs automatically — no application code changes needed. Best for Kubernetes deployments.
Option 3 — API Keys (simple, less secure): Pre-shared keys stored in secrets manager. Simpler but no identity verification, hard to rotate.
Option 4 — SPIFFE/SPIRE:
Platform-agnostic workload identity. Every workload gets a SPIFFE ID (spiffe://trust-domain/service/payment) and a short-lived X.509 cert. Works across clouds, VMs, and containers.
Best practice in 2026: Istio mTLS for K8s (automatic, zero code), SPIFFE/SPIRE for hybrid environments.
Q25. What is Kafka and when should you use it over RabbitMQ?
| Feature | Apache Kafka | RabbitMQ |
|---|---|---|
| Architecture | Distributed log (partitioned, replicated) | Message broker (push-based queuing) |
| Message retention | Configurable (days/weeks/forever) | Until consumed (by default) |
| Replay messages | Yes (consumer offsets) | No (consumed = gone) |
| Throughput | Millions of messages/second | Hundreds of thousands/second |
| Consumer model | Pull (consumer controls pace) | Push (broker delivers) |
| Ordering | Guaranteed within partition | Not guaranteed across queues |
| Use case | Event streaming, audit log, data pipeline | Task queue, RPC, work distribution |
Use Kafka when:
- You need to replay events (rebuild a new service's database from history — sketch below)
- Multiple independent consumers need the same events
- High throughput (millions/second)
- Event log that's the source of truth (event sourcing)
- Real-time data pipelines (Kafka Streams, Flink)
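Replay is the capability RabbitMQ can't match — a confluent-kafka sketch that rewinds one partition to offset 0 to rebuild a read model (topic, group, and the projection function are assumptions):
from confluent_kafka import Consumer, TopicPartition
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "rebuild-order-read-model",
    "enable.auto.commit": False,
})
# Pin partition 0 to offset 0 and re-consume the entire retained history
consumer.assign([TopicPartition("order-events", 0, 0)])
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up — a production job would compare against end offsets
    if not msg.error():
        apply_to_read_model(msg.value())  # hypothetical projection function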
Use RabbitMQ when:
- Task queue with routing logic (exchange types: direct, topic, fanout, headers)
- Complex retry/DLQ patterns
- Request-reply patterns
- Smaller scale, lower operational overhead
Q26. How do you handle versioning of microservice APIs?
Strategies:
- URI versioning (most common):
GET /api/v1/users/123
GET /api/v2/users/123 # Breaking change in v2
- Header versioning:
GET /api/users/123
Accept: application/vnd.myapp.v2+json
- Query parameter versioning:
GET /api/users/123?version=2
Consumer-Driven Contract Testing (Pact): Before breaking changes, verify no consumer depends on the old behavior:
# Payment service defines what it expects from Order service (pact)
pact = Consumer('PaymentService').has_pact_with(Provider('OrderService'))
pact.given('Order 123 exists').upon_receiving('a request for order').with_request(
method='GET', path='/orders/123'
).will_respond_with(200, body={"id": "123", "amount": 599}) # Contract
# Order service must fulfill this contract — tested in isolation
Event versioning (Kafka):
- Add fields (backward compatible): consumers can ignore new fields
- Remove/rename fields (breaking): use schema registry (Confluent Schema Registry with Avro/Protobuf) with backward/forward compatibility enforcement
Q27. What is the choreography vs. orchestration debate in microservices?
Choreography: Each service reacts to events and publishes its own events. No central coordinator.
OrderService publishes: OrderPlaced
→ PaymentService listens, charges card, publishes: PaymentCharged
→ InventoryService listens, reserves items, publishes: InventoryReserved
→ ShippingService listens, creates shipment, publishes: ShipmentCreated
Orchestration: A workflow orchestrator (Step Functions, Temporal, Conductor) explicitly calls each service in sequence.
OrderOrchestrator:
1. Call PaymentService.charge() → wait for response
2. Call InventoryService.reserve() → wait
3. Call ShippingService.create() → wait
4. Publish OrderFulfilled event
Trade-off comparison:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Very loose | Orchestrator knows all services |
| Visibility | Hard to see end-to-end flow | Workflow is explicit in orchestrator |
| Error handling | Each service must handle its own failures | Centralized error handling, rollback logic |
| Debugging | Trace events across multiple topics | Debug in one place (orchestrator logs) |
| Testability | Hard (need full event pipeline) | Easier (mock service calls) |
| Evolution | Adding a step = new consumer (no code change) | Adding a step = modify orchestrator |
Recommendation (2026 best practice): Use orchestration for business-critical workflows (order processing, payment flows) — visibility and error handling outweigh the coupling. Use choreography for non-critical fan-out (send email notification, update recommendation engine).
Debated at principal engineer / tech lead interviews
Q28. How do you design a rate limiter for an API gateway?
Algorithms:
| Algorithm | Description | Burst Handling | Use Case |
|---|---|---|---|
| Fixed Window Counter | Count requests in fixed window (e.g., 1-minute slots) | Allows 2x limit at window boundary | Simple, common |
| Sliding Window Log | Track exact timestamps of last N requests | Precise | Accurate but memory-intensive |
| Sliding Window Counter | Combine fixed windows with interpolation | Smooth | Good balance |
| Token Bucket | Bucket fills at rate R, requests consume tokens | Yes (burst up to bucket size) | API rate limiting |
| Leaky Bucket | Requests processed at fixed rate (queue excess) | Smooths bursts | Output rate limiting |
Sliding window log implementation with Redis (sorted sets):
import redis
import time
r = redis.Redis()
def is_rate_limited(user_id: str, max_requests: int = 100, window_seconds: int = 60) -> bool:
key = f"rate_limit:{user_id}"
current_time = time.time()
window_start = current_time - window_seconds
pipe = r.pipeline()
# Remove old entries outside window
pipe.zremrangebyscore(key, 0, window_start)
# Count current requests in window
pipe.zcard(key)
# Add current request
pipe.zadd(key, {str(current_time): current_time})
# Set expiry on key
pipe.expire(key, window_seconds)
results = pipe.execute()
request_count = results[1]
return request_count >= max_requests
Distributed rate limiting: Redis with atomic Lua scripts ensures correctness across multiple API Gateway instances. Alternatively, Nginx Plus and Kong have built-in rate limiting plugins.
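For contrast with the sliding window above, a true token bucket is a handful of lines in process — a sketch (per-instance only; distributed enforcement still needs Redis):
import time
class TokenBucket:
    """Refills at `rate` tokens/second up to `capacity`; each request costs one token."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()
    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False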
If you can nail Q1-Q28, you're already in the top 15% of backend candidates. The advanced section is where Staff Engineer and Architect offers are decided — real system design scenarios from Razorpay, Flipkart, and Amazon interviews.
Advanced Microservices Questions — The Architect Round (Q29–Q40)
Q29. Design an e-commerce order management system using microservices.
Architecture:
Client (Web/Mobile)
│
API Gateway (Kong)
├── /auth → Auth Service (JWT issuance/validation)
├── /users → User Service (PostgreSQL)
├── /products → Product Catalog Service (PostgreSQL + Elasticsearch)
├── /cart → Cart Service (Redis — ephemeral)
├── /orders → Order Service (PostgreSQL + Kafka publisher)
└── /payments → Payment Service (PostgreSQL + Razorpay integration)
Kafka Topics:
- order-events: OrderCreated, OrderUpdated, OrderCancelled
- payment-events: PaymentProcessed, PaymentFailed, RefundInitiated
- inventory-events: StockReserved, StockReleased, LowStockAlert
Async Services (Kafka consumers):
- Inventory Service: listens to order-events, reserves/releases stock
- Notification Service: listens to all events, sends emails/SMS/push
- Analytics Service: listens to all events, updates dashboards
- Search Indexer: listens to order-events, updates Elasticsearch
Read Models (CQRS):
- Order History: Elasticsearch (user's past orders, full-text search)
- Recommendation Engine: Feature store (order patterns)
Order creation flow (Saga):
- Order Service saves order (PENDING), publishes OrderCreated
- Payment Service: processes payment, publishes PaymentProcessed or PaymentFailed
- Inventory Service: reserves stock, publishes StockReserved or InsufficientStock
- Shipping Service: creates shipment, publishes ShipmentCreated
- Notification Service: sends order confirmation email
Compensating transactions on failure:
- PaymentFailed → OrderService cancels order → InventoryService releases any reserved stock
Q30. How do you implement distributed caching in microservices?
Caching patterns:
- Cache-Aside (Lazy Loading):
def get_user(user_id: str) -> User:
cached = redis.get(f"user:{user_id}")
if cached:
return User.from_json(cached)
user = db.query_user(user_id) # Cache miss — hit DB
redis.setex(f"user:{user_id}", 300, user.to_json()) # Cache 5 minutes
return user
- Write-Through: Write to cache and DB simultaneously. Cache always has latest.
- Write-Behind (Write-Back): Write to cache first, async to DB. Risk of data loss on cache failure.
- Read-Through: Cache handles all reads, fetches from DB on miss (transparent to application).
Cache invalidation strategies:
- TTL (Time-To-Live): Simple, eventual consistency guaranteed
- Event-driven invalidation: When user updated, publish event → cache invalidation consumer deletes key
- Write-through invalidation: On write, immediately update or delete cache entry
Problems to watch:
- Cache stampede: Many cache misses at once → all hit DB simultaneously. Solution: probabilistic early expiration or a mutex lock on miss (sketch after this list).
- Cache poisoning: Malicious data in cache. Always validate data from cache.
- Hot keys: One cache key gets millions of requests → single Redis node bottleneck. Solution: local in-process cache (Caffeine/Guava for JVM) for extremely hot keys.
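A sketch of the mutex-on-miss fix for stampedes (the bare redis and db helpers follow the cache-aside snippet above):
import json
import time
def get_user_cached(user_id: str) -> dict:
    key = f"user:{user_id}"
    while True:
        cached = redis.get(key)
        if cached:
            return json.loads(cached)
        # SET NX: exactly one caller wins the rebuild lock; the rest wait and re-read
        if redis.set(f"lock:{key}", "1", nx=True, ex=5):
            try:
                user = db.query_user(user_id)            # single DB hit per miss
                redis.setex(key, 300, json.dumps(user))
                return user
            finally:
                redis.delete(f"lock:{key}")
        time.sleep(0.05)  # brief backoff before re-checking the cache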
Q31. How do you test microservices effectively?
Testing pyramid for microservices:
/\
/ \
/ E2E\ (Few — very expensive, slow, fragile)
/──────\
/ Integ \ (Some — verify service interactions)
/──────────\
/ Contract \ (Many — verify API contracts between services)
/──────────────\
/ Unit Tests \ (Most — fast, isolated, plentiful)
/──────────────────\
Unit tests: Test business logic in isolation (mock all external dependencies).
Contract tests (Pact): Consumer defines expected API behavior; provider verifies it fulfills the contract without a real integration test environment.
Integration tests: Test service against real dependencies (real DB in Docker, real Redis). Testcontainers is excellent for this:
from sqlalchemy import create_engine
from testcontainers.postgres import PostgresContainer
def test_order_creation():
with PostgresContainer("postgres:16") as pg:
db = create_engine(pg.get_connection_url())
setup_schema(db)
order_service = OrderService(db=db, kafka=MockKafka())
order = order_service.create_order(user_id="u1", items=[...])
assert order.status == "pending"
End-to-end tests: Deploy all services (docker-compose or kind cluster), run scenario tests. Run these on a schedule (not every PR) — too slow and flaky for CI gates.
Consumer-Driven Contract Testing workflow:
- Consumer team writes Pact contract (what they expect)
- Pact pushed to Pact Broker
- Provider CI downloads and verifies against actual implementation
- If provider fails contract → prevent deployment
Q32. How do you handle schema evolution in Kafka/event-driven systems?
Confluent Schema Registry with Avro/Protobuf:
# Register schema
from confluent_kafka.schema_registry import SchemaRegistryClient
sr_client = SchemaRegistryClient({"url": "http://schema-registry:8081"})
# Define Avro schema
order_schema = {
"type": "record",
"name": "OrderCreated",
"fields": [
{"name": "order_id", "type": "string"},
{"name": "user_id", "type": "string"},
{"name": "amount", "type": "double"},
{"name": "currency", "type": "string", "default": "INR"} # New field with default
]
}
# Schema registry enforces compatibility before registration
Compatibility modes:
- BACKWARD: New schema can read old messages (add optional fields with defaults — upgrade consumers before producers)
- FORWARD: Old schema can read new messages (remove optional fields — producers can be updated before consumers)
- FULL: Both backward and forward compatible
- NONE: No compatibility checking (use with care)
Rules for safe evolution:
- ALWAYS add fields as optional with defaults (backward compatible)
- NEVER remove required fields
- NEVER change field types (int → string is a breaking change)
- NEVER rename fields (use aliases if needed)
Q33. Design a real-time bidding (RTB) system for online advertising using microservices.
Architecture:
Impression Request (ad slot on a website)
│ <100ms deadline
│
Request Router (Nginx)
│
Bid Request Enrichment Service
├── User profile lookup (Redis — <1ms)
├── Context parsing (URL, device, geo)
└── Audience segments (feature store)
│
Auction Service
├── Fan-out bid requests to DSPs in parallel (<50ms)
├── Collect bids until the deadline (late responses are discarded)
└── Second-price auction (winner pays second-highest + $0.01)
│
Ad Serving Service
├── Fetch winning creative from CDN
└── Return ad markup
│
Impression Tracker (async — fire and forget)
│
Kafka: impression-events, click-events, conversion-events
│
Analytics pipeline (Flink + S3 + Athena)
Latency budget: 100ms total. Typical breakdown: 20ms routing/enrichment, 50ms auction, 30ms ad serving.
Key technical challenges:
- Redis clusters for sub-millisecond user profile lookups
- Consistent hashing for request routing (same user → same cache node)
- Goroutines/async for parallel bid collection (see the sketch after this list)
- Circuit breakers on every DSP call (slow DSP = skip, not block)
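A sketch of the timeout-bounded fan-out (the DSP clients are hypothetical async stubs; 50ms is the auction's share of the latency budget above):
import asyncio
async def collect_bids(bid_request: dict, dsps: list, timeout: float = 0.05) -> list:
    # Fan out to every DSP; anything not back within the deadline is dropped
    tasks = [asyncio.create_task(dsp.bid(bid_request)) for dsp in dsps]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # slow DSP: skip it, never block the auction
    bids = [t.result() for t in done if t.exception() is None and t.result()]
    return sorted(bids, key=lambda b: b["price"], reverse=True)  # highest first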
Q34. What is the ambassador pattern? Give a real use case.
An ambassador is a sidecar that proxies the application's outbound traffic to one external dependency, transparently adding pooling, retries, or protocol translation. Real use case — legacy database with connection limits:
App Pods: 100 replicas × 10 connections each = 1,000 direct connections to the DB
↓ — this kills most databases
Problem: PostgreSQL degrades beyond roughly 500 connections
Solution with Ambassador (PgBouncer sidecar):
App Pod → local PgBouncer → DB
Each pod's app opens its 10 connections to its local PgBouncer, which multiplexes them onto a small server pool (say, 2 per pod)
Result: the DB sees 100 replicas × 2 pooled connections = 200 connections instead of 1,000
Other ambassador use cases:
- Protocol translation: App speaks REST, legacy backend requires SOAP — ambassador translates
- Retry/circuit-breaking: Ambassador (Envoy) handles retries so app code doesn't need to
- mTLS proxy: App speaks plain HTTP; ambassador adds mTLS for service-to-service security
- StatsD → Prometheus: App emits StatsD metrics; ambassador converts to Prometheus format
Q35. How do you implement health checks in microservices? What patterns exist?
Health check types:
- Shallow (ping) health check: Is the process alive? Returns 200 immediately.
@app.get("/health/ping")
async def ping():
return {"status": "ok"}
- Deep health check: Verify dependencies are accessible (DB, Redis, external services):
@app.get("/health/ready")
async def readiness_check():
checks = {}
# Database check
try:
await db.execute("SELECT 1")
checks["database"] = "ok"
except Exception as e:
checks["database"] = f"failed: {str(e)}"
# Redis check
try:
await redis.ping()
checks["redis"] = "ok"
except Exception as e:
checks["redis"] = f"failed: {str(e)}"
# External service check
try:
response = await httpx.get("http://payment-service/health/ping", timeout=1.0)
checks["payment_service"] = "ok" if response.status_code == 200 else "degraded"
except Exception:
checks["payment_service"] = "unavailable"
overall_status = "healthy" if all(v == "ok" for v in checks.values()) else "degraded"
http_status = 200 if overall_status == "healthy" else 503
return JSONResponse({"status": overall_status, "checks": checks}, status_code=http_status)
- Liveness vs. Readiness (Kubernetes):
  - Liveness (/health/live): Is the process healthy? (Restart if fails) — simple checks only
  - Readiness (/health/ready): Is it ready to serve traffic? (Remove from LB if fails) — dependency checks
Important: Deep health check on liveness probe → if your DB goes down, all pods restart (worsens the situation). Only use deep checks for readiness.
Q36. Design a distributed rate limiter for a payment API serving 1M requests/minute.
Architecture:
API Gateway (multiple instances)
│
Redis Cluster (6 nodes, 3 masters + 3 replicas)
├── Key: rate_limit:{user_id}:{window}
├── Algorithm: sliding window with sorted sets
└── Lua script for atomic check-and-increment
│
Fallback: Local in-memory counter (if Redis unavailable)
└── Accept 10% of normal limit locally (graceful degradation)
Redis Lua script (atomic — prevents race conditions):
-- Atomic sliding window rate limiter
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local window_start = now - window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
local count = redis.call('ZCARD', key)
if count >= limit then
return 0 -- Rate limited
end
redis.call('ZADD', key, now, now)
redis.call('EXPIRE', key, math.ceil(window/1000))
return 1 -- Allowed
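Invoking the script from Python with redis-py (SLIDING_WINDOW_LUA is assumed to hold the script above; register_script returns an EVALSHA-backed callable):
import time
import redis
r = redis.Redis()
limiter = r.register_script(SLIDING_WINDOW_LUA)  # the Lua script above
def allow_request(user_id: str, limit: int = 1000, window_ms: int = 60_000) -> bool:
    now_ms = int(time.time() * 1000)
    result = limiter(keys=[f"rate_limit:{user_id}"], args=[now_ms, window_ms, limit])
    return result == 1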
Tiered rate limits:
- Free tier: 100 req/min
- Business tier: 1,000 req/min
- Enterprise: 10,000 req/min
Rate limit headers (de-facto convention; the 429 status itself comes from RFC 6585):
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 450
X-RateLimit-Reset: 1711756800
Retry-After: 30 # Only present when rate limited
Q37. How do you achieve zero-downtime database migrations in a microservices environment?
The challenge: Service instances run multiple versions during rolling deployment. Schema changes must be backward compatible with both old and new service code simultaneously.
Expand-Contract pattern in practice (example: replacing a legacy phone column with a wider phone_number column):
Step 1 — Expand (add new column, keep old):
-- Migration: add the new column as nullable (backward compatible)
ALTER TABLE users ADD COLUMN phone_number VARCHAR(15);
Deploy this migration before new code. Old code ignores the new column. New code writes to both the old column (phone) and the new column (phone_number).
Step 2 — Backfill:
# Background job — doesn't block serving
import time
def backfill_phone_numbers(batch_size: int = 1000):
    while True:
        # Updated rows leave the NULL set, so no OFFSET needed — always take the next batch
        rows = db.query(
            "SELECT id, phone FROM users WHERE phone_number IS NULL LIMIT %s",
            batch_size,
        )
        if not rows:
            break
        for row in rows:
            db.execute(
                "UPDATE users SET phone_number = %s WHERE id = %s",
                row.phone, row.id,  # copy from the legacy column
            )
        time.sleep(0.1)  # Throttle so the backfill doesn't starve production queries
Step 3 — Deploy new code only reading new column:
Verify the new column has complete data. Switch reads from phone to phone_number.
Step 4 — Contract (remove old column):
-- Safe to drop only after ALL old service versions are gone from production
ALTER TABLE users DROP COLUMN phone;
Never combine Step 1 (migration) and Step 4 (drop) in the same deployment.
Q38. What is the Hexagonal Architecture (Ports and Adapters)?
Business logic sits at the center; every technology concern — inbound or outbound — is an adapter plugged into a port (an interface):
HTTP Adapter   Kafka Adapter   CLI Adapter     (driving/inbound side)
      └──────────────┼──────────────┘
                     ▼
         Primary Ports (use cases)
                     │
           Core Business Logic
     (Domain + Application Layer)
                     │
        Secondary Ports (outbound)
             ┌───────┴───────┐
             ▼               ▼
        DB Adapter      Email Adapter
       (PostgreSQL)      (SendGrid)
Primary ports: Define what the application can do (interfaces that drive the application)
Secondary ports: Define what the application needs (interfaces it calls)
Adapters: Implementations of port interfaces (PostgreSQL adapter implements UserRepository port)
Benefits for microservices:
- Swap databases without touching business logic (test with in-memory DB)
- Same business logic can be exposed as REST API AND message consumer AND CLI
- Extremely testable — business logic tests use fake adapters, no real DB needed
# Domain layer (no framework dependencies)
class OrderService:
def __init__(self, order_repo: OrderRepository, payment_gateway: PaymentGateway):
self.order_repo = order_repo # Port — any adapter works
self.payment_gateway = payment_gateway
def create_order(self, user_id: str, items: list) -> Order:
# Pure business logic — no HTTP, no DB specifics
order = Order(user_id=user_id, items=items)
charged = self.payment_gateway.charge(order.total())
if charged:
order.confirm()
self.order_repo.save(order)
return order
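What the testability claim looks like in practice — a sketch with hand-rolled fakes (assumes the Order domain class used by the snippet above exists):
# In-memory fakes implementing the two ports — no DB, no HTTP
class InMemoryOrderRepo:
    def __init__(self):
        self.saved = []
    def save(self, order):
        self.saved.append(order)
class AlwaysApprovesGateway:
    def charge(self, amount):
        return True
def test_successful_charge_saves_order():
    repo = InMemoryOrderRepo()
    service = OrderService(order_repo=repo, payment_gateway=AlwaysApprovesGateway())
    service.create_order(user_id="u1", items=[{"sku": "A", "price": 100}])
    assert len(repo.saved) == 1  # business logic exercised without infrastructure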
Q39. How do you handle large file uploads in a microservices architecture?
Naive approach (broken at scale): Client uploads file to API gateway → service stores in memory → writes to S3. Problem: memory exhaustion, timeouts for large files, API gateway can't handle large bodies.
Correct pattern — Pre-signed S3 URLs:
1. Client requests upload URL
Client → Upload Service → "I want to upload 500MB video"
↓
Upload Service → AWS S3: generate pre-signed PUT URL (valid 30 min)
↓
Upload Service → Client: {upload_url: "https://s3.../...", file_id: "abc"}
2. Client uploads directly to S3 (bypasses your servers entirely)
Client → S3 (direct, signed URL)
3. S3 triggers event on completion
S3 → EventBridge → SQS → Processing Service
↓
Processing Service: validate, thumbnail, transcode
4. Client polls for processing status
Client → Upload Service: "Is file abc ready?"
↓
Upload Service: queries processing status DB
Benefits:
- Files never touch your application servers
- Scales infinitely (S3 handles uploads directly)
- Reduces bandwidth costs (your servers don't proxy)
- Works for files of any size (multi-part upload via S3 for >100MB)
Multipart upload for very large files:
# Break a 5GB file into 100MB parts, upload in parallel, then finalize
import boto3
from concurrent.futures import ThreadPoolExecutor
s3 = boto3.client('s3')
mpu = s3.create_multipart_upload(Bucket='my-bucket', Key='large-file.zip')
def upload_part(part_number, part_data):
    resp = s3.upload_part(
        Body=part_data, Bucket='my-bucket', Key='large-file.zip',
        PartNumber=part_number, UploadId=mpu['UploadId'])
    return {'PartNumber': part_number, 'ETag': resp['ETag']}
# read_parts (not shown) yields (part_number, bytes) chunks of ~100MB
with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(lambda p: upload_part(*p), read_parts('large-file.zip')))
s3.complete_multipart_upload(
    Bucket='my-bucket', Key='large-file.zip',
    UploadId=mpu['UploadId'], MultipartUpload={'Parts': parts})
Q40. Design a payment processing microservice that handles failures and maintains consistency.
Architecture:
Client
│
Payment API Service (RESTful, idempotent)
├── POST /payments (idempotency-key required)
├── GET /payments/{id} (status check)
└── POST /payments/{id}/refund
│
├── Idempotency cache (Redis — 24h TTL)
├── Write to payments DB (PostgreSQL)
└── Publish to outbox table (same transaction)
Outbox Processor (Debezium CDC → Kafka)
│
Kafka: payment-commands topic
│
Payment Processor Service (Kafka consumer)
├── Reads payment command
├── Calls Razorpay/Stripe via HTTP (with retry + circuit breaker)
├── Receives webhook confirmation (async)
└── Updates payment status in DB
│
Kafka: payment-events topic (PaymentProcessed, PaymentFailed)
│
Order Service ← listens (confirm order)
Notification Service ← listens (send receipt)
Analytics Service ← listens (update revenue dashboard)
Consistency guarantees:
- Idempotency key: Duplicate payment requests return same result
- Outbox pattern: Payment event published atomically with DB write
- Saga with compensation: If payment succeeds but order confirm fails → auto-refund via compensating transaction
- Effectively-once processing: the Kafka consumer is idempotent and commits offsets only after the DB update succeeds
Failure handling:
- Razorpay API timeout → retry 3x with exponential backoff
- Circuit breaker: After 10 failures in 60s, open circuit (return error immediately, don't call Razorpay)
- Dead-letter queue: After all retries exhausted → move to DLQ → alert team → manual review
- Webhook verification: Validate Razorpay webhook HMAC signature before processing
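The verification itself is a few lines of standard-library HMAC — a generic sketch (header names and exact signing details vary by provider; consult Razorpay's docs for the precise scheme):
import hashlib
import hmac
def verify_webhook_signature(raw_body: bytes, received_sig: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison defends against timing attacks
    return hmac.compare_digest(expected, received_sig)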
This is a real system design question asked at Razorpay, PhonePe, and Paytm interviews
FAQ — Honest Answers to Your Microservices Career Questions
Q: When should I start breaking a monolith into microservices? When you have clear pain points: deployment bottlenecks (one slow team blocks everyone), different scaling requirements per component, multiple teams fighting over the same codebase, or specific domains needing different technology. The threshold is roughly 50+ engineers or when deployment frequency drops because of coordination overhead.
Q: What is the difference between microservices and SOA (Service-Oriented Architecture)? SOA (2000s) used heavyweight protocols (SOAP, WS-*, XML), a central Enterprise Service Bus (ESB), and was typically implemented within one organization's IT department. Microservices use lightweight protocols (REST, gRPC, messaging), decentralized communication (no ESB), smaller service scope, and independent deployment. Microservices are essentially SOA done right.
Q: What is the minimum viable microservice size? A service should be small enough that a team of 2-4 engineers can understand the entire codebase, but large enough to be deployed and scaled independently. The "two-pizza team" rule: if a team can't be fed by two pizzas, the service is too large. Resist nano-services that make every function a separate deployment — the operational overhead isn't worth it.
Q: How do you handle the "distributed monolith" antipattern? A distributed monolith has microservices that must be deployed together (tight coupling, shared database, synchronous call chains where A calls B calls C calls D and all must be up). Fix: introduce async messaging between services, separate databases, identify bounded contexts, allow services to degrade gracefully when dependencies are unavailable.
Q: What monitoring is essential for microservices? The RED method (Rate, Error, Duration) per service. Distributed tracing with Jaeger/Tempo for debugging latency. Service dependency graph to understand call paths. Alert on SLO burn rates, not individual metrics. The four golden signals: latency, traffic, errors, saturation.
Q: Should every microservice have its own CI/CD pipeline? Yes — independent deployment is the core value proposition. Each service should have: its own repository (or module in a monorepo), its own pipeline, its own deployment lifecycle. A change to the payment service should never require coordinating with the user service deployment.
Q: What's the hardest part of microservices in practice? Data. Cross-service data access, eventual consistency, distributed transactions, and keeping read models up to date are significantly harder than equivalent monolith operations. Most teams underestimate this. The network is also unreliable in ways that in-process calls aren't — every service call needs timeouts, retries, and circuit breakers.
Q: What salary can I expect for microservices/distributed systems expertise in India? Backend Engineer (3-5 yrs, microservices): ₹20–45 LPA. Senior Backend/Architect (7+ yrs): ₹50–90 LPA. Staff/Principal with distributed systems depth: ₹80 LPA–1.5 Cr at product companies. FAANG (Amazon, Google): ₹60 LPA–2 Cr including RSUs.
You now have the same distributed systems knowledge that ₹50 LPA+ architects carry. Pair this with hands-on implementation — build a saga pattern, implement circuit breakers, deploy on Kubernetes — and you'll walk into interviews with unshakeable confidence.
Related Articles:
- Kubernetes Interview Questions 2026 — where your microservices actually run
- Docker Interview Questions 2026 — containerizing each service
- Golang Interview Questions 2026 — Go is the top choice for microservices at Indian unicorns
- React Interview Questions 2026 — the frontend consuming your APIs
- System Design Interview Questions 2026