
System Design: Chat Application 2026 [WhatsApp/Slack Architecture]


By Aditya Sharma | Published 8 Jun 2026


Last Updated: June 2026

Why Chat Is a Canonical System Design Problem

Based on public preparation resources and candidate-reported interview threads, chat application design appears in roughly 15-20% of FAANG system design rounds. It is popular because it tests real-time communication patterns, connection management, message persistence, and push notifications all at once.

Step 1: Requirements

Functional requirements:

One-on-one messaging and group chats (up to 500 members)
Real-time message delivery
Message status: sent, delivered, read
Media sharing: images, videos, documents
Online presence: online/offline/last seen
Push notifications for offline users
Message history stored and retrievable

Non-functional requirements:

Low latency: message delivery under 200ms for online users
High availability: 99.99% uptime
Scale: 500 million active users, 100 billion messages per day
Durability: no message loss

Step 2: Capacity Estimation

Active users: 500M
Daily messages: 100B
Messages per second: 100B / 86,400 = ~1.16M messages/sec
Average message size: 100 bytes (text) + metadata
Text message storage per day: 100B messages * 100 bytes = 10 TB/day (before metadata and replication)

WebSocket connections:
  500M DAU, assume 20% online at peak = 100M concurrent connections
  Each connection: ~10KB memory on server
  100M * 10KB = 1TB memory -> needs ~1000 chat servers (1GB of connection state each, about 100K connections per server)
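
The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: every input is one of the assumptions stated in the estimate (100-byte messages, 20% of users online at peak, 10 KB and 1 GB memory figures), and the variable names are illustrative.

```python
# Back-of-envelope capacity numbers, using the assumptions from the estimate.
SECONDS_PER_DAY = 86_400

daily_messages = 100_000_000_000      # 100 billion messages per day
avg_message_bytes = 100               # text payload only, metadata excluded
daily_active_users = 500_000_000
peak_online_percent = 20              # assumed share of users online at peak
bytes_per_connection = 10_000         # ~10 KB of server memory per WebSocket
memory_per_server = 1_000_000_000     # 1 GB of connection state per chat server

messages_per_sec = daily_messages / SECONDS_PER_DAY
storage_per_day_tb = daily_messages * avg_message_bytes / 1e12
concurrent_connections = daily_active_users * peak_online_percent // 100
connection_memory_tb = concurrent_connections * bytes_per_connection / 1e12
chat_servers = concurrent_connections * bytes_per_connection // memory_per_server
connections_per_server = concurrent_connections // chat_servers

print(f"{messages_per_sec / 1e6:.2f}M messages/sec")
print(f"{storage_per_day_tb:.0f} TB of text per day")
print(f"{concurrent_connections:,} concurrent connections")
print(f"{connection_memory_tb:.0f} TB of connection memory")
print(f"{chat_servers:,} chat servers, {connections_per_server:,} connections each")
```

Changing a single input (for example, 30% online at peak) shows how sensitive the server count is to the concurrency assumption, which is a useful point to raise in the interview.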

Step 2b: API and Protocol Design

Chat uses two channels: a persistent WebSocket for real-time bidirectional messaging, and a REST API for everything that does not need a live connection (history fetch, media URLs, conversation list).

WebSocket frames (real-time):
  -> SEND     { conversation_id, content, client_msg_id }
  <- ACK      { client_msg_id, message_id, status: "sent" }
  <- MESSAGE  { conversation_id, message_id, sender_id, content, ts }
  <- RECEIPT  { message_id, status: "delivered" | "read" }
  <- PRESENCE { user_id, status: "online" | "offline", last_seen }

REST API (request/response):
  GET  /v1/conversations?cursor=...        list user's chats
  GET  /v1/conversations/{id}/messages?before={message_id}&limit=50
  POST /v1/media/upload-url                 get a pre-signed S3 URL
  POST /v1/conversations                    create a group

The client_msg_id field is essential and frequently missed. The client generates it before sending so that, on a flaky network where the client retries, the server can deduplicate: if it has already persisted a message with that client_msg_id, it returns the existing message_id instead of creating a duplicate. Without this, a single tap can produce two messages when the ACK is lost and the client retries.
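
A minimal sketch of that deduplication, using an in-memory dictionary in place of the real message store. The class name, the choice to scope the key by sender, and the integer message IDs are assumptions made for illustration, not part of the design above.

```python
import itertools


class MessageStore:
    """In-memory stand-in for the message table plus a dedup index."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_client_id = {}   # (sender_id, client_msg_id) -> message_id
        self.messages = {}        # message_id -> stored message

    def send(self, sender_id, conversation_id, content, client_msg_id):
        """Persist a message once and return the ACK frame for it."""
        key = (sender_id, client_msg_id)
        existing = self._by_client_id.get(key)
        if existing is not None:
            # A retry of something already persisted: repeat the original ACK.
            return {"client_msg_id": client_msg_id,
                    "message_id": existing, "status": "sent"}

        message_id = next(self._next_id)
        self.messages[message_id] = {
            "conversation_id": conversation_id,
            "sender_id": sender_id,
            "content": content,
        }
        self._by_client_id[key] = message_id
        return {"client_msg_id": client_msg_id,
                "message_id": message_id, "status": "sent"}


store = MessageStore()
first = store.send("alice", "conv-1", "hello", client_msg_id="c-001")
retry = store.send("alice", "conv-1", "hello", client_msg_id="c-001")
print(first == retry, len(store.messages))
```

In a real deployment the dedup index would need to live in shared storage with an expiry, since the retry may land on a different chat server than the original SEND.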

Step 3: Core Components

[Mobile/Web Clients]
       |
       | WebSocket (persistent)
       |
[Chat Servers / Connection Layer]
       |
   [Message Queue (Kafka)]
       |
   [Message Service]
       |
  [Message DB (Cassandra)]
       |
  [Notification Service] ----> [APNs / FCM]
       |
  [Media Service] <---------> [Object Storage (S3)]
       |
  [Presence Service] -------> [Redis (online status)]

Step 4: WebSocket Connection Management

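# Simplified WebSocket server sketch (single process).
# Assumes each websocket object exposes an async send_json(), as in
# Starlette/FastAPI. notify_presence is a placeholder hook.

class ConnectionManager:
    """
    Maps user_id to their WebSocket connection.
    In production: distributed via Redis pub/sub to handle
    users connected to different servers.
    """
    def __init__(self):
        self.connections = {}  # user_id -> websocket

    async def notify_presence(self, user_id, online):
        # Placeholder: publish the status change to the presence service.
        pass

    async def connect(self, user_id, websocket):
        self.connections[user_id] = websocket
        await self.notify_presence(user_id, online=True)

    async def disconnect(self, user_id, websocket):
        # Only remove the entry if it still points at this socket, so a
        # late disconnect cannot evict a newer reconnect.
        if self.connections.get(user_id) is websocket:
            del self.connections[user_id]
            await self.notify_presence(user_id, online=False)

    async def send_message(self, recipient_id, message):
        ws = self.connections.get(recipient_id)
        if ws is None:
            return False  # user is offline
        try:
            await ws.send_json(message)
            return True
        except Exception:
            # Socket died mid-send: treat the user as offline so the
            # caller can fall back to a push notification.
            await self.disconnect(recipient_id, ws)
            return False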

Cross-server delivery: When users A (on Server 1) and B (on Server 2) chat:

A sends message to Server 1
Server 1 writes to the Kafka messages topic
Server 2 (consuming the topic or partition assigned to the server B is connected to) receives the message from Kafka
Server 2 delivers to B via B's WebSocket connection

Alternatively, use Redis Pub/Sub: each server subscribes to channels for its connected users.
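
The routing decision can be sketched without any broker. In the toy model below a direct method call stands in for the Kafka, Redis Pub/Sub, or gRPC hop, a plain dictionary stands in for the shared user-to-server map, and a list stands in for each user's socket. All class and attribute names are hypothetical.

```python
class Registry:
    """Stand-in for the shared user -> server map and the server directory."""

    def __init__(self):
        self.user_server = {}   # user_id -> server name
        self.servers = {}       # server name -> ChatServer


class ChatServer:
    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.inboxes = {}       # user_id -> list acting as that user's socket
        registry.servers[name] = self

    def connect(self, user_id):
        self.inboxes[user_id] = []
        self.registry.user_server[user_id] = self.name

    def deliver_local(self, user_id, message):
        inbox = self.inboxes.get(user_id)
        if inbox is None:
            return False
        inbox.append(message)
        return True

    def route(self, recipient_id, message):
        """Deliver locally if possible, otherwise hand off to the owning server."""
        if self.deliver_local(recipient_id, message):
            return True
        owner = self.registry.user_server.get(recipient_id)
        if owner is None or owner == self.name:
            return False        # offline or stale entry: fall back to push
        # In the real system this call is a network hop, not a method call.
        return self.registry.servers[owner].deliver_local(recipient_id, message)


registry = Registry()
server1, server2 = ChatServer("server-1", registry), ChatServer("server-2", registry)
server1.connect("A")
server2.connect("B")
print(server1.route("B", {"sender_id": "A", "content": "hi"}))
print(server2.inboxes["B"])
```

A return value of False is the signal to hand the message to the notification path for offline users.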

Step 5: Message Flow

One-on-one message flow:
1. Sender (A) sends message over WebSocket
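2. Server assigns message_id (time-sortable, e.g. Snowflake or TIMEUUID) and timestamp
3. Server writes to Kafka and waits for the broker acknowledgement (fast; the Cassandra write follows asynchronously)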
4. Server ACKs to A: "message sent" (single tick)
5. Kafka consumer writes to Cassandra
6. Server finds B's connection server via user_server_map (Redis)
7. Server delivers to B's server via internal gRPC
8. B's server delivers to B over WebSocket
9. B ACKs: "delivered" (double tick)
10. A receives delivered receipt

Read receipt:
  B opens the message -> B sends READ event to server
  Server notifies A -> A sees double blue ticks
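
Step 2 of the flow mentions Snowflake-style IDs. A minimal generator might look like the sketch below. The 41/10/12 bit split follows the commonly described Snowflake layout, while the custom epoch, the class name, and the choice to advance a logical millisecond when the sequence overflows (instead of waiting for the clock) are assumptions of this example.

```python
import threading
import time


class SnowflakeGenerator:
    """Time-sortable 64-bit IDs: timestamp | machine_id | sequence."""

    EPOCH_MS = 1_700_000_000_000      # hypothetical custom epoch
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, machine_id, clock_ms=None):
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Clock went backwards: reuse the last timestamp to stay ordered.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # 4096 IDs used in this millisecond: move to the next one.
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.EPOCH_MS
            return ((timestamp << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                    | (self._machine_id << self.SEQUENCE_BITS)
                    | self._sequence)


generator = SnowflakeGenerator(machine_id=7)
a, b = generator.next_id(), generator.next_id()
print(a < b)
```

IDs from one generator are strictly increasing. IDs from different machines are only ordered as well as their clocks agree, which is why a per-conversation sequence number is the safer ordering key.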

Step 6: Database Schema

-- Cassandra schema (wide column)

-- Messages table
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,          -- time-sortable UUID
    sender_id       UUID,
    content         TEXT,
    media_url       TEXT,
    message_type    TEXT,              -- text, image, video, file
    status          TEXT,              -- sent, delivered, read
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);  -- newest first

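-- Conversation membership (partitioned by conversation so that fan-out
-- can read every member of a group from a single partition)
CREATE TABLE conversation_members (
    conversation_id UUID,
    user_id         UUID,
    joined_at       TIMESTAMP,
    last_read_id    TIMEUUID,
    PRIMARY KEY (conversation_id, user_id)
);

-- User conversations index (a user's chats, most recent first).
-- conversation_id is part of the key so two chats with the same
-- timestamp do not overwrite each other. Because last_message_time is
-- a clustering column, a new message means delete + insert of the row.
CREATE TABLE user_conversations (
    user_id             UUID,
    conversation_id     UUID,
    last_message_time   TIMESTAMP,
    last_message        TEXT,
    unread_count        INT,
    PRIMARY KEY (user_id, last_message_time, conversation_id)
) WITH CLUSTERING ORDER BY (last_message_time DESC, conversation_id ASC);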

Why Cassandra?

Write-heavy workload (100B writes/day)
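(conversation_id, message_id) maps to partition key + clustering key: each write goes to one partition, and fetching the latest N messages is a single-partition sequential read
Linear horizontal scaling
Tunable consistency (for example, QUORUM writes and QUORUM reads at replication factor 3, so a read always overlaps the latest write; writing at ONE is faster but allows stale reads and weakens durability)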
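
To make the access pattern concrete, the sketch below models one conversation partition as a list kept sorted by message_id and serves the history endpoint's before/limit pagination from it. It is an in-memory illustration only: it uses integer IDs instead of TIMEUUIDs, and the class and method names are assumptions.

```python
from bisect import bisect_left


class ConversationPartition:
    """Messages of one conversation, kept sorted by message_id."""

    def __init__(self):
        self._ids = []     # ascending message_ids (the clustering key)
        self._rows = {}    # message_id -> content

    def insert(self, message_id, content):
        if message_id not in self._rows:
            self._ids.insert(bisect_left(self._ids, message_id), message_id)
        self._rows[message_id] = content

    def page_before(self, before=None, limit=50):
        """Return up to `limit` messages older than `before`, newest first."""
        end = len(self._ids) if before is None else bisect_left(self._ids, before)
        start = max(0, end - limit)
        return [(mid, self._rows[mid]) for mid in reversed(self._ids[start:end])]


partition = ConversationPartition()
for i in range(1, 8):
    partition.insert(i, f"message {i}")

page = partition.page_before(limit=3)                    # newest three
older = partition.page_before(before=page[-1][0], limit=3)
print([mid for mid, _ in page], [mid for mid, _ in older])
```

The client passes the oldest message_id it has as the next cursor, so each page is a contiguous slice and no offset counting is needed.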

Step 7: Presence Service

User online status:
  Login: SET user:{id}:status "online" EX 300 (Redis with 5-min TTL)
  Heartbeat: client sends ping every 30 seconds, extends TTL
  Logout: DEL user:{id}:status
  Timeout: TTL expires -> user appears offline

"Last seen" timestamp:
  On disconnect: SET user:{id}:last_seen {timestamp}

Scaling presence:
  Single Redis handles ~10M keys easily
  For 500M users (about 100M online at peak, per Step 2): Redis Cluster with ~10 shards, roughly 10M status keys each
  Read: O(1) GET per user
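
The TTL-plus-heartbeat behaviour can be sketched in memory as follows. A dictionary of expiry times stands in for Redis keys with EX 300, and the injectable clock exists only so the example can be exercised without waiting; the class and method names are hypothetical.

```python
import time


class PresenceStore:
    """In-memory stand-in for Redis status keys with a TTL."""

    def __init__(self, ttl_seconds=300, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}    # user_id -> time at which the key expires
        self.last_seen = {}      # user_id -> timestamp of explicit logout

    def heartbeat(self, user_id):
        """Login and every 30-second ping both push the expiry forward."""
        self._expires_at[user_id] = self._clock() + self._ttl

    def logout(self, user_id):
        if self._expires_at.pop(user_id, None) is not None:
            self.last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        expiry = self._expires_at.get(user_id)
        return expiry is not None and self._clock() < expiry


now = [0.0]                                  # fake clock for the demo
presence = PresenceStore(clock=lambda: now[0])
presence.heartbeat("u1")
now[0] = 299
print(presence.is_online("u1"))              # still inside the TTL
now[0] = 301
print(presence.is_online("u1"))              # TTL expired with no heartbeat
```

With a 30-second heartbeat and a 300-second TTL, a client has to miss about ten pings before it shows as offline, which trades freshness for tolerance of brief network drops.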

Step 8: Push Notifications for Offline Users

Message arrives for offline user B:
1. WebSocket delivery fails (B is not connected)
2. Message Service publishes to Notification Queue (Kafka)
3. Notification Worker reads event
4. Check B's notification preferences (DB lookup)
5. Route to APNs (iOS) or FCM (Android) or Web Push
6. Push "You have a new message" with payload
7. B opens app -> fetches missed messages via REST API

Rate limiting notifications:
  Batch rapid messages: send one notification for 5+ messages in 30 seconds
  Respect "Do Not Disturb" preferences in user settings
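
One possible reading of the batching rule, sketched below: within a 30-second window per user, the first four messages each produce a push, the fifth produces a single summary push, and anything further in that window is suppressed. The tumbling-window policy, the summary wording, and all names are assumptions of this example, not a prescribed algorithm.

```python
class NotificationBatcher:
    """Collapses bursts of messages into one summary push per window."""

    def __init__(self, threshold=5, window_seconds=30):
        self.threshold = threshold
        self.window = window_seconds
        self._windows = {}    # user_id -> [window_start, message_count]

    def on_message(self, user_id, now, do_not_disturb=False):
        """Return the push text to send, or None if nothing should be sent."""
        if do_not_disturb:
            return None
        state = self._windows.get(user_id)
        if state is None or now - state[0] >= self.window:
            state = [now, 0]                  # start a fresh window
            self._windows[user_id] = state
        state[1] += 1
        if state[1] < self.threshold:
            return "You have a new message"
        if state[1] == self.threshold:
            return f"You have {self.threshold}+ new messages"
        return None                           # burst already summarized


batcher = NotificationBatcher()
for second in range(7):
    print(second, batcher.on_message("B", now=second))
```

The counter resets when a new window starts, so a user who receives a slow trickle of messages still gets one push per message.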

Step 9: Media Handling

Sending an image:
1. Client uploads image directly to S3 (pre-signed URL from server)
2. Client gets S3 URL
3. Client sends message with media_url = S3 URL
4. Recipients download media from S3 (CDN-cached)

Pre-signed URL flow:
  POST /v1/media/upload-url
  -> Server generates S3 pre-signed URL (valid 15 min)
  -> Client uploads directly to S3 (bypasses app servers)
  -> Client sends the message with client_msg_id + media_url
  -> No media goes through chat server (saves bandwidth)

Step 10: Scaling and Tradeoffs

Component	Scaling approach	Tradeoff
Chat servers	Horizontal; routing state kept in a Redis connection map	Redis becomes a hotspot at extreme scale
Message storage	Cassandra cluster, partition by conversation	Cross-conversation queries are expensive
Presence	Redis cluster, sharded by user_id	Minor stale presence under failure
Message queue	Kafka, partitioned by conversation_id	Ordering guaranteed only within a partition; busy conversations can create hot partitions
Media	S3 + CloudFront CDN	Storage cost scales linearly

Group Chat Specifics

Group chat (up to 500 members):

Message fanout problem:
  When A sends to a 500-member group,
  the server must deliver to 500 connections.

Option 1: Push model
  Server writes message to each member's inbox.
  500 writes per message. Simple, but expensive for large groups.

Option 2: Pull model
  Server writes to single group inbox.
  Each member polls for new messages.
  Lower write amplification, higher read load.

Option 3: Hybrid (an approach often attributed to WhatsApp-style systems)
  Small groups (< 100): push to each member's server
  Large groups (100-500): members pull from group timeline
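
The hybrid decision can be expressed as a small planning function. The sketch below picks push or pull from the group size and, for push, groups online recipients by chat server so each server receives one delivery. The threshold constant mirrors the 100-member cutoff above; the function name and the shape of the user-to-server map are assumptions.

```python
from collections import defaultdict

PUSH_THRESHOLD = 100   # groups smaller than this are pushed, per Option 3


def plan_fanout(members, sender_id, user_server):
    """Return ("push", {server: [user_ids]}) or ("pull", None)."""
    if len(members) >= PUSH_THRESHOLD:
        # Large group: write once to the group timeline and let members pull.
        return "pull", None

    by_server = defaultdict(list)
    for user_id in members:
        if user_id == sender_id:
            continue
        server = user_server.get(user_id)
        if server is not None:       # offline members get a push notification
            by_server[server].append(user_id)
    return "push", dict(by_server)


members = ["a", "b", "c", "d", "e"]
online = {"b": "server-1", "c": "server-1", "d": "server-2"}   # "e" is offline
print(plan_fanout(members, "a", online))
```

Grouping by server means a 99-member group spread over 10 servers costs 10 internal deliveries, not 98.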

The Hard Problems in Chat at Scale

Message ordering is the most underestimated challenge. When two users send messages simultaneously in a group chat, the server must linearize them. The standard approach is to assign a monotonically increasing sequence number at the conversation level using a distributed counter (Redis INCR or database sequence). Clients display messages sorted by sequence number, not by local clock, because client clocks drift.
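
A minimal sketch of that idea, with an in-process counter standing in for Redis INCR on a per-conversation key. The class name and message fields are illustrative assumptions.

```python
import threading
from collections import defaultdict


class SequenceAllocator:
    """In-process stand-in for Redis INCR on a per-conversation key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)

    def next_seq(self, conversation_id):
        with self._lock:
            self._counters[conversation_id] += 1
            return self._counters[conversation_id]


def display_order(messages):
    """Clients sort by server-assigned sequence, never by their own clock."""
    return sorted(messages, key=lambda m: m["seq"])


allocator = SequenceAllocator()
# Two near-simultaneous sends; the second sender's phone clock runs behind.
first = {"seq": allocator.next_seq("group-1"), "client_ts": 200, "text": "first"}
second = {"seq": allocator.next_seq("group-1"), "client_ts": 100, "text": "second"}
print([m["text"] for m in display_order([second, first])])
# -> ['first', 'second']
```

Sorting the same two messages by client_ts would have put "second" first, which is exactly the drift problem the sequence number avoids.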

The "last seen" problem is another commonly asked follow-up. Writing last_seen to the database on every read event would create enormous write traffic for highly active users. The usual solution is to batch: update last_seen in Redis on every read event, and flush to the database at most once every 30 seconds per user. For a user generating about one read event per second, this cuts database writes by roughly 30x.
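
A write-behind buffer along these lines is sketched below. A dictionary stands in for Redis, a list of rows stands in for the database, and the flush is triggered by incoming events plus an explicit flush_all for timers or disconnects; all names are hypothetical.

```python
class LastSeenBuffer:
    """Keeps the freshest last_seen in a cache and throttles durable writes."""

    def __init__(self, flush_interval=30):
        self.flush_interval = flush_interval
        self._cache = {}        # user_id -> latest last_seen (Redis in the text)
        self._flushed = {}      # user_id -> last value written durably
        self.db_writes = []     # (user_id, last_seen) rows, for illustration

    def record(self, user_id, now):
        """Called on every read event."""
        self._cache[user_id] = now
        last = self._flushed.get(user_id)
        if last is None or now - last >= self.flush_interval:
            self._write(user_id, now)

    def flush_all(self):
        """Called from a timer or on disconnect to persist pending values."""
        for user_id, value in self._cache.items():
            if self._flushed.get(user_id) != value:
                self._write(user_id, value)

    def read(self, user_id):
        return self._cache.get(user_id)

    def _write(self, user_id, value):
        self.db_writes.append((user_id, value))
        self._flushed[user_id] = value


buffer = LastSeenBuffer()
for second in range(60):                 # one read event per second
    buffer.record("u1", now=second)
print(len(buffer.db_writes), buffer.read("u1"))
# -> 2 59
```

Sixty events produce two database writes, which matches the roughly 30x reduction, while reads served from the cache always see the latest value.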

Message search is a separate system entirely. Full-text search over message history requires Elasticsearch or a similar inverted-index structure. Many chat applications limit search to recent messages (for example, the last 90 days) to bound the index size. Ask the interviewer whether search is in scope before designing it.

Why WebSocket Over HTTP Polling

HTTP long-polling was the predecessor to WebSocket. The client makes a request, the server holds it open until a message arrives, then the client immediately makes another request. This costs roughly one HTTP request per message, each carrying full headers, plus a new TCP/TLS handshake whenever the connection cannot be reused. WebSocket is a single persistent TCP connection upgraded from HTTP, which reduces per-message overhead to a few bytes of framing. For a chat application with millions of concurrent users, the difference in server load is significant.

Server-Sent Events (SSE) are a middle ground: unidirectional push from server to client over plain HTTP, simpler than WebSocket but insufficient for two-way communication without a separate request channel for sending.

Failure Handling and Delivery Guarantees

Delivery guarantees are where this design earns or loses senior-level credit. The system targets at-least-once delivery with client-side deduplication, not exactly-once, because exactly-once across a network is impractical.

Sender's network drops after SEND but before ACK:
  Client retries SEND with the same client_msg_id.
  Server deduplicates on client_msg_id and returns the original
  message_id. No duplicate is persisted or delivered.

Recipient is offline when the message is written:
  Message is persisted to Cassandra regardless of delivery.
  On reconnect, the client pulls all messages after its
  last_synced message_id. Persistence is the source of truth;
  WebSocket delivery is best-effort on top of it.

Chat server crashes with live connections:
  Connections drop; clients auto-reconnect to another server
  (load balancer reassigns). The connection map in Redis is
  updated on reconnect. Undelivered messages are pulled on sync.

Kafka consumer lag during a spike:
  Writes to Cassandra fall behind but never drop, because Kafka
  buffers. Real-time delivery degrades to "pull on next sync"
  during the spike, which is acceptable.

The principle to articulate: persist first, deliver second. Because every message is durably written before delivery is attempted, no message is ever lost even if every real-time delivery path fails. The client reconciles by syncing from its last known message_id.
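
The client side of that reconciliation might look like the sketch below. Pushed messages are applied as they arrive, but the sync cursor only advances when the client pulls from the persisted log, so a gap between pushes cannot be skipped. It assumes integer message IDs that become visible in the log in increasing order; the class and method names are illustrative.

```python
class ClientInbox:
    """Merges best-effort pushes with authoritative sync pulls."""

    def __init__(self):
        self.last_synced_id = 0     # cursor: advanced only by sync()
        self._by_id = {}            # message_id -> message (dedup by ID)

    def on_push(self, message):
        """Apply a WebSocket MESSAGE frame; duplicates are ignored."""
        self._by_id.setdefault(message["message_id"], message)

    def sync(self, server_log):
        """Pull everything after the cursor from the persisted log."""
        missed = [m for m in server_log if m["message_id"] > self.last_synced_id]
        for message in missed:
            self._by_id.setdefault(message["message_id"], message)
        if missed:
            self.last_synced_id = max(m["message_id"] for m in missed)
        return len(missed)

    def timeline(self):
        return [self._by_id[mid] for mid in sorted(self._by_id)]


log = [{"message_id": i, "content": f"m{i}"} for i in range(1, 6)]
inbox = ClientInbox()
inbox.on_push(log[0])       # message 1 arrives in real time
inbox.on_push(log[4])       # message 5 arrives, 2-4 were lost in between
inbox.on_push(log[4])       # duplicate delivery is harmless
inbox.sync(log)             # reconnect: pull from the source of truth
print([m["message_id"] for m in inbox.timeline()])
# -> [1, 2, 3, 4, 5]
```

Because delivery is at-least-once, the deduplication by message_id is what keeps retries and overlapping push and pull paths from showing a message twice.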

Follow-up Questions Interviewers Ask

How do you guarantee message ordering in a group chat? Assign a per-conversation monotonic sequence number (for example, Redis INCR on a per-conversation key). A TIMEUUID is time-sortable and Cassandra-friendly, but it is only as well ordered as the servers' clocks. Clients sort by sequence, never by local wall-clock time, because device clocks drift. With Kafka keyed by conversation_id, every message in a conversation lands in the same partition, so ordering is preserved end to end.

How does a user with multiple devices stay in sync? Treat each device as a separate connection but share one server-side message store keyed by user. Each device tracks its own last_synced message_id and pulls the delta on connect. A read receipt from one device propagates to the others through the same RECEIPT frames.

How do you implement end-to-end encryption without breaking server features? With E2E encryption, the server stores ciphertext and cannot read content, so server-side search and rich notification previews are lost. The Signal protocol handles key exchange per conversation. The tradeoff to state: E2E gives privacy at the cost of server-side features like full-text search and content-rich push notifications.

How do you handle a group with the maximum 500 members all active at once? Use the hybrid fan-out: members on the same chat server share a single delivery, and cross-server delivery goes through Kafka or Redis pub/sub keyed by conversation_id. The 500-member cap exists precisely to bound the worst-case fan-out per message.

How do you store and retrieve "last seen" without hammering the database? Update last_seen in Redis on every read, and flush to the durable store at most once every 30 seconds per user. For a user generating about one read event per second, this means roughly 30x fewer database writes while keeping the displayed value fresh enough for users.

Frequently Asked Questions

How do real-time chat applications deliver messages instantly?

Each client keeps a persistent, bidirectional WebSocket connection to a chat server. When user A sends a message to user B, the server finds B's WebSocket connection (tracked in a connection map) and pushes the message directly. If B is offline, the message is still persisted, a push notification alerts B, and the client fetches the missed messages when it reconnects.

How do chat systems handle message ordering and delivery guarantees?

Each message gets a server-assigned ID that orders it within its conversation: either a per-conversation sequence number or a Snowflake-like ID (timestamp + machine ID + sequence), which is time-sortable but only roughly ordered across machines. Clients sort by that ID and send ACK receipts. The server retains unacknowledged messages and retries delivery, so clients deduplicate by message ID.

What database is best for chat messages?

There is no single best choice, but Apache Cassandra is a common one for chat message storage. Its wide-column model maps naturally to (conversation_id, message_id) as partition key + clustering key, so each write goes to one partition and reading recent messages is a single-partition sequential read. Real systems vary: WhatsApp uses a custom Erlang system; Slack uses MySQL; Discord migrated from Cassandra to ScyllaDB.

