System Design: Chat Application 2026 [WhatsApp/Slack Architecture]

Editorial commentary by Aditya Sharma · written for PapersAdda · not generated, not aggregated.
Last Updated: June 2026
Why Chat Is a Canonical System Design Problem
Based on public preparation resources and candidate-reported interview threads, chat application design appears in roughly 15-20% of FAANG system design rounds. It tests real-time communication patterns, connection management, message persistence, and push notifications all at once.
Step 1: Requirements
Functional requirements:
- One-on-one messaging and group chats (up to 500 members)
- Real-time message delivery
- Message status: sent, delivered, read
- Media sharing: images, videos, documents
- Online presence: online/offline/last seen
- Push notifications for offline users
- Message history stored and retrievable
Non-functional requirements:
- Low latency: message delivery under 200ms for online users
- High availability: 99.99% uptime
- Scale: 500 million active users, 100 billion messages per day
- Durability: no message loss
Step 2: Capacity Estimation
Active users: 500M
Daily messages: 100B
Messages per second: 100B / 86,400 = ~1.16M messages/sec
Average message size: 100 bytes (text) + metadata
Text message storage per day: 100B messages * 100 bytes = 10TB/day
WebSocket connections:
500M DAU, assume 20% online at peak = 100M concurrent connections
Each connection: ~10KB memory on server
100M * 10KB = 1TB memory -> needs ~1000 chat servers (1GB of connection state each, ~100K connections per server)
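The estimates above can be re-derived in a few lines of Python. This is only a back-of-envelope check using the figures from this section; the variable names are illustrative, and 1 KB is taken as 1,000 bytes to match the round numbers used here.

```python
# Back-of-envelope capacity check using the figures from Step 2.
DAILY_MESSAGES = 100_000_000_000           # 100B messages/day
SECONDS_PER_DAY = 86_400
AVG_MESSAGE_BYTES = 100                    # text only, metadata excluded
DAILY_ACTIVE_USERS = 500_000_000
PEAK_ONLINE_PERCENT = 20                   # assumption taken from the text
BYTES_PER_CONNECTION = 10_000              # ~10 KB per WebSocket
CONNECTION_BYTES_PER_SERVER = 1_000_000_000  # 1 GB budget per chat server

messages_per_second = DAILY_MESSAGES / SECONDS_PER_DAY
storage_per_day_tb = DAILY_MESSAGES * AVG_MESSAGE_BYTES / 1e12
concurrent_connections = DAILY_ACTIVE_USERS * PEAK_ONLINE_PERCENT // 100
total_connection_bytes = concurrent_connections * BYTES_PER_CONNECTION
chat_servers = total_connection_bytes // CONNECTION_BYTES_PER_SERVER
connections_per_server = concurrent_connections // chat_servers

print(f"{messages_per_second:,.0f} messages/sec")
print(f"{storage_per_day_tb:.0f} TB/day of text")
print(f"{concurrent_connections:,} concurrent connections")
print(f"{chat_servers:,} chat servers, {connections_per_server:,} connections each")
```

In an interview, stating the per-server connection count (here 100,000) is usually more useful than the server count, because it is the number you can sanity-check against real hardware.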
Step 2b: API and Protocol Design
Chat uses two channels: a persistent WebSocket for real-time bidirectional messaging, and a REST API for everything that does not need a live connection (history fetch, media URLs, conversation list).
WebSocket frames (real-time):
-> SEND { conversation_id, content, client_msg_id }
<- ACK { client_msg_id, message_id, status: "sent" }
<- MESSAGE { conversation_id, message_id, sender_id, content, ts }
<- RECEIPT { message_id, status: "delivered" | "read" }
<- PRESENCE { user_id, status: "online" | "offline", last_seen }
REST API (request/response):
GET /v1/conversations?cursor=... list user's chats
GET /v1/conversations/{id}/messages?before={message_id}&limit=50
POST /v1/media/upload-url get a pre-signed S3 URL
POST /v1/conversations create a group
The client_msg_id field is essential and frequently missed. The client generates it before sending so that, on a flaky network where the client retries, the server can deduplicate: if it has already persisted a message with that client_msg_id, it returns the existing message_id instead of creating a duplicate. Without this, a single tap can produce two messages when the ACK is lost and the client retries.
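A minimal sketch of that deduplication, using an in-memory dictionary in place of the real message store. The class name, the method name, and the choice to scope the key by sender_id are assumptions made for illustration; the article only specifies that the server deduplicates on client_msg_id.

```python
import itertools


class MessageStore:
    """In-memory stand-in for the message store; not a real database client."""

    def __init__(self):
        self._ids = itertools.count(1)   # placeholder for a real message_id generator
        self._seen = {}                  # (sender_id, client_msg_id) -> message_id
        self.messages = {}               # message_id -> persisted message

    def handle_send(self, sender_id, conversation_id, content, client_msg_id):
        """Persist a SEND frame at most once and return the ACK frame for it."""
        key = (sender_id, client_msg_id)
        message_id = self._seen.get(key)
        if message_id is None:
            message_id = next(self._ids)
            self.messages[message_id] = {
                "conversation_id": conversation_id,
                "sender_id": sender_id,
                "content": content,
            }
            self._seen[key] = message_id
        # A retry skips the insert above and gets the original message_id back.
        return {"client_msg_id": client_msg_id, "message_id": message_id, "status": "sent"}


store = MessageStore()
first_ack = store.handle_send("A", "conv-1", "hello", client_msg_id="c-123")
# The ACK was lost, so the client retries with the same client_msg_id.
retry_ack = store.handle_send("A", "conv-1", "hello", client_msg_id="c-123")
```

Both calls return the same ACK and only one message is stored, which is the behaviour the client relies on when it retries blindly.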
Step 3: Core Components
[Mobile/Web Clients]
|
| WebSocket (persistent)
|
[Chat Servers / Connection Layer]
|
[Message Queue (Kafka)]
|
[Message Service]
|
[Message DB (Cassandra)]
|
[Notification Service] ----> [APNs / FCM]
|
[Media Service] <---------> [Object Storage (S3)]
|
[Presence Service] -------> [Redis (online status)]
Step 4: WebSocket Connection Management
# Simplified WebSocket server pseudocode
class ConnectionManager:
    """
    Maps user_id to their WebSocket connection.
    In production: distributed via Redis pub/sub to handle
    users connected to different servers.
    """

    def __init__(self):
        self.connections = {}  # user_id -> websocket

    async def connect(self, user_id, websocket):
        self.connections[user_id] = websocket
        await self.notify_presence(user_id, online=True)

    async def disconnect(self, user_id):
        self.connections.pop(user_id, None)
        await self.notify_presence(user_id, online=False)

    async def notify_presence(self, user_id, online):
        # Placeholder: publish the status change to the Presence Service.
        pass

    async def send_message(self, recipient_id, message):
        ws = self.connections.get(recipient_id)
        if ws:
            await ws.send_json(message)
            return True
        return False  # user is offline
Cross-server delivery: When users A (on Server 1) and B (on Server 2) chat:
- A sends message to Server 1
- Server 1 writes to the Kafka topic messages
- Server 2 (subscribed to B's topic) receives from Kafka
- Server 2 delivers to B via B's WebSocket connection
Alternatively, use Redis Pub/Sub: each server subscribes to channels for its connected users.
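The pub/sub variant can be sketched without a real broker. The class below is an in-memory stand-in, not the Redis client API, and the user:{id} channel naming is a hypothetical convention chosen for the example.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Stand-in for a pub/sub broker: channel name -> subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to every subscriber and return how many received it."""
        receivers = list(self._subscribers[channel])
        for callback in receivers:
            callback(message)
        return len(receivers)


class ChatServer:
    """One node in the connection layer; it only knows its own users."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.delivered = defaultdict(list)   # user_id -> messages pushed to that socket

    def connect(self, user_id):
        # Subscribe to this user's channel while they are connected here.
        self.bus.subscribe(f"user:{user_id}", lambda msg: self.delivered[user_id].append(msg))

    def send(self, recipient_id, message):
        # The sending server never needs to know which server holds the recipient.
        return self.bus.publish(f"user:{recipient_id}", message) > 0


bus = InMemoryPubSub()
server_1, server_2 = ChatServer("server-1", bus), ChatServer("server-2", bus)
server_1.connect("A")
server_2.connect("B")

delivered = server_1.send("B", {"sender_id": "A", "content": "hello"})
offline = server_1.send("C", {"sender_id": "A", "content": "anyone?"})
```

A publish that reaches zero subscribers is the signal that the recipient is offline, which is where the push notification path takes over.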
Step 5: Message Flow
One-on-one message flow:
1. Sender (A) sends message over WebSocket
2. Server assigns a time-sortable message_id (Snowflake-style, or a TIMEUUID as in the Step 6 schema) and a timestamp
3. Server writes to Kafka (async, fast)
4. Server ACKs to A: "message sent" (single tick)
5. Kafka consumer writes to Cassandra
6. Server finds B's connection server via user_server_map (Redis)
7. Server delivers to B's server via internal gRPC
8. B's server delivers to B over WebSocket
9. B ACKs: "delivered" (double tick)
10. A receives delivered receipt
Read receipt:
B opens the message -> B sends READ event to server
Server notifies A -> A sees double blue ticks
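One detail worth sketching: receipts can be retried or arrive out of order, so the status should only ever move forward (sent, then delivered, then read). The function below is an illustrative assumption about how to enforce that, not something the flow above prescribes.

```python
# Rank of each status; a receipt may only raise the rank, never lower it.
STATUS_RANK = {"sent": 0, "delivered": 1, "read": 2}


def apply_receipt(current_status, incoming_status):
    """Return the message status after a RECEIPT frame, never moving backwards."""
    if STATUS_RANK[incoming_status] > STATUS_RANK[current_status]:
        return incoming_status
    return current_status


status = "sent"
status = apply_receipt(status, "delivered")   # double tick
status = apply_receipt(status, "read")        # blue ticks
status = apply_receipt(status, "delivered")   # late duplicate receipt is ignored
```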
Step 6: Database Schema
-- Cassandra schema (wide column)
-- Messages table
CREATE TABLE messages (
conversation_id UUID,
message_id TIMEUUID, -- time-sortable UUID
sender_id UUID,
content TEXT,
media_url TEXT,
message_type TEXT, -- text, image, video, file
status TEXT, -- sent, delivered, read
PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC); -- newest first
-- Conversation membership
CREATE TABLE conversation_members (
conversation_id UUID,
user_id UUID,
joined_at TIMESTAMP,
last_read_id TIMEUUID,
PRIMARY KEY (conversation_id, user_id) -- partition by conversation so fan-out can list members in one read
);
-- User conversations index
CREATE TABLE user_conversations (
user_id UUID,
conversation_id UUID,
last_message_time TIMESTAMP,
last_message TEXT,
unread_count INT,
PRIMARY KEY (user_id, last_message_time, conversation_id) -- conversation_id breaks ties between equal timestamps
) WITH CLUSTERING ORDER BY (last_message_time DESC, conversation_id ASC);
Why Cassandra?
- Write-heavy workload (100B writes/day)
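- Natural key: conversation_id as the partition key and message_id as the clustering key, so each write is an append to one partition and a conversation's recent history is a single-partition range read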
- Linear horizontal scaling
- Tunable consistency (for example QUORUM writes and reads when a reader must see its own writes; ONE lowers latency at the risk of stale reads)
Step 7: Presence Service
User online status:
Login: SET user:{id}:status "online" EX 300 (Redis with 5-min TTL)
Heartbeat: client sends ping every 30 seconds, extends TTL
Logout: DEL user:{id}:status
Timeout: TTL expires -> user appears offline
"Last seen" timestamp:
On disconnect: SET user:{id}:last_seen {timestamp}
Scaling presence:
Single Redis handles ~10M keys easily
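Status keys exist only for online users: ~100M at peak -> Redis Cluster with ~10 shards
Keeping last_seen for all 500M users in Redis at the same density would need ~50 shards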
Read: O(1) GET per user
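The TTL-and-heartbeat behaviour can be sketched with a small in-memory class. It imitates the effect of SET ... EX 300 rather than calling Redis, and the injected clock is an assumption added so the expiry can be demonstrated without waiting five minutes.

```python
import time


class PresenceStore:
    """In-memory stand-in for presence keys with a TTL."""

    TTL_SECONDS = 300   # 5-minute TTL from the text

    def __init__(self, clock=time.time):
        self._clock = clock
        self._expires_at = {}   # user_id -> time at which the status key expires
        self.last_seen = {}     # user_id -> time of last disconnect

    def heartbeat(self, user_id):
        # Login and every heartbeat both reset the key with a fresh TTL.
        self._expires_at[user_id] = self._clock() + self.TTL_SECONDS

    def logout(self, user_id):
        self._expires_at.pop(user_id, None)
        self.last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        expiry = self._expires_at.get(user_id)
        return expiry is not None and self._clock() < expiry


now = [0.0]                                  # fake clock, advanced by hand
presence = PresenceStore(clock=lambda: now[0])
presence.heartbeat("alice")

now[0] = 299.0
online_before_expiry = presence.is_online("alice")

now[0] = 300.0                               # no heartbeat arrived in time
online_after_expiry = presence.is_online("alice")
```

With a 30-second heartbeat and a 5-minute TTL, a crashed client can still appear online for up to five minutes; shortening the TTL trades that staleness for more sensitivity to missed heartbeats.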
Step 8: Push Notifications for Offline Users
Message arrives for offline user B:
1. WebSocket delivery fails (B is not connected)
2. Message Service publishes to Notification Queue (Kafka)
3. Notification Worker reads event
4. Check B's notification preferences (DB lookup)
5. Route to APNs (iOS) or FCM (Android) or Web Push
6. Push "You have a new message" with payload
7. B opens app -> fetches missed messages via REST API
Rate limiting notifications:
Batch rapid messages: send one notification for 5+ messages in 30 seconds
Respect "Do Not Disturb" preferences in user settings
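The batching rule can be read in more than one way. The sketch below takes one interpretation: group arrivals into 30-second windows that start at the first message, and collapse any window holding 5 or more messages into a single summary notification. The function names and the tuple format are illustrative.

```python
def _close_window(window, batch_threshold):
    """Summarise one window as (start_time, message_count, kind)."""
    kind = "summary" if len(window) >= batch_threshold else "individual"
    return (window[0], len(window), kind)


def plan_notifications(timestamps, window_seconds=30, batch_threshold=5):
    """Group message arrival times into windows and decide what to push."""
    plan = []
    window = []
    for ts in sorted(timestamps):
        # Start a new window once this message falls outside the current one.
        if window and ts - window[0] >= window_seconds:
            plan.append(_close_window(window, batch_threshold))
            window = []
        window.append(ts)
    if window:
        plan.append(_close_window(window, batch_threshold))
    return plan


# Six messages in ten seconds, then one more much later.
plan = plan_notifications([0, 2, 4, 6, 8, 10, 100])
```

Here the first six messages become one summary push and the late message gets its own notification. A production worker would also check Do Not Disturb before sending either.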
Step 9: Media Handling
Sending an image:
1. Client uploads image directly to S3 (pre-signed URL from server)
2. Client gets S3 URL
3. Client sends message with media_url = S3 URL
4. Recipients download media from S3 (CDN-cached)
Pre-signed URL flow:
POST /v1/media/upload-url
-> Server generates S3 pre-signed URL (valid 15 min)
-> Client uploads directly to S3 (bypasses app servers)
-> Client sends the message with media_url set to the S3 URL
-> No media goes through chat server (saves bandwidth)
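To show why a pre-signed URL lets the client upload without passing through the app servers, here is a simplified illustration using an HMAC over the object key and expiry time. This is not the algorithm S3 uses (S3 uses AWS Signature Version 4, and with boto3 you would call generate_presigned_url instead); the secret, host, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"                 # hypothetical signing secret
UPLOAD_HOST = "https://media.example.com"   # hypothetical upload endpoint


def _sign(object_key, expires_at):
    payload = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def create_upload_url(object_key, now=None, ttl_seconds=15 * 60):
    """Return a URL that authorises one upload target until it expires."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at, "signature": _sign(object_key, expires_at)})
    return f"{UPLOAD_HOST}/{object_key}?{query}"


def verify_upload_url(url, now=None):
    """What the storage side checks: signature matches and URL has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path.lstrip("/"), expires_at)
    return now < expires_at and hmac.compare_digest(signature, expected)


url = create_upload_url("uploads/u1/photo.jpg", now=1_000)
valid_in_window = verify_upload_url(url, now=1_060)
valid_after_expiry = verify_upload_url(url, now=1_901)
valid_if_tampered = verify_upload_url(url.replace("photo.jpg", "other.jpg"), now=1_060)
```

The storage service can validate the request on its own because the signature covers both the object key and the expiry, so the chat servers never see the media bytes.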
Step 10: Scaling and Tradeoffs
| Component | Scaling approach | Tradeoff |
|---|---|---|
| Chat servers | Horizontal; each server holds only its own connections, with the user-to-server map in Redis | Redis becomes hotspot at extreme scale |
| Message storage | Cassandra cluster, partition by conversation | Cross-conversation queries are expensive |
| Presence | Redis cluster, sharded by user_id | Minor stale presence under failure |
| Message queue | Kafka, partitioned by conversation_id | Ordering guaranteed only within a partition; a very active conversation can create a hot partition |
| Media | S3 + CloudFront CDN | Storage cost scales linearly |
Group Chat Specifics
Group chat (up to 500 members):
Message fanout problem:
When A sends to a 500-member group,
the server must deliver to 500 connections.
Option 1: Push model
Server writes message to each member's inbox.
500 writes per message. Simple, but expensive for large groups.
Option 2: Pull model
Server writes to single group inbox.
Each member polls for new messages.
Lower write amplification, higher read load.
Option 3: Hybrid (WhatsApp approach)
Small groups (< 100): push to each member's server
Large groups (100-500): members pull from group timeline
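The hybrid option can be sketched as a single function that picks push or pull by group size. The dictionaries stand in for per-user inboxes and a shared group timeline; the 100-member threshold comes from the text, while the function name and return value are illustrative.

```python
PUSH_THRESHOLD = 100   # groups smaller than this use the push model


def fan_out(message, members, sender_id, inboxes, group_timelines):
    """Deliver one group message and return the number of writes performed."""
    if len(members) < PUSH_THRESHOLD:
        # Push model: one write per recipient inbox.
        recipients = [m for m in members if m != sender_id]
        for user_id in recipients:
            inboxes.setdefault(user_id, []).append(message)
        return len(recipients)
    # Pull model: one write to the shared timeline; members read it themselves.
    group_timelines.setdefault(message["conversation_id"], []).append(message)
    return 1


inboxes, group_timelines = {}, {}
small_group = [f"u{i}" for i in range(10)]
large_group = [f"u{i}" for i in range(500)]

small_writes = fan_out({"conversation_id": "g1", "content": "hi"},
                       small_group, "u0", inboxes, group_timelines)
large_writes = fan_out({"conversation_id": "g2", "content": "hi all"},
                       large_group, "u0", inboxes, group_timelines)
```

The small group costs 9 writes and the 500-member group costs 1, which is the write amplification the hybrid model is trying to bound.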
The Hard Problems in Chat at Scale
Message ordering is the most underestimated challenge. When two users send messages simultaneously in a group chat, the server must linearize them. The standard approach is to assign a monotonically increasing sequence number at the conversation level using a distributed counter (Redis INCR or database sequence). Clients display messages sorted by sequence number, not by local clock, because client clocks drift.
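A small sketch of per-conversation sequencing. The allocator is an in-memory stand-in for a counter such as Redis INCR, and the example messages and client timestamps are made up to show a case where client clocks disagree with arrival order.

```python
import threading
from collections import defaultdict


class SequenceAllocator:
    """In-memory stand-in for a per-conversation atomic counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)

    def next_seq(self, conversation_id):
        with self._lock:
            self._counters[conversation_id] += 1
            return self._counters[conversation_id]


allocator = SequenceAllocator()

# Listed in the order the server received them; alice's device clock runs fast.
arrivals = [
    {"sender": "alice", "client_ts": 2000, "text": "first to reach the server"},
    {"sender": "bob", "client_ts": 1000, "text": "second to reach the server"},
]
for message in arrivals:
    message["seq"] = allocator.next_seq("group-1")

by_seq = [m["sender"] for m in sorted(arrivals, key=lambda m: m["seq"])]
by_client_clock = [m["sender"] for m in sorted(arrivals, key=lambda m: m["client_ts"])]
```

Sorting by seq gives every client the same order (alice, then bob), while sorting by client timestamp would flip it on the strength of a skewed clock.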
The "last seen" problem is another commonly asked follow-up. Updating last_seen for every message read would create enormous write traffic for highly active users. The production solution is to batch updates: update last_seen in Redis on every read event, and flush to the database every 30 seconds per user. For a user generating about one read event per second, this reduces database writes by roughly 30x.
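The batching idea can be sketched as a write-behind buffer. The dictionaries stand in for Redis and the durable database, timestamps are plain seconds, and the class is an assumption about one way to implement the flush, not a description of a specific production system.

```python
class LastSeenBuffer:
    """Keeps the newest last_seen per user and flushes it at most once per interval."""

    def __init__(self, flush_interval=30):
        self.flush_interval = flush_interval
        self._pending = {}      # user_id -> latest timestamp (stand-in for Redis)
        self._last_flush = {}   # user_id -> time of the last database write
        self.database = {}      # stand-in for the durable store
        self.db_writes = 0

    def record(self, user_id, timestamp):
        # Every read event is cheap: it only overwrites the in-memory value.
        self._pending[user_id] = timestamp
        if timestamp - self._last_flush.get(user_id, 0) >= self.flush_interval:
            self.flush(user_id, timestamp)

    def flush(self, user_id, now):
        self.database[user_id] = self._pending.pop(user_id)
        self.db_writes += 1
        self._last_flush[user_id] = now


buffer = LastSeenBuffer()
for second in range(1, 61):          # one read event per second for a minute
    buffer.record("alice", second)
```

Sixty read events produce two database writes, which is where the roughly 30x figure comes from for a user generating about one event per second.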
Message search is a separate system entirely. Full-text search over message history requires Elasticsearch or a similar inverted-index structure. Most chat applications offer search only for recent messages (last 90 days) to bound the index size. The interview scoping question you should ask is whether search is in-scope before designing it.
Why WebSocket Over HTTP Polling
HTTP long-polling was the predecessor to WebSocket. The client makes a request, the server holds it open until a message arrives, then the client immediately makes another request. This costs roughly one HTTP request per message, each carrying full headers, plus a new TCP (and TLS) handshake whenever the connection is not kept alive. WebSocket is a single persistent TCP connection upgraded from HTTP, so each message carries only a few bytes of framing. For a chat application with millions of concurrent users, the difference in server load is significant.
Server-Sent Events (SSE) are a middle ground: unidirectional push from server to client over plain HTTP, simpler than WebSocket but insufficient for two-way communication without a separate request channel for sending.
Failure Handling and Delivery Guarantees
Delivery guarantees are where this design earns or loses senior-level credit. The system targets at-least-once delivery with client-side deduplication, not exactly-once, because exactly-once across a network is impractical.
Sender's network drops after SEND but before ACK:
Client retries SEND with the same client_msg_id.
Server deduplicates on client_msg_id and returns the original
message_id. No duplicate is persisted or delivered.
Recipient is offline when the message is written:
Message is persisted to Cassandra regardless of delivery.
On reconnect, the client pulls all messages after its
last_synced message_id. Persistence is the source of truth;
WebSocket delivery is best-effort on top of it.
Chat server crashes with live connections:
Connections drop; clients auto-reconnect to another server
(load balancer reassigns). The connection map in Redis is
updated on reconnect. Undelivered messages are pulled on sync.
Kafka consumer lag during a spike:
Writes to Cassandra fall behind but never drop, because Kafka
buffers. Real-time delivery degrades to "pull on next sync"
during the spike, which is acceptable.
The principle to articulate: persist first, deliver second. Because every message is durably written before delivery is attempted, no message is ever lost even if every real-time delivery path fails. The client reconciles by syncing from its last known message_id.
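The reconcile step can be sketched from the client's side. The sketch assumes message_id values are integers that increase over time, and it uses a plain list in place of the REST history call; both are simplifications for illustration.

```python
def fetch_since(server_log, last_synced_id):
    """Stand-in for the history endpoint: messages newer than the cursor."""
    return [m for m in server_log if m["message_id"] > last_synced_id]


class ClientInbox:
    """Client-side store that tolerates duplicate deliveries."""

    def __init__(self):
        self.messages = {}        # message_id -> message
        self.last_synced_id = 0

    def apply(self, message):
        """Store a message once; return False if it was already present."""
        if message["message_id"] in self.messages:
            return False
        self.messages[message["message_id"]] = message
        return True

    def sync(self, server_log):
        # Only sync moves the cursor, so a live push that skips ahead
        # cannot hide an earlier message that was missed.
        for message in fetch_since(server_log, self.last_synced_id):
            self.apply(message)
            self.last_synced_id = max(self.last_synced_id, message["message_id"])


server_log = [{"message_id": i, "content": f"m{i}"} for i in range(1, 5)]

client = ClientInbox()
client.apply(server_log[0])          # delivered live over WebSocket
client.apply(server_log[1])          # delivered live over WebSocket
client.sync(server_log)              # reconnect: pull everything after the cursor
duplicate_applied = client.apply(server_log[3])   # late redelivery of message 4
```

After the sync the client holds messages 1 to 4 exactly once, and the late redelivery is ignored, which is what at-least-once delivery with client-side deduplication looks like in practice.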
Follow-up Questions Interviewers Ask
How do you guarantee message ordering in a group chat? Assign a per-conversation monotonic sequence number (for example Redis INCR on a key per conversation_id). A TIMEUUID is time-sortable but depends on the generating server's clock, so it gives approximate rather than strict ordering. Clients sort by sequence, never by local wall-clock, because device clocks drift. Within a Kafka partition keyed by conversation_id, ordering is preserved end to end.
How does a user with multiple devices stay in sync? Treat each device as a separate connection but share one server-side message store keyed by user. Each device tracks its own last_synced message_id and pulls the delta on connect. A read receipt from one device propagates to the others through the same RECEIPT frames.
How do you implement end-to-end encryption without breaking server features? With E2E encryption, the server stores ciphertext and cannot read content, so server-side search and rich notification previews are lost. The Signal protocol handles key exchange per conversation. The tradeoff to state: E2E gives privacy at the cost of server-side features like full-text search and content-rich push notifications.
How do you handle a group with the maximum 500 members all active at once? Use the hybrid fan-out: members on the same chat server share a single delivery, and cross-server delivery goes through Kafka or Redis pub/sub keyed by conversation_id. The 500-member cap exists precisely to bound the worst-case fan-out per message.
How do you store and retrieve "last seen" without hammering the database? Update last_seen in Redis on every read, and flush to the durable store every 30 seconds per user. For a user producing about one read event per second, that is roughly 30x fewer database writes, while the displayed value stays fresh enough for users.
Methodology applied to this article · last verified 8 Jun 2026
- No fabricated salary numbers or success rates. If we quote a range, it's sourced.
- No noun-substituted templates. This article was not generated by swapping company names in a stock prompt.
- No paid placements, sponsored coaching links, or affiliate-shilled course pushes.