issue 117apr 27mmxxvi
est. 2017
Sun, 27 Apr 2026
vol. IX · no. 117
PapersAdda
placement intelligence, since 2017
640+ briefs · 24 campuses · by reservation
verified offers · sourced from r/developersIndia
razorpay₹65.00 LPA· iit-d · sde-1google₹54.00 LPA· iiit-h · swe-imicrosoft₹49.50 LPA· iit-b · sdeatlassian₹38.00 LPA· nit-w · sde-1amazon₹44.20 LPA· bits-p · sde-1uber₹42.00 LPA· iit-kgp · sde-1razorpay₹65.00 LPA· iit-d · sde-1google₹54.00 LPA· iiit-h · swe-imicrosoft₹49.50 LPA· iit-b · sdeatlassian₹38.00 LPA· nit-w · sde-1amazon₹44.20 LPA· bits-p · sde-1uber₹42.00 LPA· iit-kgp · sde-1

System Design: Rate Limiter 2026 [Full Design with Code]

11 min read
Uncategorized
Updated: 8 Jun 2026
Aditya Sharma
Aditya's Edit

PapersAdda 2026 Placement Cycle

By Aditya Sharma·Founder & Editor, PapersAdda

What changed in 2026 drives

Mass-recruiter offer letters are flatter for 2026 batch - the 4-5 LPA ASE band has barely budged in three years while inflation eats real wages. Premium tracks (Digital, Pro, Elite, Specialist) are still where the differential lives, and they are entirely test-driven. If you are aiming higher than the default offer, the coding round is not optional pageantry - it is the entire interview.

What I'd actually study for this

  • 01Two solid coding-round answers (1 medium-hard DSA each, with edge-case discussion) > five half-baked ones
  • 02One real project you can defend end-to-end - file paths, design decisions, and what you would change
  • 03One DBMS schema you actually built (not a textbook ER diagram), with at least 3 join-heavy queries written from memory
  • 04Three behavioural STAR stories: failure recovered, conflict handled, ownership taken

Where most candidates trip up

The single biggest mistake is treating company-specific guides as primary prep and DSA as secondary. It is the opposite. Mass recruiters use the test as a filter, but premium tracks at every IT services company use coding to allocate offer band. Spend 70% of prep time on DSA + system fundamentals, 20% on company-specific patterns, 10% on HR rehearsal. Reverse that ratio and you collect the default offer.

Editorial commentary by Aditya Sharma · written for PapersAdda · not generated, not aggregated.

Last Updated: June 2026


Why Rate Limiter is a Canonical System Design Problem

Candidates report rate limiter as appearing in roughly 20% of FAANG system design rounds and as a near-universal design question at API platform companies (Stripe, Twilio, Cloudflare). Based on public preparation resources and candidate-reported interview threads, it tests distributed state management, tradeoffs between accuracy and performance, and knowledge of Redis primitives.


Step 1: Requirements Clarification

Functional requirements:

  • Limit number of requests per user/IP/API key within a time window
  • Support multiple rate limit tiers (free: 1000/min, pro: 10K/min, enterprise: custom)
  • Requests exceeding the limit get 429 Too Many Requests with Retry-After header
  • Rate limits should be configurable without redeployment

Non-functional requirements:

  • Low latency: rate limit check must add less than 5ms to request latency
  • High availability: rate limiter must not be a single point of failure
  • Eventual consistency acceptable (minor over-counting under failure is OK)
  • Scale: 10 million requests per minute across 100+ microservices

Out of scope:

  • DDoS protection (separate layer, different tooling)
  • Application-level throttling (this is infrastructure-level)

Step 2: Capacity Estimation

Traffic: 10M requests/minute = 166,667 RPS
Rate limit checks per request: 1 Redis operation
Redis throughput: roughly 100K-1M ops/sec per node
Redis nodes needed: 2-3 nodes with replication

Storage per user entry:
  Key: "rate:{user_id}:{window}" = ~50 bytes
  Value: counter (8 bytes) + TTL
  Total: ~60 bytes per active user per window

For 1M active users: 60MB per window
  With 2 windows (sliding): ~120MB (fits in single Redis instance)

Step 3: Core Algorithms

Algorithm 1: Fixed Window Counter

Simplest approach. Count requests in fixed time windows (per minute, per hour).

import redis
import time

class FixedWindowRateLimiter:
    """
    Simple, low memory. Problem: burst at window boundary.
    A user can make 200% of limit in 2 seconds straddling window boundary.
    Time: O(1)  Space: O(users * windows)
    """
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, user_id):
        window_id = int(time.time()) // self.window
        key = f"rate:{user_id}:{window_id}"

        count = self.redis.incr(key)
        if count == 1:
            self.redis.expire(key, self.window * 2)

        return count <= self.limit

Problem: A user makes 1000 requests at 0:59:59 (limit) and 1000 at 1:00:01. Both windows see 1000, but the user made 2000 in a 2-second period.


Algorithm 2: Sliding Window Log

Store timestamp of every request. On each new request, count how many are within the last window.

class SlidingWindowLogRateLimiter:
    """
    Accurate but high memory: O(requests) per user.
    Time: O(log n) for sorted set operations  Space: O(limit * users)
    """
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, user_id):
        now = time.time()
        window_start = now - self.window
        key = f"rate_log:{user_id}"

        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, 0, window_start)  # remove old entries
        pipe.zadd(key, {str(now): now})              # add current request
        pipe.zcard(key)                               # count remaining
        pipe.expire(key, self.window)
        _, _, count, _ = pipe.execute()

        return count <= self.limit

Algorithm 3: Sliding Window Counter (Recommended)

Approximate sliding window using two fixed windows and linear interpolation.

class SlidingWindowCounterRateLimiter:
    """
    Best production balance: approximate accuracy, O(1) per user.
    Uses: current window count + weighted previous window count.

    Approximation: current_count + prev_count * (1 - elapsed_fraction)
    Time: O(1)  Space: O(users)
    """
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, user_id):
        now = time.time()
        current_window = int(now) // self.window
        prev_window = current_window - 1
        elapsed_in_window = now - (current_window * self.window)

        curr_key = f"rate:{user_id}:{current_window}"
        prev_key = f"rate:{user_id}:{prev_window}"

        pipe = self.redis.pipeline()
        pipe.get(curr_key)
        pipe.get(prev_key)
        curr_count, prev_count = pipe.execute()

        curr_count = int(curr_count or 0)
        prev_count = int(prev_count or 0)

        # Weighted approximation
        elapsed_fraction = elapsed_in_window / self.window
        approx_count = curr_count + prev_count * (1 - elapsed_fraction)

        if approx_count >= self.limit:
            return False

        pipe2 = self.redis.pipeline()
        pipe2.incr(curr_key)
        pipe2.expire(curr_key, self.window * 2)
        pipe2.execute()

        return True

Algorithm 4: Token Bucket

Fill a bucket with tokens at a fixed rate. Each request consumes one token. Allows bursting.

class TokenBucketRateLimiter:
    """
    Supports bursting. More complex Redis implementation.
    Time: O(1)  Space: O(users)
    """
    def __init__(self, redis_client, capacity, refill_rate):
        self.redis = redis_client
        self.capacity = capacity     # max tokens
        self.refill_rate = refill_rate  # tokens per second

    def is_allowed(self, user_id):
        key = f"token_bucket:{user_id}"
        now = time.time()

        # Lua script for atomicity
        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
        local tokens = tonumber(bucket[1]) or capacity
        local last_refill = tonumber(bucket[2]) or now

        -- Refill based on time elapsed
        local elapsed = now - last_refill
        tokens = math.min(capacity, tokens + elapsed * refill_rate)

        if tokens >= 1 then
            tokens = tokens - 1
            redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)
            return 1
        else
            return 0
        end
        """
        result = self.redis.eval(lua_script, 1, key, self.capacity, self.refill_rate, now)
        return result == 1

Algorithm 5: Leaky Bucket

The leaky bucket models requests as water entering a bucket that leaks at a constant rate. Unlike the token bucket (which allows bursts), the leaky bucket enforces a smooth, constant output rate. Requests that arrive when the bucket is full are dropped.

class LeakyBucketRateLimiter:
    """
    Smooths traffic to a constant outflow rate. No bursts allowed.
    Implemented as a queue draining at a fixed rate.
    Time: O(1)  Space: O(users)
    """
    def __init__(self, redis_client, capacity, leak_rate):
        self.redis = redis_client
        self.capacity = capacity       # max queued requests
        self.leak_rate = leak_rate     # requests drained per second

    def is_allowed(self, user_id):
        key = f"leaky:{user_id}"
        now = time.time()

        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local leak_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'level', 'last_leak')
        local level = tonumber(bucket[1]) or 0
        local last_leak = tonumber(bucket[2]) or now

        -- Leak out water since last check
        local leaked = (now - last_leak) * leak_rate
        level = math.max(0, level - leaked)

        if level + 1 <= capacity then
            level = level + 1
            redis.call('HMSET', key, 'level', level, 'last_leak', now)
            redis.call('EXPIRE', key, 3600)
            return 1
        else
            redis.call('HMSET', key, 'level', level, 'last_leak', now)
            return 0
        end
        """
        result = self.redis.eval(lua_script, 1, key, self.capacity, self.leak_rate, now)
        return result == 1

The leaky bucket is the right choice when a downstream service requires a strictly constant request rate (for example, a payment processor that mandates no more than N transactions per second). The token bucket suits the opposite case: an API that wants to tolerate short bursts as long as the long-run average stays within limit.


Step 3b: Why Atomicity Matters Here

The single most common bug in a distributed rate limiter is a read-then-write race. Consider the naive sliding-window-counter flow: read the count, compare to the limit, then increment. Between the read and the increment, another request on a different server can read the same count and also pass the check. Under high concurrency, both requests are admitted even though only one slot was free, so the effective limit is breached.

Server A: GET count -> 999 (limit 1000), check passes
Server B: GET count -> 999 (limit 1000), check passes   <-- race
Server A: INCR -> 1000
Server B: INCR -> 1001   <-- over the limit

Two correct fixes. First, push the entire check-and-increment into a single Lua script that Redis executes atomically (Redis runs Lua single-threaded, so no interleaving is possible). The token bucket and leaky bucket above already do this. Second, for the simple counter, use INCR first and then compare: INCR is atomic and returns the post-increment value, so each request gets a unique slot number and you reject any request whose returned value exceeds the limit. The order matters: increment-then-check is race-free, check-then-increment is not.


Step 4: High-Level Architecture

Client Request
     |
     v
[API Gateway / Load Balancer]
     |
     |-----> [Rate Limiter Middleware]
     |              |
     |              v
     |        [Redis Cluster]
     |        (rate counters, TTL-based)
     |
     |  if allowed:
     v
[Backend Services]

Header response:
  X-RateLimit-Limit: 1000
  X-RateLimit-Remaining: 734
  X-RateLimit-Reset: 1686234060
  Retry-After: 45 (on 429)

Step 5: API Design

Rate Limit Configuration API:

POST /rate-limits
{
  "tier": "pro",
  "resource": "/api/v1/payments",
  "window_seconds": 60,
  "limit": 10000
}

Rate Limit Header in every response:
  HTTP/1.1 200 OK
  X-RateLimit-Limit: 1000
  X-RateLimit-Remaining: 734
  X-RateLimit-Reset: 1686234060

On rate limit exceeded:
  HTTP/1.1 429 Too Many Requests
  Retry-After: 45
  Content-Type: application/json
  {"error": "rate_limit_exceeded", "message": "Retry after 45 seconds"}

Step 6: Distributed Deployment

Single Redis: works for < 100K RPS. Single point of failure.

Redis Cluster (sharded): partition user keys across nodes.
  - Key: rate:{user_id} -> shard by hash(user_id)
  - Each shard handles subset of users
  - Failure of one shard affects only that partition

Redis Sentinel: master-replica with automatic failover.
  - Write to master, read replicas for read-heavy workloads
  - Acceptable: minor over-counting during failover

Consistent hashing for user assignment:
  - Adds/removes Redis nodes without full re-sharding
  - ~k/n keys reassigned when adding node (k = total keys, n = node count)

Step 7: Tradeoffs

AlgorithmAccuracyMemoryLatencyBurst handling
Fixed WindowLow (boundary burst)O(1)O(1)No
Sliding Window LogExactO(limit * users)O(log n)No
Sliding Window Counter~99% accurateO(1)O(1)No
Token BucketExactO(1)O(1)Yes
Leaky BucketExactO(1)O(1)No (smooths)

Production recommendation: Sliding Window Counter for most APIs. Token Bucket when burst allowance is a product requirement.


Common Follow-up Questions

Q: How do you handle Redis failure? Use circuit breaker: if Redis is down, allow requests with a fallback local in-memory counter. Accept minor over-counting vs a complete outage.

Q: How do you rate limit across multiple data centers? Each DC maintains its own counters. Accept eventual consistency: a user in DC1 might temporarily exceed global limit by the amount in DC2. True global consistency requires cross-DC synchronization which adds latency. Most products accept this tradeoff.

Q: What if the same user hits different servers? All servers connect to the same Redis cluster. The rate limiter state is centralized in Redis, not in the application server.

Q: How do you keep the rate-limit check under 5ms? Co-locate the Redis cluster in the same availability zone as the API gateway so the network round trip stays under 1ms. Use a single pipelined Redis call (or one Lua script) per check rather than multiple sequential calls. Keep connection pools warm so no request pays connection-setup cost. The dominant cost is the network hop, so minimizing round trips is the whole game.

Q: How do you support different limits for different endpoints and tiers? Store the limit configuration in a config store (Redis hash or a config service) keyed by (tier, resource). The rate limiter reads the applicable limit at check time and caches it locally with a short TTL. Because limits change rarely and reads dominate, a 30-second local cache of the config eliminates a config lookup on the hot path while still picking up changes quickly.

Q: How do you prevent a thundering herd when many users reset at the same window boundary? Fixed windows reset all users simultaneously, which can spike traffic at the boundary. The sliding window counter avoids this because each user's weighting decays continuously rather than snapping to zero at a shared boundary. If you must use fixed windows, add a small random jitter to each user's window start so resets spread out over time.

Q: Where do you return the Retry-After value from? For a fixed or sliding window, Retry-After is the seconds remaining until the current window expires. For a token bucket, it is the time needed to refill one token: 1 divided by the refill rate. Returning an accurate Retry-After lets well-behaved clients back off precisely instead of hammering the endpoint.


Methodology applied to this articlelast verified 8 Jun 2026
Sources used
Public exam-pattern documents, official recruiter pages, and verified candidate reports on r/developersIndia and LinkedIn.
Verification window
Page last edited 8 Jun 2026 by Aditya Sharma. Numbers and patterns sanity-checked against the most recent 2026 cycle drives we tracked.
What we did NOT do
  • No fabricated salary numbers or success rates. If we quote a range, it's sourced.
  • No noun-substituted templates. This article was not generated by swapping company names in a stock prompt.
  • No paid placements, sponsored coaching links, or affiliate-shilled course pushes.
Verification policy: /editorial-standards/. Found something incorrect? Submit a correction - we respond within 48 hours.

Explore this topic cluster

More resources in Uncategorized

Use the category hub to browse similar questions, exam patterns, salary guides, and preparation resources related to this topic.

Paid contributor programme

Sat this this year? Share your story, earn ₹500.

First-person experience reports help future candidates prep smarter. We pay verified contributors ₹500 via UPI per accepted story - with byline.

Submit your story →

Ready to practice?

Take a free timed mock test

Put what you learned into practice. Our mock tests match the 2026 pattern with timer, navigator, reveal, and score breakdown. No signup.

Start Free Mock Test →

More from PapersAdda

Share this guide: