placement brief / Topics & Practice / brief / 08 Jun 2026

System Design: Rate Limiter 2026 [Full Design with Code]

Candidates report rate limiter as appearing in roughly 20% of FAANG system design rounds and as a near-universal design question at API platform companies...

By Aditya SharmaPublished 8 Jun 20262 sources listedSpot an error? Corrections open

6 min read last revised 8 Jun 2026

on this page§ 12

Last Updated: June 2026

Why Rate Limiter is a Canonical System Design Problem

Candidates report rate limiter as appearing in roughly 20% of FAANG system design rounds and as a near-universal design question at API platform companies (Stripe, Twilio, Cloudflare). Based on public preparation resources and candidate-reported interview threads, it tests distributed state management, tradeoffs between accuracy and performance, and knowledge of Redis primitives.

Step 1: Requirements Clarification

Functional requirements:

Limit number of requests per user/IP/API key within a time window
Support multiple rate limit tiers (free: 1000/min, pro: 10K/min, enterprise: custom)
Requests exceeding the limit get 429 Too Many Requests with Retry-After header
Rate limits should be configurable without redeployment

Non-functional requirements:

Low latency: rate limit check must add less than 5ms to request latency
High availability: rate limiter must not be a single point of failure
Eventual consistency acceptable (minor over-counting under failure is OK)
Scale: 10 million requests per minute across 100+ microservices

Out of scope:

DDoS protection (separate layer, different tooling)
Application-level throttling (this is infrastructure-level)

Step 2: Capacity Estimation

Traffic: 10M requests/minute = 166,667 RPS
Rate limit checks per request: 1 Redis operation
Redis throughput: roughly 100K-1M ops/sec per node
Redis nodes needed: 2-3 nodes with replication

Storage per user entry:
  Key: "rate:{user_id}:{window}" = ~50 bytes
  Value: counter (8 bytes) + TTL
  Total: ~60 bytes per active user per window

For 1M active users: 60MB per window
  With 2 windows (sliding): ~120MB (fits in single Redis instance)

Step 3: Core Algorithms

Algorithm 1: Fixed Window Counter

Simplest approach. Count requests in fixed time windows (per minute, per hour).

import redis
import time

class FixedWindowRateLimiter:
    """
    Simple, low memory. Problem: burst at window boundary.
    A user can make 200% of limit in 2 seconds straddling window boundary.
    Time: O(1)  Space: O(users * windows)
    """
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, user_id):
        window_id = int(time.time()) // self.window
        key = f"rate:{user_id}:{window_id}"

        count = self.redis.incr(key)
        if count == 1:
            self.redis.expire(key, self.window * 2)

        return count <= self.limit

Problem: A user makes 1000 requests at 0:59:59 (limit) and 1000 at 1:00:01. Both windows see 1000, but the user made 2000 in a 2-second period.

Algorithm 2: Sliding Window Log

Store timestamp of every request. On each new request, count how many are within the last window.

class SlidingWindowLogRateLimiter:
    """
    Accurate but high memory: O(requests) per user.
    Time: O(log n) for sorted set operations  Space: O(limit * users)
    """
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, user_id):
        now = time.time()
        window_start = now - self.window
        key = f"rate_log:{user_id}"

        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, 0, window_start)  # remove old entries
        pipe.zadd(key, {str(now): now})              # add current request
        pipe.zcard(key)                               # count remaining
        pipe.expire(key, self.window)
        _, _, count, _ = pipe.execute()

        return count <= self.limit

Algorithm 3: Sliding Window Counter (Recommended)

Approximate sliding window using two fixed windows and linear interpolation.

class SlidingWindowCounterRateLimiter:
    """
    Best production balance: approximate accuracy, O(1) per user.
    Uses: current window count + weighted previous window count.

    Approximation: current_count + prev_count * (1 - elapsed_fraction)
    Time: O(1)  Space: O(users)
    """
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds

    def is_allowed(self, user_id):
        now = time.time()
        current_window = int(now) // self.window
        prev_window = current_window - 1
        elapsed_in_window = now - (current_window * self.window)

        curr_key = f"rate:{user_id}:{current_window}"
        prev_key = f"rate:{user_id}:{prev_window}"

        pipe = self.redis.pipeline()
        pipe.get(curr_key)
        pipe.get(prev_key)
        curr_count, prev_count = pipe.execute()

        curr_count = int(curr_count or 0)
        prev_count = int(prev_count or 0)

        # Weighted approximation
        elapsed_fraction = elapsed_in_window / self.window
        approx_count = curr_count + prev_count * (1 - elapsed_fraction)

        if approx_count >= self.limit:
            return False

        pipe2 = self.redis.pipeline()
        pipe2.incr(curr_key)
        pipe2.expire(curr_key, self.window * 2)
        pipe2.execute()

        return True

Algorithm 4: Token Bucket

Fill a bucket with tokens at a fixed rate. Each request consumes one token. Allows bursting.

class TokenBucketRateLimiter:
    """
    Supports bursting. More complex Redis implementation.
    Time: O(1)  Space: O(users)
    """
    def __init__(self, redis_client, capacity, refill_rate):
        self.redis = redis_client
        self.capacity = capacity     # max tokens
        self.refill_rate = refill_rate  # tokens per second

    def is_allowed(self, user_id):
        key = f"token_bucket:{user_id}"
        now = time.time()

        # Lua script for atomicity
        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
        local tokens = tonumber(bucket[1]) or capacity
        local last_refill = tonumber(bucket[2]) or now

        -- Refill based on time elapsed
        local elapsed = now - last_refill
        tokens = math.min(capacity, tokens + elapsed * refill_rate)

        if tokens >= 1 then
            tokens = tokens - 1
            redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)
            return 1
        else
            return 0
        end
        """
        result = self.redis.eval(lua_script, 1, key, self.capacity, self.refill_rate, now)
        return result == 1

Algorithm 5: Leaky Bucket

The leaky bucket models requests as water entering a bucket that leaks at a constant rate. Unlike the token bucket (which allows bursts), the leaky bucket enforces a smooth, constant output rate. Requests that arrive when the bucket is full are dropped.

class LeakyBucketRateLimiter:
    """
    Smooths traffic to a constant outflow rate. No bursts allowed.
    Implemented as a queue draining at a fixed rate.
    Time: O(1)  Space: O(users)
    """
    def __init__(self, redis_client, capacity, leak_rate):
        self.redis = redis_client
        self.capacity = capacity       # max queued requests
        self.leak_rate = leak_rate     # requests drained per second

    def is_allowed(self, user_id):
        key = f"leaky:{user_id}"
        now = time.time()

        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local leak_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'level', 'last_leak')
        local level = tonumber(bucket[1]) or 0
        local last_leak = tonumber(bucket[2]) or now

        -- Leak out water since last check
        local leaked = (now - last_leak) * leak_rate
        level = math.max(0, level - leaked)

        if level + 1 <= capacity then
            level = level + 1
            redis.call('HMSET', key, 'level', level, 'last_leak', now)
            redis.call('EXPIRE', key, 3600)
            return 1
        else
            redis.call('HMSET', key, 'level', level, 'last_leak', now)
            return 0
        end
        """
        result = self.redis.eval(lua_script, 1, key, self.capacity, self.leak_rate, now)
        return result == 1

The leaky bucket is the right choice when a downstream service requires a strictly constant request rate (for example, a payment processor that mandates no more than N transactions per second). The token bucket suits the opposite case: an API that wants to tolerate short bursts as long as the long-run average stays within limit.

Step 3b: Why Atomicity Matters Here

The single most common bug in a distributed rate limiter is a read-then-write race. Consider the naive sliding-window-counter flow: read the count, compare to the limit, then increment. Between the read and the increment, another request on a different server can read the same count and also pass the check. Under high concurrency, both requests are admitted even though only one slot was free, so the effective limit is breached.

Server A: GET count -> 999 (limit 1000), check passes
Server B: GET count -> 999 (limit 1000), check passes   <-- race
Server A: INCR -> 1000
Server B: INCR -> 1001   <-- over the limit

Two correct fixes. First, push the entire check-and-increment into a single Lua script that Redis executes atomically (Redis runs Lua single-threaded, so no interleaving is possible). The token bucket and leaky bucket above already do this. Second, for the simple counter, use INCR first and then compare: INCR is atomic and returns the post-increment value, so each request gets a unique slot number and you reject any request whose returned value exceeds the limit. The order matters: increment-then-check is race-free, check-then-increment is not.

Step 4: High-Level Architecture

Client Request
     |
     v
[API Gateway / Load Balancer]
     |
     |-----> [Rate Limiter Middleware]
     |              |
     |              v
     |        [Redis Cluster]
     |        (rate counters, TTL-based)
     |
     |  if allowed:
     v
[Backend Services]

Header response:
  X-RateLimit-Limit: 1000
  X-RateLimit-Remaining: 734
  X-RateLimit-Reset: 1686234060
  Retry-After: 45 (on 429)

Step 5: API Design

Rate Limit Configuration API:

POST /rate-limits
{
  "tier": "pro",
  "resource": "/api/v1/payments",
  "window_seconds": 60,
  "limit": 10000
}

Rate Limit Header in every response:
  HTTP/1.1 200 OK
  X-RateLimit-Limit: 1000
  X-RateLimit-Remaining: 734
  X-RateLimit-Reset: 1686234060

On rate limit exceeded:
  HTTP/1.1 429 Too Many Requests
  Retry-After: 45
  Content-Type: application/json
  {"error": "rate_limit_exceeded", "message": "Retry after 45 seconds"}

Step 6: Distributed Deployment

Single Redis: works for < 100K RPS. Single point of failure.

Redis Cluster (sharded): partition user keys across nodes.
  - Key: rate:{user_id} -> shard by hash(user_id)
  - Each shard handles subset of users
  - Failure of one shard affects only that partition

Redis Sentinel: master-replica with automatic failover.
  - Write to master, read replicas for read-heavy workloads
  - Acceptable: minor over-counting during failover

Consistent hashing for user assignment:
  - Adds/removes Redis nodes without full re-sharding
  - ~k/n keys reassigned when adding node (k = total keys, n = node count)

Step 7: Tradeoffs

Algorithm	Accuracy	Memory	Latency	Burst handling
Fixed Window	Low (boundary burst)	O(1)	O(1)	No
Sliding Window Log	Exact	O(limit * users)	O(log n)	No
Sliding Window Counter	~99% accurate	O(1)	O(1)	No
Token Bucket	Exact	O(1)	O(1)	Yes
Leaky Bucket	Exact	O(1)	O(1)	No (smooths)

Production recommendation: Sliding Window Counter for most APIs. Token Bucket when burst allowance is a product requirement.

Common Follow-up Questions

Q: How do you handle Redis failure? Use circuit breaker: if Redis is down, allow requests with a fallback local in-memory counter. Accept minor over-counting vs a complete outage.

Q: How do you rate limit across multiple data centers? Each DC maintains its own counters. Accept eventual consistency: a user in DC1 might temporarily exceed global limit by the amount in DC2. True global consistency requires cross-DC synchronization which adds latency. Most products accept this tradeoff.

Q: What if the same user hits different servers? All servers connect to the same Redis cluster. The rate limiter state is centralized in Redis, not in the application server.

Q: How do you keep the rate-limit check under 5ms? Co-locate the Redis cluster in the same availability zone as the API gateway so the network round trip stays under 1ms. Use a single pipelined Redis call (or one Lua script) per check rather than multiple sequential calls. Keep connection pools warm so no request pays connection-setup cost. The dominant cost is the network hop, so minimizing round trips is the whole game.

Q: How do you support different limits for different endpoints and tiers? Store the limit configuration in a config store (Redis hash or a config service) keyed by (tier, resource). The rate limiter reads the applicable limit at check time and caches it locally with a short TTL. Because limits change rarely and reads dominate, a 30-second local cache of the config eliminates a config lookup on the hot path while still picking up changes quickly.

Q: How do you prevent a thundering herd when many users reset at the same window boundary? Fixed windows reset all users simultaneously, which can spike traffic at the boundary. The sliding window counter avoids this because each user's weighting decays continuously rather than snapping to zero at a shared boundary. If you must use fixed windows, add a small random jitter to each user's window start so resets spread out over time.

Q: Where do you return the Retry-After value from? For a fixed or sliding window, Retry-After is the seconds remaining until the current window expires. For a token bucket, it is the time needed to refill one token: 1 divided by the refill rate. Returning an accurate Retry-After lets well-behaved clients back off precisely instead of hammering the endpoint.

Frequently Asked Questions

What are the main rate limiting algorithms?

Token Bucket (allows bursts up to bucket capacity), Leaky Bucket (smooths traffic to constant rate), Fixed Window Counter (simple but has edge burst problem), Sliding Window Log (accurate but memory-heavy), Sliding Window Counter (approximation that balances accuracy and memory).

Where should rate limiting be implemented in a distributed system?

Three options: client-side (unreliable, gameable), API Gateway layer (best for most cases - centralized, language-agnostic), and application service layer (needed when rate limits are per-business-entity not per-IP). Most production systems use API Gateway + Redis for distributed state.

How do you handle race conditions in distributed rate limiting?

Use Redis atomic operations: INCR (atomic counter increment), Lua scripts for compare-and-set, or Redis SETNX for distributed locks. The sliding window counter approach using two Redis keys (current window + previous window) is the most popular production pattern.

Sources and review notesreviewed 8 Jun 2026

Article-specific sources

Verification window

Page last edited 8 Jun 2026 by Aditya Sharma. A review date records an editorial edit, not a guarantee that every external fact is still current.

Evidence labels

Official notices, candidate reports, offer documents, and editorial practice questions carry different confidence levels. The visible source list lets you inspect the evidence instead of relying on a blanket verification badge.

Verification policy: /editorial-standards/. Found something incorrect? Submit a correction - we respond within 48 hours.

topic cluster

Sat this this year? Share your story, earn ₹500.

First-person experience reports help future candidates prep smarter. We pay verified contributors ₹500 via UPI per accepted story with byline.

Submit your story →

ready to practice?

Take a free timed mock test

Put what you learned into practice. Our mock tests match the 2026 pattern with timer, navigator, reveal, and score breakdown. No signup.

Start free mock test →

related guides

Topics & Practice

Share this guide

Twitter LinkedIn W WhatsApp

System Design: Rate Limiter 2026 [Full Design with Code]

Why Rate Limiter is a Canonical System Design Problem

Step 1: Requirements Clarification

Step 2: Capacity Estimation

Step 3: Core Algorithms

Algorithm 1: Fixed Window Counter

Algorithm 2: Sliding Window Log

Algorithm 3: Sliding Window Counter (Recommended)

Algorithm 4: Token Bucket

Algorithm 5: Leaky Bucket

Step 3b: Why Atomicity Matters Here

Step 4: High-Level Architecture

Step 5: API Design

Step 6: Distributed Deployment

Step 7: Tradeoffs

Common Follow-up Questions

Frequently Asked Questions

What are the main rate limiting algorithms?

Where should rate limiting be implemented in a distributed system?

How do you handle race conditions in distributed rate limiting?

More resources in Topics & Practice

Sat this this year? Share your story, earn ₹500.

Take a free timed mock test

System Design: Chat Application 2026 [WhatsApp/Slack Architecture]

System Design: TinyURL 2026 [Hash Collision, Vanity URLs, QR Codes]

System Design: URL Shortener 2026 [bit.ly Architecture Deep Dive]

System Design: Notification Service 2026 [Push, Email, SMS Architecture]

System Design: News Feed 2026 [Twitter/Instagram Architecture]

Share this guide

System Design: Rate Limiter 2026 [Full Design with Code]

Why Rate Limiter is a Canonical System Design Problem

Step 1: Requirements Clarification

Step 2: Capacity Estimation

Step 3: Core Algorithms

Algorithm 1: Fixed Window Counter

Algorithm 2: Sliding Window Log

Algorithm 3: Sliding Window Counter (Recommended)

Algorithm 4: Token Bucket

Algorithm 5: Leaky Bucket

Step 3b: Why Atomicity Matters Here

Step 4: High-Level Architecture

Step 5: API Design

Step 6: Distributed Deployment

Step 7: Tradeoffs

Common Follow-up Questions

Related Articles

Frequently Asked Questions

What are the main rate limiting algorithms?

Where should rate limiting be implemented in a distributed system?

How do you handle race conditions in distributed rate limiting?

More resources in Topics & Practice

Sat this this year? Share your story, earn ₹500.

Take a free timed mock test

System Design: Chat Application 2026 [WhatsApp/Slack Architecture]

System Design: TinyURL 2026 [Hash Collision, Vanity URLs, QR Codes]

System Design: URL Shortener 2026 [bit.ly Architecture Deep Dive]

System Design: Notification Service 2026 [Push, Email, SMS Architecture]

System Design: News Feed 2026 [Twitter/Instagram Architecture]

Share this guide