issue 117 · apr 27 · mmxxvi
est. 2017
Sun, 27 Apr 2026
vol. IX · no. 117
PapersAdda
placement intelligence, since 2017
640+ briefs · 24 campuses · by reservation
verified offers · sourced from r/developersIndia

System Design: URL Shortener 2026 [bit.ly Architecture Deep Dive]

10 min read
Uncategorized
Updated: 8 Jun 2026
Aditya Sharma
Aditya's Edit

PapersAdda 2026 Placement Cycle

By Aditya Sharma · Founder & Editor, PapersAdda

What changed in 2026 drives

Mass-recruiter offer letters are flatter for the 2026 batch - the 4-5 LPA ASE band has barely budged in three years while inflation eats real wages. Premium tracks (Digital, Pro, Elite, Specialist) are still where the differential lives, and they are entirely test-driven. If you are aiming higher than the default offer, the coding round is not optional pageantry - it is the entire interview.

What I'd actually study for this

  • 01 Two solid coding-round answers (1 medium-hard DSA each, with edge-case discussion) > five half-baked ones
  • 02 One real project you can defend end-to-end - file paths, design decisions, and what you would change
  • 03 One DBMS schema you actually built (not a textbook ER diagram), with at least 3 join-heavy queries written from memory
  • 04 Three behavioural STAR stories: failure recovered, conflict handled, ownership taken

Where most candidates trip up

The single biggest mistake is treating company-specific guides as primary prep and DSA as secondary. It is the opposite. Mass recruiters use the test as a filter, but premium tracks at IT services companies use coding performance to allocate the offer band. Spend 70% of prep time on DSA + system fundamentals, 20% on company-specific patterns, 10% on HR rehearsal. Reverse that ratio and you collect the default offer.

Editorial commentary by Aditya Sharma · written for PapersAdda · not generated, not aggregated.



Why URL Shortener is a Classic Entry-Level System Design

Based on public preparation resources and candidate-reported interview threads, the URL shortener is one of the most common warm-up system design questions: candidates report it in roughly 20-25% of FAANG rounds, often as the first question in a two-part interview. It is approachable enough for freshers while exposing enough depth for senior engineers (distributed ID generation, redirect caching, analytics pipeline).


Step 1: Requirements

Functional requirements:

  • Shorten a long URL to a 7-character short code
  • Redirect short URL to original URL with minimal latency
  • Optional: custom aliases, expiry, click analytics

Non-functional requirements:

  • Scale: 100M URLs shortened per day, 10B redirects per day
  • Redirect latency: under 10ms for hot URLs (cached), under 100ms for cold
  • High availability: 99.99% uptime
  • Durability: no URL mappings lost

Step 2: Capacity Estimation

Write QPS: 100M URLs/day / 86,400 = 1,157 writes/sec = ~1.2K/sec
Read QPS:  10B redirects/day / 86,400 = 115,740 reads/sec = ~116K/sec
Read-to-write ratio: 100:1

URL storage per record:
  short_code: 7 bytes
  long_url: ~200 bytes avg
  metadata (user_id, created_at, expires_at, click_count): ~50 bytes
  Total per record: ~257 bytes = ~260 bytes

Storage for 5 years:
  100M/day * 365 days * 5 years = 182.5B URLs
  182.5B * 260 bytes = ~47.5TB (manageable with sharding)

Cache:
  20% of URLs generate 80% of traffic (Pareto principle)
  Hot URLs to cache: 20% of daily new = 20M records
  20M * 260 bytes = ~5.2GB per day (fits in Redis cluster)
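The Step 2 arithmetic is easy to fumble under interview pressure, so here is a minimal Python sketch that reproduces it from the same inputs (100M writes/day, 10B reads/day, ~260 bytes per record, 5-year retention, a 20% hot set). The function and parameter names are illustrative; storage uses decimal units (1 TB = 10^12 bytes).

```python
SECONDS_PER_DAY = 86_400

def estimate(writes_per_day=100_000_000, reads_per_day=10_000_000_000,
             bytes_per_record=260, years=5, hot_percent=20):
    """Back-of-envelope numbers from Step 2 (decimal units)."""
    records = writes_per_day * 365 * years
    hot_records = writes_per_day * hot_percent // 100   # 20% of one day's new URLs
    return {
        "write_qps": round(writes_per_day / SECONDS_PER_DAY),
        "read_qps": round(reads_per_day / SECONDS_PER_DAY),
        "read_write_ratio": reads_per_day // writes_per_day,
        "records": records,
        "storage_tb": records * bytes_per_record / 10**12,
        "cache_gb": hot_records * bytes_per_record / 10**9,
    }

for name, value in estimate().items():
    print(f"{name:>16}: {value:,}")
```

Changing a single input (say, 500M writes/day) re-derives every downstream number, which is a quick way to sanity-check the sharding threshold discussed under scalability.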

Step 3: URL ID Generation

Option A: Base62 of Auto-increment ID (Recommended)

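CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {c: i for i, c in enumerate(CHARS)}   # O(1) reverse lookup for decoding

def encode_base62(num, width=7):
    """
    Convert a non-negative integer ID to a base62 short code,
    left-padded with CHARS[0] to at least `width` characters.
    Time: O(log62(n)), effectively O(1) for 7-char codes.
    """
    digits = []
    while num:
        num, rem = divmod(num, 62)
        digits.append(CHARS[rem])
    # Pad with this alphabet's zero digit so decode_base62 round-trips.
    return ''.join(reversed(digits)).rjust(width, CHARS[0])

def decode_base62(code):
    """Convert a short code back to its integer ID."""
    num = 0
    for c in code:
        num = num * 62 + INDEX[c]
    return num

# 7-char base62 supports up to 62^7 = ~3.52 trillion URLs
print(encode_base62(1000000))    # "0004c92" ("4c92" padded to 7 chars)
print(encode_base62(999999999))  # "015FTGf"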

Pros: Guaranteed unique, no collision check, deterministic. Cons: Sequential IDs are guessable (minor security concern).
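If guessable sequential codes matter, one common mitigation (not part of the design above) is to pass each ID through a reversible permutation of the 62^7 code space before encoding. A minimal sketch, assuming an affine map (id * MULTIPLIER + OFFSET) mod 62^7; MULTIPLIER and OFFSET are hypothetical constants, and the map is a bijection only because MULTIPLIER is odd and not divisible by 31, the two prime factors of 62. This is obfuscation, not security: enough observed codes would let someone recover the constants.

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SPACE = 62 ** 7                        # number of distinct 7-char codes
MULTIPLIER = 2_654_435_761             # hypothetical: odd and not divisible by 31
OFFSET = 1_234_567                     # hypothetical secret offset
INVERSE = pow(MULTIPLIER, -1, SPACE)   # exists because gcd(MULTIPLIER, SPACE) == 1

def encode_base62(num, width=7):
    digits = []
    while num:
        num, rem = divmod(num, 62)
        digits.append(CHARS[rem])
    return "".join(reversed(digits)).rjust(width, CHARS[0])

def decode_base62(code):
    num = 0
    for c in code:
        num = num * 62 + CHARS.index(c)
    return num

def scramble(seq_id):
    """Map a sequential ID to a non-sequential one (a bijection on [0, SPACE))."""
    return (seq_id * MULTIPLIER + OFFSET) % SPACE

def unscramble(code_id):
    """Invert scramble() using the modular inverse of MULTIPLIER."""
    return ((code_id - OFFSET) * INVERSE) % SPACE

for seq in (1, 2, 3):
    code = encode_base62(scramble(seq))
    print(seq, code, unscramble(decode_base62(code)))
```

Because the map is a bijection, uniqueness still comes for free from the counter, so no collision check is needed.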

Option B: MD5 Hash + Take First 7 Chars

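import hashlib

def md5_short(long_url):
    # hexdigest() is base16: its first 7 chars give only 16^7 (~268M) codes,
    # so encode the digest in base62 to use the full 62^7 (~3.52T) space.
    digest = hashlib.md5(long_url.encode()).digest()
    num = int.from_bytes(digest, "big") % (62 ** 7)
    return encode_base62(num)   # from Option A; collisions are still possible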

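Cons: Collisions are expected at this scale (birthday paradox), so every insert needs a uniqueness check and a retry; the same URL always maps to the same code (which can be a feature or a bug).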
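Option B's collision risk is worth quantifying. The birthday approximation n(n-1)/(2N) gives the expected number of colliding pairs when n codes are drawn uniformly from N = 62^7 values; for a single day's 100M URLs that is roughly 1,400, and it only grows as the table fills, so a check-and-retry loop is mandatory. A minimal sketch, assuming a dict stands in for the table's unique short_code key (a real service would let the primary key reject a duplicate INSERT rather than read first) and that retries append a numeric salt to the URL; the helper names are hypothetical.

```python
import hashlib

CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SPACE = 62 ** 7

def encode_base62(num, width=7):
    digits = []
    while num:
        num, rem = divmod(num, 62)
        digits.append(CHARS[rem])
    return "".join(reversed(digits)).rjust(width, CHARS[0])

def expected_collisions(n, space=SPACE):
    """Birthday approximation: expected colliding pairs among n uniform codes."""
    return n * (n - 1) / (2 * space)

def hash_code(long_url, salt=0):
    # Retries hash a salted input so a collided URL lands somewhere else.
    data = long_url if salt == 0 else f"{long_url}#{salt}"
    digest = hashlib.md5(data.encode()).digest()
    return encode_base62(int.from_bytes(digest, "big") % SPACE)

def shorten(long_url, store, max_attempts=5):
    """store maps short_code -> long_url (stand-in for the unique primary key)."""
    for salt in range(max_attempts):
        code = hash_code(long_url, salt)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url
            return code
        if existing == long_url:
            return code                  # same URL, same code: dedupe for free
    raise RuntimeError("too many collisions")

print(round(expected_collisions(100_000_000)))   # 1420
```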

Option C: Distributed Counter (for horizontal scaling)

For multiple URL service instances, use a ticket server (a database sequence) or a Redis atomic counter to assign unique IDs, so the instances never have to coordinate with each other.

Ticket Server:
  POST /counter/next-range -> {start: 1000001, end: 1001000}
  URL service caches a range, uses IDs locally until exhausted
  Reduces DB calls to 1 per 1000 URL creations
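The range-allocation idea fits in a few lines. A minimal sketch, assuming a hypothetical TicketServer class stands in for the database sequence (in production, next_range would be a single atomic update) and an IdAllocator lives inside each URL service instance; the 1000-ID range size follows the example above.

```python
import threading

class TicketServer:
    """Hypothetical stand-in for the ticket server: hands out disjoint ID ranges."""
    def __init__(self, start=1, range_size=1000):
        self._next = start
        self._size = range_size
        self._lock = threading.Lock()
        self.calls = 0                       # counts simulated DB round trips

    def next_range(self):
        with self._lock:                     # production: one atomic DB/Redis update
            start = self._next
            self._next += self._size
            self.calls += 1
            return start, start + self._size - 1   # inclusive, like {start, end}

class IdAllocator:
    """Per-instance allocator: serves IDs from a cached range until it runs out."""
    def __init__(self, ticket_server):
        self._ts = ticket_server
        self._lock = threading.Lock()
        self._current, self._end = 0, -1     # empty range forces a fetch on first use

    def next_id(self):
        with self._lock:
            if self._current > self._end:
                self._current, self._end = self._ts.next_range()
            nid = self._current
            self._current += 1
            return nid

ts = TicketServer(start=1_000_001)
svc = IdAllocator(ts)
print([svc.next_id() for _ in range(3)], ts.calls)   # [1000001, 1000002, 1000003] 1
```

One consequence of this design: if an instance crashes, the unused tail of its range is simply lost. Codes stay unique, but IDs have gaps, which is harmless for a shortener.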

Step 3b: API Design

Two endpoints carry the entire product. State them explicitly before diving into internals.

POST /api/v1/shorten
  Body: { long_url, custom_alias?, expires_at? }
  Returns: { short_url, short_code, expires_at }
  Idempotency: if the same long_url + user is submitted twice,
  return the existing short_code rather than minting a new one
  (optional, controlled by a dedupe flag).

GET /{short_code}
  The redirect endpoint. This is the hot path: ~116K req/sec.
  Returns: HTTP 301 or 302 with Location header.
  Must be served from cache for the vast majority of requests.

The redirect endpoint is deliberately not under /api. It sits at the domain root so the short URL itself is as short as possible (tinyurl.com/abc1234, not tinyurl.com/api/v1/abc1234). Every character in the path is a character the user has to share, so the redirect route lives at the root and the management API lives under /api/v1.

A design subtlety: when dedupe is enabled, the POST shorten endpoint should be idempotent per (user, long_url). Otherwise a user who clicks "shorten" twice mints two codes for the same destination, wasting code space and splitting analytics. The dedupe lookup uses a hash index on long_url.
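The dedupe rule reduces to one lookup keyed by (user, long_url) before minting a code. A minimal in-memory sketch; ShortenService and the new_code callable are hypothetical, and the sha256 digest key is an assumption standing in for the hash index on long_url (a fixed-size digest is easier to index than unbounded URL text). In a real database, a unique index on (user_id, url_hash) is what closes the race between two concurrent clicks.

```python
import hashlib

class ShortenService:
    """In-memory sketch: dicts stand in for url_mappings and the dedupe index."""

    def __init__(self, new_code):
        self._new_code = new_code        # callable that mints a fresh short_code
        self.codes = {}                  # short_code -> long_url
        self._dedupe = {}                # (user_id, url_hash) -> short_code

    @staticmethod
    def url_hash(long_url):
        # Fixed-size digest: cheaper to index than the unbounded long_url text.
        return hashlib.sha256(long_url.encode()).hexdigest()

    def shorten(self, user_id, long_url, dedupe=True):
        key = (user_id, self.url_hash(long_url))
        if dedupe and key in self._dedupe:
            return self._dedupe[key]     # idempotent repeat: return the existing code
        code = self._new_code()
        self.codes[code] = long_url
        if dedupe:
            self._dedupe[key] = code
        return code

counter = iter(range(1, 10**6))
svc = ShortenService(lambda: f"c{next(counter):06d}")
first = svc.shorten(42, "https://example.com/x")
again = svc.shorten(42, "https://example.com/x")
other = svc.shorten(7, "https://example.com/x")
print(first == again, first == other)    # True False
```

The same user resubmitting gets the same code back, while a different user shortening the same destination gets a fresh one with its own analytics.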


Step 4: Database Schema

CREATE TABLE url_mappings (
    -- base62 codes are case-sensitive ("abc" != "ABC"), so use a binary
    -- collation; MySQL's default collations compare case-insensitively
    short_code    CHAR(7) CHARACTER SET ascii COLLATE ascii_bin PRIMARY KEY,
    long_url      TEXT         NOT NULL,
    user_id       BIGINT,
    created_at    TIMESTAMP    DEFAULT NOW(),
    expires_at    TIMESTAMP    NULL,
    is_active     BOOLEAN      DEFAULT TRUE,
    INDEX (user_id, created_at DESC),
    INDEX (expires_at)           -- for cleanup job
);

CREATE TABLE click_analytics (
    click_id      BIGINT       PRIMARY KEY AUTO_INCREMENT,
    short_code    CHAR(7) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    clicked_at    TIMESTAMP    DEFAULT NOW(),
    country       VARCHAR(3),
    referrer      VARCHAR(512),
    user_agent    VARCHAR(256),
    INDEX (short_code, clicked_at DESC)
);

For analytics at scale, do not write click events synchronously to MySQL. Use Kafka + ClickHouse or a similar OLAP store for click analytics.


Step 5: System Architecture

[Client Browser]
      |
      | GET /abc1234
      v
[CDN (CloudFront)]
      |  (cache 301 redirects for hot URLs)
      |
[Load Balancer]
      |
[Redirect Service (stateless, 10+ instances)]
      |
      |---> Redis Cache
      |     (hot URL cache, TTL = 24h)
      |
      |---> MySQL (primary + 3 read replicas)
      |
      |---> Kafka (async analytics events)
                |
                v
           [Analytics Consumer -> ClickHouse]

Step 6: URL Creation Flow

POST /api/v1/shorten
{
  "long_url": "https://example.com/very/long/url",
  "custom_alias": "mylink",   // optional
  "expires_at": "2027-01-01"  // optional
}

1. Validate URL format (reject localhost, private IPs)
2. Check if custom_alias already taken (if provided)
3. Get next ID from ticket server or use MD5 hash
4. Encode to base62 short_code
5. Write to MySQL
6. Warm Redis cache
7. Return {"short_url": "https://short.ly/abc1234"}
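Step 1 of the creation flow (reject localhost and private IPs) can be sketched with the standard library's urllib.parse and ipaddress modules. A minimal sketch; the function name, the BLOCKED_HOSTS set and the 2048-character cap are assumptions. It checks only the literal host in the URL: a public DNS name that resolves to a private address passes, so a service that ever fetches the destination (for link previews, say) would also need to validate the resolved IP.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"localhost"}   # assumption: add internal hostnames here

def validate_long_url(url, max_len=2048):
    """Return None if the URL is acceptable, else a short rejection reason."""
    if len(url) > max_len:
        return "too long"
    try:
        parts = urlsplit(url)
        host = parts.hostname            # lower-cased; IPv6 brackets stripped
    except ValueError:                   # e.g. unbalanced [ ] in the netloc
        return "malformed URL"
    if parts.scheme not in ("http", "https"):
        return "scheme must be http or https"
    if not host:
        return "missing host"
    if host in BLOCKED_HOSTS or host.endswith(".localhost"):
        return "localhost not allowed"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None                      # a DNS name: literal-host checks end here
    if (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
        return "private or reserved address not allowed"
    return None
```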

Step 7: Redirect Flow

GET /abc1234

1. Check Redis: GET "url:abc1234"
   Hit  -> 302 redirect immediately (< 5ms)
   Miss -> continue

2. Query MySQL read replica: SELECT long_url FROM url_mappings WHERE short_code = 'abc1234'
   Found  -> cache in Redis (TTL 24h), redirect
   Not found -> 404

3. Async: publish click event to Kafka
   (do not block redirect for analytics)

4. Return HTTP 302 with Location: {long_url}
   Header: Cache-Control: no-cache (browsers and proxies must revalidate before reusing the 302, so every click reaches the server)
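The redirect flow above is the classic cache-aside pattern. A minimal sketch, assuming plain dicts stand in for Redis and the MySQL read replica and a queue.Queue stands in for the Kafka producer; RedirectService and handle() are hypothetical names, and the 24h TTL matches the architecture diagram.

```python
import queue
import time

class RedirectService:
    """Cache-aside sketch: dicts stand in for Redis and the read replica."""
    CACHE_TTL = 24 * 3600                       # 24h, as in the diagram

    def __init__(self, db):
        self.db = db                            # short_code -> long_url
        self.cache = {}                         # short_code -> (long_url, expiry epoch)
        self.events = queue.Queue()             # analytics events, consumed elsewhere

    def _cache_get(self, code, now):
        entry = self.cache.get(code)
        if entry and entry[1] > now:
            return entry[0]
        return None

    def handle(self, code, now=None):
        now = time.time() if now is None else now
        long_url = self._cache_get(code, now)   # 1. Redis GET url:<code>
        if long_url is None:
            long_url = self.db.get(code)        # 2. read-replica lookup on a miss
            if long_url is None:
                return 404, {}
            self.cache[code] = (long_url, now + self.CACHE_TTL)
        # 3. enqueue the click; put_nowait on an unbounded queue never blocks
        self.events.put_nowait({"short_code": code, "ts": now})
        # 4. 302 + no-cache so repeat clicks come back and can be counted
        return 302, {"Location": long_url, "Cache-Control": "no-cache"}
```

The analytics event is enqueued for both cache hits and misses but never for a 404, and nothing in handle() waits on the consumer.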

301 vs 302 for redirects:

Use 302 (not cached by browser) when click analytics are needed. Use 301 (cached by browser) when you want to minimize server load and analytics are not required. Most URL shorteners for marketing use 302.


Step 8: Expiry Handling

Expired URLs should return 410 Gone, not 404.

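Cleanup job (runs daily, repeating in batches until no rows match):
  SELECT short_code FROM url_mappings
  WHERE expires_at < NOW() AND is_active = TRUE
  LIMIT 10000;

  UPDATE url_mappings SET is_active = FALSE
  WHERE short_code IN (<codes returned by the SELECT>);

  Redis: DEL url:<code> for each code in the batch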

On redirect:
  If expires_at IS NOT NULL AND expires_at < NOW():
    Return 410 Gone
    Do NOT log click (expired URL)
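The cleanup job and the redirect-time expiry check can both be sketched against SQLite (stdlib), standing in for MySQL. A minimal sketch, assuming timestamps are stored as integer epoch seconds, a dict stands in for Redis, and deactivate_expired / redirect_status are hypothetical helper names. The job loops in batches so a large backlog never turns into one long-running UPDATE.

```python
import sqlite3

def deactivate_expired(conn, cache, now, batch=10_000):
    """Soft-delete expired rows batch by batch and evict them from the cache.
    Returns how many rows were deactivated."""
    total = 0
    while True:
        codes = [row[0] for row in conn.execute(
            "SELECT short_code FROM url_mappings "
            "WHERE expires_at < ? AND is_active = 1 LIMIT ?", (now, batch))]
        if not codes:
            return total
        conn.executemany(
            "UPDATE url_mappings SET is_active = 0 WHERE short_code = ?",
            [(code,) for code in codes])
        conn.commit()
        for code in codes:
            cache.pop(f"url:{code}", None)   # Redis: DEL url:<code>
        total += len(codes)

def redirect_status(row, now):
    """row is (long_url, expires_at, is_active), or None if the code is unknown."""
    if row is None:
        return 404                           # never existed
    _, expires_at, is_active = row
    if not is_active or (expires_at is not None and expires_at < now):
        return 410                           # existed but expired: do not log the click
    return 302

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_mappings (short_code TEXT PRIMARY KEY, long_url TEXT, "
             "expires_at INTEGER, is_active INTEGER DEFAULT 1)")
conn.executemany("INSERT INTO url_mappings (short_code, long_url, expires_at) "
                 "VALUES (?, ?, ?)",
                 [("a000001", "https://x.test/1", 100),
                  ("a000002", "https://x.test/2", 300),
                  ("a000003", "https://x.test/3", None)])
cache = {"url:a000001": "https://x.test/1", "url:a000003": "https://x.test/3"}
print(deactivate_expired(conn, cache, now=200))   # 1
```

Deactivating rather than deleting keeps the row, so the redirect path can still tell an expired code (410) from an unknown one (404).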

Step 9: Scalability Considerations

Write path scaling:
  Single MySQL primary handles ~5K writes/sec
  At 1.2K writes/sec, one primary is sufficient
  For 10K+ writes/sec: shard by short_code hash

Read path scaling:
  Redis cache: assume a ~90% overall hit rate (hot URLs dominate traffic)
  Remaining 10% of 116K/sec = ~11.6K DB reads/sec
  3 MySQL read replicas at ~4K reads/sec each = ~12K reads/sec
  (just enough; a fourth replica adds headroom for failover)

Geographic distribution:
  Multi-region: replicate MySQL across regions
  CDN: cache 301 redirects at edge for globally hot URLs
  DNS-based routing: route users to nearest redirect service instance

Custom Domain Support

For enterprise users who want go.company.com/abc:

  • Store custom_domain in url_mappings and make (custom_domain, short_code) the unique key
  • Wildcard DNS: *.company.com points to redirect service
  • On redirect request, lookup by (domain, short_code) pair

Why Expiry Returns 410 and Not 404

The distinction between 410 Gone and 404 Not Found matters for search engines and analytics. A 404 tells the client that nothing was found at this URL, without saying whether anything ever existed there. A 410 tells the client the resource existed and has been permanently removed. For expired short URLs, returning 410 signals search engines to drop the dead URL rather than keep re-crawling it, and it lets analytics systems distinguish between "URL was never valid" and "URL existed but expired." Interviewers at product companies (reportedly including Bitly and Rebrandly) probe this distinction because it demonstrates awareness of HTTP semantics beyond the basics.


Sharding Strategy in Depth

Sharding a URL shortener is straightforward because the access pattern is almost entirely by short_code: hash the short_code and take the result modulo the shard count. A consistent hashing ring is usually overkill here because adding shards is infrequent. The tradeoff is that with plain modulo, changing the shard count remaps most keys, so each resharding is a bulk data migration that has to fit within the 99.99% availability budget (roughly 52 minutes of downtime per year). For extremely high write rates (above 10K creates per second), the ticket server pattern from the distributed counter section becomes important: the ticket server assigns ID ranges to individual URL service instances, which then use their local range without contacting the DB for each ID. This brings write coordination overhead down to one DB call per thousand URL creations.
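Routing by hash(short_code) mod N needs a hash that is stable across processes. Python's built-in hash() on strings is randomized per process (see PYTHONHASHSEED), so two redirect-service instances could disagree on the shard. A minimal sketch using an MD5 digest instead; shard_for and fraction_moved are hypothetical helpers, and the second one measures how many keys change shard when the shard count changes under plain modulo.

```python
import hashlib

def shard_for(short_code, num_shards):
    """Stable shard index: every process computes the same answer."""
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def fraction_moved(codes, old_n, new_n):
    """Share of keys whose shard changes when the shard count goes old_n -> new_n."""
    moved = sum(shard_for(c, old_n) != shard_for(c, new_n) for c in codes)
    return moved / len(codes)

codes = [f"{i:07d}" for i in range(20_000)]
moved = fraction_moved(codes, 4, 5)
print(f"4 -> 5 shards: {moved:.1%} of keys move")
```

Going from 4 to 5 shards should move about 80% of keys: a key stays put only when h mod 4 equals h mod 5, which holds for 4 of every 20 hash values. That is the migration cost plain modulo accepts in exchange for simplicity.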


Analytics Pipeline Deep Dive

Click analytics is the part candidates most often design incorrectly by writing to the analytics table synchronously inside the redirect path. That is wrong: it adds a database write to the hottest endpoint in the system and couples redirect latency to analytics availability. If the analytics database is slow, every redirect slows down.

The correct design decouples the two. On redirect, the service fires a non-blocking event onto a message queue (Kafka) and returns the 302 immediately. A separate analytics consumer reads the stream and aggregates.

Redirect path (synchronous, must be fast):
  1. Look up long_url (cache, then DB)
  2. Return 302
  3. Fire-and-forget: produce {short_code, ts, country, referrer, ua}
     to Kafka topic "clicks" (does not block the response)

Analytics path (asynchronous, can be slow):
  4. Kafka consumer batches click events
  5. Aggregates into per-code counters (clicks/day, by country, by referrer)
  6. Writes rollups to an analytics store (ClickHouse or a time-series DB)

The raw click stream is high volume (10B/day) but the aggregates a user actually queries are small (clicks per day per code). Storing pre-aggregated rollups instead of every raw click keeps the analytics store small and dashboard queries fast. Raw events can be retained in cheap object storage for a retention window if detailed drill-down is needed, then expired.
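The aggregation step of the analytics path is just counting. A minimal sketch of what a consumer would do with each polled batch, assuming events arrive as dicts with short_code, ts (epoch seconds), country and referrer; the rollup name and the "direct" / "??" placeholders are hypothetical, and the actual Kafka polling and ClickHouse writes are left out. A real consumer would commit its Kafka offsets only after the rollups are written, so a crash replays a batch rather than losing it.

```python
from collections import Counter
from datetime import datetime, timezone

def rollup(events):
    """Aggregate raw click events into the small tables dashboards query."""
    per_day, per_country, per_referrer = Counter(), Counter(), Counter()
    for e in events:
        day = datetime.fromtimestamp(e["ts"], tz=timezone.utc).date().isoformat()
        per_day[(e["short_code"], day)] += 1
        per_country[(e["short_code"], e.get("country") or "??")] += 1
        per_referrer[(e["short_code"], e.get("referrer") or "direct")] += 1
    return per_day, per_country, per_referrer
```

Each Counter maps a (short_code, dimension) pair to a click count, which is the shape a per-day rollup table would store instead of every raw click.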


Follow-up Questions Interviewers Ask

How do you prevent the same long URL from creating millions of duplicate codes? Maintain a hash index on long_url and, when dedupe is enabled, look up the existing code before minting a new one. The tradeoff is an extra index and a lookup on the write path, so dedupe is usually opt-in per request rather than always on.

What happens when the base62 counter approaches 62^7? At about 3.5 trillion codes you would extend to 8-character codes, which gives 62^8 = ~218 trillion. The encode function naturally produces longer codes as the integer grows, so the only changes needed are relaxing the 7-character pad and widening the CHAR(7) short_code column. Plan the migration before you hit the ceiling so existing 7-char codes stay valid.
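A quick check of both points: at 100M new URLs per day the 7-character space lasts about 96 years, and a padding-based encoder rolls over to 8 characters on its own. A minimal sketch, assuming the digits-first alphabet; encode_base62 here is a stand-alone re-definition for illustration, where rjust pads short codes but never truncates long ones.

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num, min_width=7):
    digits = []
    while num:
        num, rem = divmod(num, 62)
        digits.append(CHARS[rem])
    # rjust only pads; IDs >= 62**7 simply come out longer
    return "".join(reversed(digits)).rjust(min_width, CHARS[0])

per_year = 100_000_000 * 365
print(62**7 // per_year)          # 96 (years of 7-char codes at 100M/day)
print(encode_base62(62**7 - 1))   # ZZZZZZZ (last 7-char code)
print(encode_base62(62**7))       # 10000000 (first ID needing 8 chars)
```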

How do you handle a malicious user shortening a phishing URL? Run new long_urls through a safe-browsing check (Google Safe Browsing API or an internal blocklist) asynchronously after creation, and disable codes that resolve to flagged destinations. Doing this synchronously would slow creation, so the check runs just after the code is returned and flips is_active to false on a hit.
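The post-creation safety check can be sketched as a background worker fed by a queue. A minimal sketch, assuming a hypothetical hostname BLOCKLIST and is_flagged() helper standing in for the Google Safe Browsing API or an internal blocklist (no real Safe Browsing call is made here), and a dict of rows standing in for url_mappings.

```python
import queue
import threading
from urllib.parse import urlsplit

BLOCKLIST = {"phish.example"}            # hypothetical blocklist of hostnames

def is_flagged(long_url):
    """Stand-in for a Safe Browsing / blocklist lookup (host match only)."""
    host = urlsplit(long_url).hostname or ""
    return host in BLOCKLIST or any(host.endswith("." + b) for b in BLOCKLIST)

def start_checker(jobs, url_table):
    """Consume (short_code, long_url) jobs after the code has been returned
    to the user, and flip is_active to False on a hit."""
    def worker():
        while True:
            item = jobs.get()
            if item is None:             # sentinel: stop the worker
                jobs.task_done()
                return
            code, long_url = item
            if is_flagged(long_url):
                url_table[code]["is_active"] = False
            jobs.task_done()
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

In production the queue would more likely be a Kafka topic and the flip an UPDATE plus a Redis DEL, so a cached redirect to the flagged destination stops too.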

Why prefer 301 for free links and 302 for analytics links? A 301 is cached by the browser and by intermediaries, so subsequent clicks never reach your server, which slashes load but also means you cannot count those clicks. A 302 is not cached, so every click hits your server and can be logged. The choice is a direct tradeoff between server load and analytics fidelity.

How do you make redirects fast for a globally distributed audience? Cache hot 301 redirects at the CDN edge so a click in another continent resolves at the nearest edge node without crossing oceans to the origin. For 302 (analytics) links, route to the nearest regional redirect service, which holds a local Redis cache replicated from the primary.


Methodology applied to this article · last verified 8 Jun 2026
Sources used
Public exam-pattern documents, official recruiter pages, and verified candidate reports on r/developersIndia and LinkedIn.
Verification window
Page last edited 8 Jun 2026 by Aditya Sharma. Numbers and patterns sanity-checked against the most recent 2026 cycle drives we tracked.
What we did NOT do
  • No fabricated salary numbers or success rates. If we quote a range, it's sourced.
  • No noun-substituted templates. This article was not generated by swapping company names in a stock prompt.
  • No paid placements, sponsored coaching links, or affiliate-shilled course pushes.
Verification policy: /editorial-standards/. Found something incorrect? Submit a correction - we respond within 48 hours.
