Top 26 DBMS Indexing Interview Questions & Answers (2026)

What changed in 2026 drives
Mass-recruiter offer letters are flatter for 2026 batch - the 4-5 LPA ASE band has barely budged in three years while inflation eats real wages. Premium tracks (Digital, Pro, Elite, Specialist) are still where the differential lives, and they are entirely test-driven. If you are aiming higher than the default offer, the coding round is not optional pageantry - it is the entire interview.
What I'd actually study for this
- 01Two solid coding-round answers (1 medium-hard DSA each, with edge-case discussion) > five half-baked ones
- 02One real project you can defend end-to-end - file paths, design decisions, and what you would change
- 03One DBMS schema you actually built (not a textbook ER diagram), with at least 3 join-heavy queries written from memory
- 04Three behavioural STAR stories: failure recovered, conflict handled, ownership taken
Where most candidates trip up
The single biggest mistake is treating company-specific guides as primary prep and DSA as secondary. It is the opposite. Mass recruiters use the test as a filter, but premium tracks at every IT services company use coding to allocate offer band. Spend 70% of prep time on DSA + system fundamentals, 20% on company-specific patterns, 10% on HR rehearsal. Reverse that ratio and you collect the default offer.
Editorial commentary by Aditya Sharma · written for PapersAdda · not generated, not aggregated.
Last Updated: June 2026 | Level: Beginner to Advanced | Format: Q&A with Diagrams, SQL, and Complexity Analysis
Indexing is tested extensively in DBMS rounds at backend SDE interviews. Candidates report that B+tree internals, index selection decisions, and EXPLAIN plan analysis appear at product companies including Google, Amazon, and Flipkart. Based on public preparation resources and candidate-reported accounts, questions on clustered vs non-clustered and covering indexes are common at mid-to-senior SDE rounds.
Table of Contents
- Index Basics (Q1-Q8)
- B-Tree and B+Tree (Q9-Q15)
- Index Types and Selection (Q16-Q22)
- Query Optimization (Q23-Q26)
Index Basics
Q1. What is a database index? Why is it needed? Easy
A database index is a separate data structure that enables fast lookups on one or more columns without scanning every row in a table.
Without index:
SELECT * FROM employees WHERE department = 'Engineering';
-- Full table scan: reads every row, checks every department value.
-- For 10 million rows: reads 10 million rows.
-- Time complexity: O(n)
With index on department:
Index lookup: find 'Engineering' in the B+tree index.
Index directly provides the row locations.
-- Reads only matching rows (e.g., 5000 Engineering employees).
-- Time complexity: O(log n) for lookup + O(k) for k matching rows.
Cost of indexes:
- Write overhead: every INSERT/UPDATE/DELETE must also update the index.
- Storage: indexes consume disk space (often 10-50% of table size).
- Memory: often cached in the buffer pool.
The trade-off: indexes speed up reads at the cost of writes and storage.
Q2. What is the difference between a clustered and non-clustered index? Medium
| Aspect | Clustered Index | Non-Clustered Index |
|---|---|---|
| Number per table | One (only one physical ordering) | Many (multiple separate index structures) |
| Physical storage | Table rows stored in index key order | Separate structure with pointers to rows |
| Leaf nodes contain | Actual row data | Index key + row pointer (RID or PK) |
| Range scan | Very efficient (rows physically ordered) | May require many random I/Os (row lookup per entry) |
| MySQL InnoDB | Primary key is always the clustered index | Secondary indexes store PK, then look up cluster |
| SQL Server | Clustered or heap table | All non-PK indexes |
Diagram:
Clustered index (InnoDB):
Leaf nodes of B+tree contain the actual rows:
[...| id=5, name=Alice, dept=Eng |...| id=6, name=Bob, dept=HR |...]
Non-clustered index (InnoDB secondary):
Leaf nodes contain: [dept_value | primary_key]
[...|dept=Eng, pk=5|...| dept=Eng, pk=12|...|dept=HR, pk=6|...]
To get full row: look up pk in clustered index (double lookup).
Q3. What is a composite index? What is the left-prefix rule? Medium
A composite index is an index on multiple columns.
CREATE INDEX idx_dept_salary ON employees(department, salary);
Left-prefix rule: The index can be used for queries that filter by the leftmost column(s). Skipping a column breaks the prefix.
| Query | Uses idx_dept_salary? |
|---|---|
WHERE department = 'Eng' | YES (leftmost column) |
WHERE department = 'Eng' AND salary > 50000 | YES (both columns) |
WHERE salary > 50000 | NO (skips department, leftmost column) |
WHERE department = 'Eng' AND salary = 70000 | YES |
WHERE department = 'Eng' OR salary > 50000 | PARTIAL (only the OR adds complexity) |
Why left-prefix rule exists: A composite index sorts by (department, salary). You can binary search on department. But salary order is only meaningful within a department partition; standalone salary query requires full index scan.
Q4. What is selectivity? How does it affect index effectiveness? Medium
Selectivity is the fraction of rows that match a condition. High selectivity = few matching rows. Low selectivity = many matching rows.
Selectivity = (distinct values in column) / (total rows)
Column: gender (M/F, 2 distinct values in 1M rows) -> Selectivity = 2/1,000,000 = 0.000002
Column: email (1M unique values in 1M rows) -> Selectivity = 1,000,000/1,000,000 = 1.0
Effect on index use:
- High selectivity (email): Index very effective. Query returns very few rows; B+tree lookup is efficient.
- Low selectivity (gender): Index often NOT used by optimizer. Scanning half the table via random I/Os is slower than a full sequential table scan.
Rule of thumb: MySQL optimizer typically uses an index if the query will return fewer than ~20-30% of rows. For queries returning > 30% of rows, a full table scan is often faster.
Cardinality: Database term for the number of distinct values. High cardinality (email) = high selectivity = good index candidate.
Q5. What is a covering index (index-only scan)? Medium
A covering index includes all columns needed by the query (both WHERE filter and SELECT projection columns). The query can be satisfied entirely from the index without touching the actual table.
-- Query:
SELECT name, salary FROM employees WHERE department = 'Engineering';
-- Regular index on department:
-- Step 1: Find 'Engineering' in index -> get row pointers
-- Step 2: Look up EACH row in the actual table to get name and salary
-- Two I/O operations per row.
-- Covering index: (department, name, salary)
CREATE INDEX idx_cover ON employees(department, name, salary);
-- Step 1: Find 'Engineering' in index
-- Leaf node already has name, salary. NO table lookup needed.
-- One I/O set total.
When to use covering indexes:
- Frequent queries that select a small, fixed set of columns.
- High-read, low-write tables (covering indexes add write overhead for all those extra columns).
MySQL EXPLAIN output: Look for Using index in the Extra column. This means an index-only scan is being used.
Q6. What is a partial index? Medium
A partial index (available in PostgreSQL, SQLite; not MySQL) indexes only a subset of rows that satisfy a WHERE condition.
-- Partial index: only indexes active orders (not completed/cancelled)
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status = 'active';
-- This query benefits from the partial index:
SELECT * FROM orders WHERE customer_id = 42 AND status = 'active';
-- This query does NOT:
SELECT * FROM orders WHERE customer_id = 42 AND status = 'completed';
Advantages:
- Smaller index size (only active rows; may be 10% of total rows).
- Faster updates (completed orders don't need to update this index).
- More likely to fit in memory.
Use cases:
- Soft-deleted rows (
WHERE deleted_at IS NULL). - Active/pending work items.
- Specific status values that are queried far more than others.
Q7. What is the difference between a dense and sparse index? Medium
| Aspect | Dense Index | Sparse Index |
|---|---|---|
| Entries | One index entry per data record | One index entry per block/page of data |
| Lookup | Direct: find entry, access record | Indirect: find closest block entry, scan block |
| Space | More (one entry per row) | Less (one entry per block, e.g., 1/100 of dense) |
| Updates | More overhead (update index per row) | Less overhead |
| Used in | Non-clustered indexes, hash indexes | Clustered indexes on sorted data |
Dense index example:
Index: [key=1, ptr] [key=2, ptr] [key=3, ptr] ... one entry per row
Sparse index example:
Data blocks (sorted by key):
Block 1: rows 1-100
Block 2: rows 101-200
Block 3: rows 201-300
Sparse index: [key=1, ptr to Block 1] [key=101, ptr to Block 2] [key=201, ptr to Block 3]
To find row 150: find entry <=150 in index (entry 101), access Block 2, scan for 150.
Clustered indexes in most databases use sparse indexing at higher levels of the B+tree (internal nodes are sparse; leaf nodes are dense).
Q8. What are the ACID implications of indexes? Hard
Indexes must participate in ACID transactions:
Atomicity: If a transaction aborts, index changes made during the transaction must also be undone. WAL includes index modifications.
Consistency: Unique indexes enforce uniqueness constraints. If an INSERT violates a UNIQUE index, the insert is rejected, maintaining consistency.
Isolation: Index entries are locked just like data rows. Inserting a new row acquires an exclusive lock on the index entry. Reading via index acquires shared locks (or is handled via MVCC). Gap locks on B+tree indexes prevent phantom reads in some isolation levels.
Durability: Index changes are logged to WAL. On crash recovery, index state is restored along with data.
Performance implication: ACID-compliant index updates add write amplification. Every INSERT into a table with 5 indexes requires 6 write operations (1 table + 5 indexes), all logged to WAL.
B-Tree and B+Tree
Q9. What is a B-tree? Medium
A B-tree (Balanced Tree) is a self-balancing tree data structure designed for storage systems. Unlike binary trees, each node can have many children.
Properties of B-tree of order m:
- Every node has at most m children.
- Every non-leaf non-root node has at least ceil(m/2) children.
- Root has at least 2 children (unless it is a leaf).
- All leaves are at the same level (perfectly balanced).
- A node with k children contains k-1 keys.
Structure:
B-tree of order 5 (each node: 2-4 keys, 3-5 children):
[30 | 70]
/ | \
[10|20] [40|50|60] [80|90]
/ | | \ / | | | \ / | \
... ... ...
(data at each node)
Key property: Data is stored at ALL nodes (internal and leaf). This means to retrieve data, you may find it at any level.
Time complexity: Search, insert, delete: O(log n) where n is number of keys.
Q10. What is a B+tree? How does it differ from B-tree? Medium
A B+tree is a variant of B-tree where:
- All data records are stored ONLY at leaf nodes.
- Internal nodes store only keys (used for routing).
- Leaf nodes are linked in a doubly-linked list.
Comparison:
| Aspect | B-tree | B+tree |
|---|---|---|
| Data location | At all nodes | Only at leaf nodes |
| Internal nodes | Store keys + data pointers | Store only keys (routing) |
| Leaf links | None | Linked list (enables range scan) |
| Range queries | Slow (must traverse tree for each record) | Fast (scan leaf linked list) |
| Point queries | Slightly faster (may find data at internal node) | Slightly slower (always go to leaf) |
| Use in databases | Rarely | Dominant (MySQL, PostgreSQL, Oracle) |
B+tree structure:
[30 | 60] <- Internal (routing keys only)
/ | \
[10|20] [30|40|50] [60|70|80] <- Leaves (data here)
data data data data data data data data
| | |
+------------+---------------+ <- Linked list
Range query advantage:
Find all employees with salary BETWEEN 30000 AND 60000:
1. Find leaf node containing 30000 (log n time).
2. Scan linked list of leaves forward until key > 60000.
Total: O(log n + k) where k = number of matching records.
B-tree would require O(k log n) to fetch each record individually.
Q11. How does B+tree handle insertions? Hard
B+tree insertion algorithm:
1. Find the correct leaf node (binary search down the tree).
2. Insert the key-value pair into the leaf.
3. If leaf overflows (too many keys):
a. Split: create a new leaf with upper half of keys.
b. Promote the FIRST key of the new leaf (copy-up) to parent.
c. Update leaf linked list pointers.
4. If parent overflows after promotion:
a. Split the internal node: push middle key UP (push-up, not copy).
b. Repeat up the tree until no overflow or a new root is created.
Example (order 3: max 2 keys per node):
Initial leaf: [10, 20]
Insert 15: [10, 15, 20] <- OVERFLOW
Split: Left=[10,15], Right=[20], promote 20 to parent.
Parent was empty root: new root = [20]
Tree:
[20]
/ \
[10|15] [20] <- leaf 20 is copy-upped
Key difference from B-tree: B+tree "copies up" the separator key (leaf keeps it). B-tree "pushes up" (internal node gets the key, not leaf).
Q12. How does B+tree handle deletions? Hard
B+tree deletion algorithm:
1. Find the leaf containing the key.
2. Delete the key from the leaf.
3. If leaf underflows (too few keys, below ceil(m/2)-1):
Option A: BORROW from a sibling (redistribute).
Option B: MERGE with a sibling (if borrowing not possible).
4. If merge: remove separator key from parent.
5. If parent underflows: recurse up.
Borrow example:
Left sibling: [5, 10, 15] (has extra)
Current leaf: [20] (underflow after deleting 25)
Parent separator: [20]
After borrow: redistribute -- move 15 to current, update separator to 15.
Left: [5, 10], Current: [15, 20], Parent separator: [15]
Merge example:
Left sibling: [5, 10] (cannot borrow)
Current leaf: [20] (underflow after deleting 25)
Parent separator: [20]
After merge: combine into [5, 10, 20], remove separator from parent.
Parent separator [20] removed. If parent underflows: repeat.
Amortized cost: Insertions and deletions are O(log n) amortized. Splits and merges are O(log n) worst case but rare in practice.
Q13. What is the height of a B+tree? How many I/Os does a lookup take? Medium
B+tree height:
For n total keys and order m (max m children per node):
Height h = O(log_m(n))
Each node holds up to m-1 keys.
Maximum keys at height h: (m-1) * m^h
For n=1 million keys, m=200 (page size 16KB, key+pointer ~80 bytes):
h = ceil(log_200(1,000,000)) = ceil(log(1M)/log(200)) = ceil(3.93) = 4
B+tree of height 4 with branching factor 200 holds hundreds of millions of keys.
I/Os per lookup:
Each node is one disk page.
Lookup = traverse from root to leaf = h I/Os.
With height 4: 4 disk I/Os per point query.
With root and upper levels cached in buffer pool: often 1-2 I/Os in practice.
Why large branching factor matters:
- Higher branching factor = lower tree height = fewer I/Os per lookup.
- Database designers tune page size (InnoDB: 16KB default) to maximize keys per node.
Q14. What is a B+tree fill factor? Medium
Fill factor is the percentage of each index page that is filled during an initial index build.
Fill factor = 70% means each index page starts 70% full.
30% is reserved for future insertions without immediate page splits.
High fill factor (90-100%):
- Maximum space efficiency.
- More keys per page, fewer pages, lower height.
- Future inserts cause more page splits (worse write performance for insertions into existing ranges).
Low fill factor (50-70%):
- More space reserved for future inserts.
- More pages, higher height.
- Fewer page splits as the index grows.
- Better write performance for insert-heavy workloads.
SQL Server example:
CREATE INDEX idx_orders ON orders(order_date)
WITH FILLFACTOR = 80; -- 80% filled, 20% reserved
PCTFREE (Oracle): Same concept, but expressed as percentage of free space to reserve (PCTFREE=20 means fill to 80%).
Q15. What is a hash index? When is it better than B+tree? Medium
A hash index uses a hash function to map keys to buckets containing row pointers.
Structure:
hash(key) -> bucket_number -> [pointer, pointer, ...]
| Aspect | Hash Index | B+tree Index |
|---|---|---|
| Point lookup (=) | O(1) average | O(log n) |
| Range query (BETWEEN, >, <) | Not supported | O(log n + k) |
| Prefix queries (LIKE 'abc%') | Not supported | Supported |
| ORDER BY | Not supported | Supported (leaves are ordered) |
| Insert/delete | O(1) average | O(log n) |
| Memory vs disk | Better for in-memory | Better for disk-based |
When to use hash index:
- Exact equality lookups only (
WHERE id = 42,WHERE email = '[email protected]'). - In-memory tables or memory-optimized workloads.
- No range queries needed.
MySQL: MEMORY storage engine uses hash indexes by default. InnoDB's adaptive hash index is a dynamic in-memory hash built automatically on frequently accessed B+tree entries (hidden from the user).
PostgreSQL: Has explicit hash index type (CREATE INDEX ... USING HASH). Not WAL-logged before PG 10; write-safe only from PG 10+.
Index Types and Selection
Q16. What is a bitmap index? When is it used? Hard
A bitmap index uses one bit-vector per distinct value in the indexed column. Each bit represents whether a row has that value.
gender column (M/F), 5 rows:
Row: 1 2 3 4 5
Data: M F M M F
Bitmap for M: [1, 0, 1, 1, 0]
Bitmap for F: [0, 1, 0, 0, 1]
AND/OR operations for complex queries:
SELECT * FROM employees WHERE gender='M' AND department='Engineering';
Bitmap for M: [1, 0, 1, 1, 0, 1, ...]
Bitmap for Engineering:[0, 0, 1, 0, 0, 1, ...]
AND result: [0, 0, 1, 0, 0, 1, ...] <- rows 3 and 6 match
When to use:
- Low cardinality columns (few distinct values: gender, status, country, boolean flags).
- Read-heavy data warehouse / OLAP workloads.
- Queries combining multiple low-cardinality conditions.
Why NOT for OLTP:
- Row-level locking is difficult (one update may affect bits across the entire vector).
- Oracle: bitmap index pages locked at block level for updates -- high contention for DML-heavy tables.
- PostgreSQL: does not support persistent bitmap indexes (only in-memory bitmap scans during query execution).
Q17. What is an index scan vs a table scan? When does the optimizer choose each? Medium
| Scan Type | Method | When used |
|---|---|---|
| Full table scan (Seq scan) | Read every row in the table | No useful index, low selectivity, small table |
| Index scan | Traverse B+tree, look up rows | High selectivity, moderate result set |
| Index-only scan | Traverse B+tree, use index data (covering index) | All columns in query are in the index |
| Bitmap index scan | Collect row IDs from index, then fetch rows in heap order | Multiple indexes combined, moderate result set |
| Loose index scan | Skip-scan through distinct values | GROUP BY, DISTINCT with index on grouped column |
Optimizer cost model: The query optimizer estimates:
- Estimated rows returned (from table statistics).
- Cost of random I/O (index lookup per row) vs sequential I/O (full scan).
- Whether index fits in buffer pool.
When full scan beats index scan:
- Query returns > 20-30% of rows. Random I/O per row from index is slower than one sequential scan.
- Small table (< 1000 rows): full scan faster than B+tree traversal.
- No usable index for the predicate.
Q18. What is an index on expression (functional index)? Medium
A functional index (expression index) stores the result of an expression applied to column values, enabling indexed lookups on computed values.
-- Without functional index: full scan for case-insensitive search
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- With functional index:
CREATE INDEX idx_lower_email ON users(LOWER(email)); -- PostgreSQL
-- or in MySQL 8.0+:
CREATE INDEX idx_lower_email ON users((LOWER(email))); -- note double parentheses
-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
Common use cases:
- Case-insensitive search.
- Extracted substrings (
LEFT(phone, 3)for area code prefix queries). - Date parts (
YEAR(created_at)for year-based filtering). - JSON field extraction (
data->>'status'in PostgreSQL).
Constraint: The query WHERE clause must use the exact same expression as the index definition. LOWER(email) and LOWER(TRIM(email)) are different expressions.
Q19. What is index cardinality and how do you view it? Easy
Cardinality in the context of indexes is the estimated number of distinct values in an indexed column. Higher cardinality = more unique values = better index selectivity.
-- MySQL: view index cardinality
SHOW INDEX FROM employees;
-- Output columns: Table, Key_name, Cardinality, ...
-- Cardinality is an estimate (updated by ANALYZE TABLE or auto-update stats)
-- PostgreSQL: view from pg_stats
SELECT attname, n_distinct FROM pg_stats WHERE tablename = 'employees';
-- n_distinct = estimated distinct values (negative means fraction of total rows)
-- Update statistics:
ANALYZE TABLE employees; -- MySQL
ANALYZE employees; -- PostgreSQL
Use in query planning: The optimizer uses cardinality estimates to determine:
- Which index to use when multiple are available.
- Whether to use an index at all (low cardinality = skip index).
- Join order in multi-table queries.
Stale statistics issue: If table data changes significantly after the last ANALYZE, cardinality estimates become inaccurate. The optimizer may make poor index choices. Run ANALYZE after large bulk inserts.
Q20. How do you use EXPLAIN to analyze index usage? Medium
EXPLAIN shows the query execution plan, revealing which indexes are used and estimated costs.
MySQL EXPLAIN:
EXPLAIN SELECT * FROM employees WHERE department = 'Engineering' AND salary > 50000;
+----+-------------+-----------+------+---------------+------------------+-------+-------+------+
| id | select_type | table | type | possible_keys | key | rows | Extra |
+----+-------------+-----------+------+---------------+------------------+-------+-------+------+
| 1 | SIMPLE | employees | ref | idx_dept | idx_dept | 500 | Using where |
+----+-------------+-----------+------+---------------+------------------+-------+-------+------+
Key EXPLAIN columns:
| Column | Meaning |
|---|---|
type | Join/scan type: ALL=full scan, index=full index scan, range=index range, ref=index lookup, const=PK lookup |
possible_keys | Indexes that could be used |
key | Index actually used (NULL = full scan) |
rows | Estimated rows to examine |
Extra | Using index = covering index; Using filesort = sort needed; Using temporary = temp table |
PostgreSQL EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM employees WHERE department = 'Engineering';
-- Output:
Index Scan using idx_dept on employees (cost=0.42..150.5 rows=500 width=80)
(actual time=0.032..1.2 rows=487 loops=1)
Index Cond: (department = 'Engineering')
Planning time: 0.5 ms
Execution time: 1.4 ms
EXPLAIN ANALYZE actually executes the query and shows actual vs estimated rows. If estimated vs actual rows differ significantly, statistics need updating.
Q21. What is an index merge? Hard
Index merge is a query optimization where the database uses multiple single-column indexes for a single query by merging their results.
-- Two separate indexes:
CREATE INDEX idx_dept ON employees(department);
CREATE INDEX idx_salary ON employees(salary);
-- Query:
SELECT * FROM employees WHERE department = 'Engineering' OR salary > 100000;
-- Index merge (union):
Step 1: Use idx_dept to find all Engineering row IDs.
Step 2: Use idx_salary to find all salary>100000 row IDs.
Step 3: Union the two result sets.
Step 4: Fetch actual rows.
Types of index merge:
| Type | Use case |
|---|---|
| Union | OR conditions (get rows matching either condition) |
| Intersection | AND conditions on separate indexes (get rows matching both) |
| Sort-union | Like union but sorts by row ID before fetching for better I/O locality |
When the optimizer prefers index merge over composite index: If no composite index (dept, salary) exists, index merge may be better than full scan. But a well-designed composite index is usually faster than index merge due to single B+tree traversal.
MySQL EXPLAIN output: Extra: Using union(idx_dept, idx_salary); Using where
Q22. What are the guidelines for when to create an index? Easy
Create an index when:
- Column is frequently used in WHERE clauses.
- Column is used in JOIN conditions (foreign key columns).
- Column is used in ORDER BY or GROUP BY (avoids filesort).
- Column has high cardinality (many distinct values).
- Table is large (small tables are often better served by full scan).
Avoid creating an index when:
- Column has very low cardinality (boolean, gender, status with 2-3 values).
- Table is small (< a few thousand rows; full scan is fast).
- Table has heavy write-only or write-dominant workload (indexes slow down writes).
- Column is rarely used in queries.
Index maintenance review:
-- MySQL: find unused indexes (queries not using them)
-- Use performance_schema.table_io_waits_summary_by_index_usage
SELECT object_name, index_name, count_read
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE count_read = 0 AND index_name != 'PRIMARY';
-- These indexes are write overhead without read benefit. Consider dropping.
Query Optimization
Q23. What is a query execution plan? How does the optimizer choose one? Medium
A query execution plan is the step-by-step procedure the database uses to execute a SQL query.
The optimizer's job:
- Parse the SQL into an internal representation.
- Generate multiple equivalent execution plans (different join orders, index choices).
- Estimate the cost of each plan (CPU, I/O, memory).
- Choose the plan with the lowest estimated cost.
Cost estimation uses:
- Table statistics (row counts, column cardinalities, histograms of value distributions).
- Index information (height, selectivity).
- Hardware model (I/O cost of disk pages, CPU cost per comparison).
Example -- two-table join:
SELECT e.name, d.dept_name
FROM employees e JOIN departments d ON e.dept_id = d.dept_id
WHERE d.location = 'Mumbai';
Plans considered:
1. Scan departments (Mumbai filter), nested loop join employees.
2. Scan employees (all), hash join departments.
3. Use index on d.location, then join employees.
Optimizer picks plan 3 if d.location index exists and selectivity is high.
Q24. What are the common join algorithms and when are they used? Hard
| Algorithm | Description | Best when |
|---|---|---|
| Nested Loop Join | For each row in outer, scan inner | Inner table small or has good index; low row counts |
| Hash Join | Build hash table on smaller relation, probe with larger | Large tables, no usable index, equality joins only |
| Sort-Merge Join | Sort both relations, merge sorted outputs | Both inputs already sorted or join on range; good for ORDER BY result |
| Index Nested Loop | Nested loop but use index to look up inner rows | Outer small, inner has index on join column |
Complexity:
| Algorithm | Time | Space |
|---|---|---|
| Nested Loop | O(n * m) | O(1) |
| Hash Join | O(n + m) | O(min(n,m)) for hash table |
| Sort-Merge | O(n log n + m log m) | O(n + m) for sort buffers |
| Index Nested Loop | O(n * log m) | O(1) |
MySQL: Uses nested loop join and hash join (since 8.0.18). No sort-merge join. PostgreSQL: Supports all three algorithms; optimizer chooses based on cost.
Q25. What is the N+1 query problem? How is it solved? Medium
The N+1 problem occurs when an application fetches a list of N objects and then issues one additional query per object to fetch related data.
-- Fetching 100 orders and their customer names:
Query 1: SELECT * FROM orders LIMIT 100; -- Returns 100 rows
Query 2: SELECT name FROM customers WHERE id=1; -- For order 1's customer
Query 3: SELECT name FROM customers WHERE id=5; -- For order 2's customer
...
Query 101: SELECT name FROM customers WHERE id=99; -- For order 100's customer
Total: 101 queries instead of 1.
Solutions:
1. JOIN:
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
LIMIT 100;
-- 1 query, 1 round trip.
2. Eager loading (ORMs):
# SQLAlchemy: N+1 problem
orders = session.query(Order).limit(100).all()
for order in orders:
print(order.customer.name) # Each .customer fires a SQL query!
# Fix with eager loading:
from sqlalchemy.orm import joinedload
orders = session.query(Order).options(joinedload(Order.customer)).limit(100).all()
# Single query with JOIN.
3. IN-clause batching:
-- Fetch customer IDs from orders, then batch lookup:
SELECT name FROM customers WHERE id IN (1, 5, 9, ..., 99);
-- 2 queries total instead of N+1.
Q26. How would you optimize a slow query? (Full interview answer) Hard
A structured approach for optimizing a slow query:
Step 1: Reproduce and measure.
EXPLAIN ANALYZE SELECT ... FROM ... WHERE ... ORDER BY ...;
Note: execution time, row counts, scan types.
Step 2: Check for full table scans.
- EXPLAIN shows
type=ALLorSeq Scan. - Check: is there an index on the WHERE/JOIN columns?
- If not: add an index on the high-selectivity filter column.
Step 3: Check index selectivity.
- Is the index being used but still slow?
- Check cardinality. Low cardinality index may be ignored by optimizer (or should be).
- Consider composite index to cover more of the WHERE clause.
Step 4: Check for covering index opportunity.
- Does the query SELECT a small set of columns?
- Add those columns to the index to enable index-only scan.
Step 5: Check ORDER BY / GROUP BY.
Using filesortin EXPLAIN Extra = no index for sort.- Add index on ORDER BY column(s) to eliminate sort.
Step 6: Check JOIN order and algorithm.
- For hash joins: ensure the smaller table is the build input.
- For nested loop: ensure the inner table has an index on the join column.
Step 7: Check query rewrite.
- Can a subquery be replaced with a JOIN?
- Can an OR be replaced with UNION (for better index use)?
- Are there redundant GROUP BY or DISTINCT operations?
Step 8: Check statistics freshness.
ANALYZE TABLE orders; -- Update statistics for accurate query planning
Step 9: Consider hardware/caching.
- Is the working set index fitting in the buffer pool?
- Increase
innodb_buffer_pool_size(MySQL) orshared_buffers(PostgreSQL).
FAQ
Q: Does an index guarantee faster query performance? No. An index helps only when the query's selectivity is high and the optimizer chooses to use it. For low-selectivity queries or small tables, indexes add overhead without benefit.
Q: How many indexes should a table have? There is no fixed rule. Typical OLTP tables have 3-7 indexes. Each index adds write overhead (INSERT/UPDATE/DELETE cost). Profile actual query patterns and add indexes for the most frequent, slow queries.
Q: What happens to indexes when you run ALTER TABLE? Adding or removing columns may require rebuilding indexes. ALTER TABLE on large tables can be very slow and lock the table (or use online DDL with tools like pt-online-schema-change to avoid locking).
Related PapersAdda guides:
Methodology applied to this articlelast verified 8 Jun 2026
- No fabricated salary numbers or success rates. If we quote a range, it's sourced.
- No noun-substituted templates. This article was not generated by swapping company names in a stock prompt.
- No paid placements, sponsored coaching links, or affiliate-shilled course pushes.
Explore this topic cluster
More resources in Interview Questions
Use the category hub to browse similar questions, exam patterns, salary guides, and preparation resources related to this topic.
Paid contributor programme
Sat this this year? Share your story, earn ₹500.
First-person experience reports help future candidates prep smarter. We pay verified contributors ₹500 via UPI per accepted story - with byline.
Submit your story →Ready to practice?
Take a free timed mock test
Put what you learned into practice. Our mock tests match the 2026 pattern with timer, navigator, reveal, and score breakdown. No signup.
Start Free Mock Test →More from PapersAdda
Accenture Interview Questions 2026 (with Answers for Freshers)
Capgemini Interview Questions 2026 (with Answers for Freshers)
HCLTech Interview Questions 2026 (TechBee + TGT, with Answers)
IBM Interview Questions 2026 (with Answers for Freshers)