What are clustered and non-clustered indexes?
Clustered index - a B+ tree structure where leaf nodes ARE the actual data rows, sorted physically by the key; a non-clustered index is a separate B+ tree with pointers back to those rows.
Theory
TL;DR
- Analogy: clustered = phone book (data physically sorted by name); non-clustered = index at the back of a book (separate list pointing to pages)
- One clustered index per table maximum; up to ~999 non-clustered in SQL Server
- Clustered wins on range scans because there is no extra hop; non-clustered adds a bookmark lookup unless it covers all needed columns
- SQL Server creates a clustered index on the primary key automatically
- Decision rule: clustered on your most-queried sort column (ID, date); non-clustered for every other filter
Quick example
-- SQL Server: clustered created automatically on PRIMARY KEY
CREATE TABLE Orders (
OrderID int IDENTITY PRIMARY KEY, -- clustered index here
CustomerID int,
OrderDate date
);
-- Range scan on clustered key: reads contiguous pages, one tree walk
SELECT * FROM Orders WHERE OrderID BETWEEN 100 AND 200;
-- Non-clustered: separate B-tree on CustomerID
CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID);
-- Seeks index first, then jumps to clustered index for the full row
SELECT * FROM Orders WHERE CustomerID = 123;The range query on OrderID reads a contiguous block of pages in one pass. The CustomerID query seeks IX_CustomerID first, then makes a second trip to the clustered index for each matching row. That second trip is the bookmark lookup.
Key difference
Clustered index stores the actual rows in sorted order inside the B+ tree leaf nodes. The table and the index are the same structure. There is only one physical order possible, so only one clustered index per table. Non-clustered indexes build a second B+ tree whose leaf nodes hold only the key value and a pointer (the clustered key, or a row ID on a heap). Range scans on the clustered key are fast because pages are sequential on disk. Non-clustered lookups add that extra pointer-follow step unless you make the index covering.
When to use
- Range queries on dates or sequential IDs (e.g.,
WHERE OrderDate BETWEEN) → clustered, physical sort makes sequential reads cheap - Secondary filter columns (e.g.,
WHERE CustomerID = ?) → non-clustered, no reason to reorder the whole table - Read-heavy reporting tables → add composite non-clustered indexes carefully, they speed up reads
- Write-heavy tables → keep non-clustered count low; every INSERT updates all indexes
- Queries where SELECT only needs indexed columns → non-clustered with INCLUDE columns, skips bookmark lookup entirely
Comparison table
| Aspect | Clustered index | Non-clustered index |
|---|---|---|
| Data storage | Rows physically sorted by key; leaf node = data | Separate B-tree; leaf node = key + pointer |
| Limit per table | 1 (or 0 for a heap) | ~999 in SQL Server |
| Range scan speed | Fastest; pages are sequential | Slower unless covering; needs bookmark lookup |
| Insert cost | Medium; may cause page splits on random keys | Higher; each insert updates all indexes |
| Size overhead | None (data = index) | ~10-20% extra per index |
| Best for | Primary sort key (OrderID, event_date) | Secondary filters (CustomerID, status) |
How the B+ tree stores this
Both index types use a B+ tree. The difference is what lives in the leaf nodes. For a clustered index, the leaf nodes are the data pages themselves, sorted by the clustered key. A seek walks the tree, lands on a leaf page, then scans adjacent pages for range queries without extra I/O. For a non-clustered index, the leaf nodes hold the index key plus a row locator. If the table has a clustered index, that locator is the clustered key value. The engine follows that pointer in a second lookup. If the non-clustered index includes all columns the query needs (a covering index), the engine never touches the clustered index at all.
Common mistakes
Mistake: clustered index on a low-cardinality column like gender
Only two possible values means half the table sits on a tiny group of pages. Inserts fight over those same pages constantly and cause page splits on every write.
-- Wrong: two values = hotspot pages, constant page splits
CREATE CLUSTERED INDEX IX_Gender ON users(gender);
-- Right: sequential, unique, even distribution
CREATE CLUSTERED INDEX IX_UserID ON users(user_id);Mistake: GUID primary key without sequential ordering
Random GUIDs insert into random positions in the B+ tree, forcing page splits throughout the file. Fragmentation hits 80%+ quickly. Use NEWSEQUENTIALID() in SQL Server or ULIDs in other systems.
-- Wrong: random insert order, massive fragmentation
id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY
-- Right: sequential GUIDs, inserts always append to the last page
id UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEYMistake: SELECT * with a narrow non-clustered index on a wide table
A non-clustered index on level looks fast in EXPLAIN until the query hits 10 million rows and fetches the full TEXT column every single time.
-- Slow: seeks IX_level, then fetches full row 10M times
SELECT * FROM logs WHERE level = 'ERROR';
-- Fast: covering index includes message, no table access at all
CREATE NONCLUSTERED INDEX IX_level_covering
ON logs(level) INCLUDE (message);
SELECT level, message FROM logs WHERE level = 'ERROR';The difference on a 10M-row table is minutes vs seconds. This is the most common senior-level trap in log analytics.
Mistake: adding 15+ non-clustered indexes to a busy table
Every non-clustered index must be updated on every write. A production case documented on Reddit showed TPS drop of 80% after the 12th index was added. Query sys.dm_db_index_usage_stats to find indexes that are never read and drop them.
-- Find unused indexes in SQL Server
SELECT object_name(i.object_id) AS table_name,
i.name AS index_name,
s.user_seeks + s.user_scans + s.user_lookups AS reads,
s.user_updates AS writes
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY reads ASC;Real-world usage
- SQL Server / Microsoft Dynamics: clustered on
OrderID; composite non-clustered onCustomerID, OrderDatefor reports - PostgreSQL / GitLab CI: clustered on
job_id(via PK); composite non-clustered onstatus, created_atfor queue polling - MySQL / InnoDB (WordPress): clustered on PK always (InnoDB enforces this); non-clustered on
post_titlefor search - BigQuery / Google Analytics: clustered tables on
event_date; separate search indexes for secondary filters - High-write systems: keep non-clustered count at 3-5 max; use columnstore indexes for analytics workloads instead of B-trees
Follow-up questions
Q: Why can a table have only one clustered index?
A: The clustered index is the physical storage order of rows. You cannot store the same rows in two different physical orders at the same time. A second clustered index would require a full second copy of the table, which is essentially what a covering non-clustered index approximates.
Q: What is a bookmark lookup and how do you eliminate it?
A: When a non-clustered seek returns a row locator, the engine makes a second trip to the clustered index (or heap) to fetch the full row. Add INCLUDE columns to the non-clustered index to cover every column in the SELECT list, and the engine skips the clustered index entirely.
Q: How does a GUID primary key cause index fragmentation?
A: New rows with random GUID values insert into arbitrary positions in the B+ tree, forcing page splits across the entire file. Sequential keys like INT IDENTITY always append to the rightmost leaf page, which keeps fragmentation near zero.
Q: How do you detect and fix index fragmentation?
A: Query sys.dm_db_index_physical_stats and check avg_fragmentation_in_percent. Above 30% means REBUILD; between 10-30% means REORGANIZE. REBUILD creates a fresh index structure; REORGANIZE defragments leaf pages in place without a full lock.
Q: Senior question: you have a 1-billion-row fact table in a star schema. How do you design indexes for joins on date and dimension keys?
A: Clustered columnstore on the date column for analytics batch scans; non-clustered B-tree indexes on FK columns with INCLUDE of measure columns for point lookups. The tradeoff is batch write speed (columnstore buffers inserts in a delta store) vs query latency. Brent Ozar's tests show columnstore gives 10-100x compression and scan speed on fact tables, but random single-row inserts hit the delta store first before batch compression.
Examples
Basic: range scan vs secondary lookup
CREATE TABLE Orders (
OrderID int IDENTITY PRIMARY KEY, -- clustered
CustomerID int,
OrderDate date,
Amount decimal(10,2)
);
CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID);
-- Clustered range: one tree walk, reads 1000 sequential leaf pages
SELECT OrderID, Amount
FROM Orders
WHERE OrderID BETWEEN 1000 AND 2000;
-- Non-clustered seek + bookmark lookup: two tree walks
SELECT OrderID, Amount
FROM Orders
WHERE CustomerID = 456;The first query walks the clustered B+ tree once and reads contiguous leaf pages. The second walks IX_CustomerID, collects row locators, then visits the clustered tree again for each one. For 1000 matching rows that means roughly 2000 tree operations instead of 1000.
Intermediate: composite covering index for a dashboard query
-- E-commerce sales table
CREATE TABLE sales (
sale_id BIGINT PRIMARY KEY, -- clustered on sale_id
user_id BIGINT,
sale_date TIMESTAMP,
amount DECIMAL(12,2)
);
-- Composite non-clustered covers the entire dashboard query
CREATE INDEX idx_user_date_amount
ON sales(user_id, sale_date)
INCLUDE (amount);
-- Dashboard: totals per user for the last 30 days
-- Uses idx_user_date_amount only, never touches clustered index
SELECT user_id, SUM(amount)
FROM sales
WHERE user_id = 456
AND sale_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id;INCLUDE (amount) makes this a covering index. The query never leaves the non-clustered index pages. Without INCLUDE, the engine fetches every matching row from the clustered index just to get amount. On a 100M-row table, that difference is very real.
Senior: covering index to fix a bookmark lookup bottleneck
-- 10M-row log table
CREATE TABLE logs (
id BIGINT PRIMARY KEY,
level VARCHAR(10),
message TEXT
);
-- Narrow index on level alone
CREATE NONCLUSTERED INDEX IX_level ON logs(level);
-- Looks fine in EXPLAIN, but slow at scale:
-- engine finds 2M ERROR rows via IX_level,
-- then fetches the full TEXT column from clustered index 2M times
SELECT level, message FROM logs WHERE level = 'ERROR';
-- Fix: include message in the index itself
CREATE NONCLUSTERED INDEX IX_level_covering
ON logs(level)
INCLUDE (message);
-- All needed data is now inside the non-clustered index
-- No bookmark lookup, no clustered index access
SELECT level, message FROM logs WHERE level = 'ERROR';I have seen this exact pattern kill a log analytics job in production. The original query ran for 4 minutes on a 10M-row table. Adding INCLUDE (message) brought it to 8 seconds. The index grew larger on disk, but 10 million random clustered-index fetches became one sequential non-clustered scan.
Short Answer
Interview readyA concise answer to help you respond confidently on this topic during an interview.