Skip to main content

What are clustered and non-clustered indexes?

Clustered index - a B+ tree structure where leaf nodes ARE the actual data rows, sorted physically by the key; a non-clustered index is a separate B+ tree with pointers back to those rows.

Theory

TL;DR

  • Analogy: clustered = phone book (data physically sorted by name); non-clustered = index at the back of a book (separate list pointing to pages)
  • One clustered index per table maximum; up to ~999 non-clustered in SQL Server
  • Clustered wins on range scans because there is no extra hop; non-clustered adds a bookmark lookup unless it covers all needed columns
  • SQL Server creates a clustered index on the primary key automatically
  • Decision rule: clustered on your most-queried sort column (ID, date); non-clustered for every other filter

Quick example

sql
-- SQL Server: clustered created automatically on PRIMARY KEY CREATE TABLE Orders ( OrderID int IDENTITY PRIMARY KEY, -- clustered index here CustomerID int, OrderDate date ); -- Range scan on clustered key: reads contiguous pages, one tree walk SELECT * FROM Orders WHERE OrderID BETWEEN 100 AND 200; -- Non-clustered: separate B-tree on CustomerID CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID); -- Seeks index first, then jumps to clustered index for the full row SELECT * FROM Orders WHERE CustomerID = 123;

The range query on OrderID reads a contiguous block of pages in one pass. The CustomerID query seeks IX_CustomerID first, then makes a second trip to the clustered index for each matching row. That second trip is the bookmark lookup.

Key difference

Clustered index stores the actual rows in sorted order inside the B+ tree leaf nodes. The table and the index are the same structure. There is only one physical order possible, so only one clustered index per table. Non-clustered indexes build a second B+ tree whose leaf nodes hold only the key value and a pointer (the clustered key, or a row ID on a heap). Range scans on the clustered key are fast because pages are sequential on disk. Non-clustered lookups add that extra pointer-follow step unless you make the index covering.

When to use

  • Range queries on dates or sequential IDs (e.g., WHERE OrderDate BETWEEN) → clustered, physical sort makes sequential reads cheap
  • Secondary filter columns (e.g., WHERE CustomerID = ?) → non-clustered, no reason to reorder the whole table
  • Read-heavy reporting tables → add composite non-clustered indexes carefully, they speed up reads
  • Write-heavy tables → keep non-clustered count low; every INSERT updates all indexes
  • Queries where SELECT only needs indexed columns → non-clustered with INCLUDE columns, skips bookmark lookup entirely

Comparison table

AspectClustered indexNon-clustered index
Data storageRows physically sorted by key; leaf node = dataSeparate B-tree; leaf node = key + pointer
Limit per table1 (or 0 for a heap)~999 in SQL Server
Range scan speedFastest; pages are sequentialSlower unless covering; needs bookmark lookup
Insert costMedium; may cause page splits on random keysHigher; each insert updates all indexes
Size overheadNone (data = index)~10-20% extra per index
Best forPrimary sort key (OrderID, event_date)Secondary filters (CustomerID, status)

How the B+ tree stores this

Both index types use a B+ tree. The difference is what lives in the leaf nodes. For a clustered index, the leaf nodes are the data pages themselves, sorted by the clustered key. A seek walks the tree, lands on a leaf page, then scans adjacent pages for range queries without extra I/O. For a non-clustered index, the leaf nodes hold the index key plus a row locator. If the table has a clustered index, that locator is the clustered key value. The engine follows that pointer in a second lookup. If the non-clustered index includes all columns the query needs (a covering index), the engine never touches the clustered index at all.

Common mistakes

Mistake: clustered index on a low-cardinality column like gender

Only two possible values means half the table sits on a tiny group of pages. Inserts fight over those same pages constantly and cause page splits on every write.

sql
-- Wrong: two values = hotspot pages, constant page splits CREATE CLUSTERED INDEX IX_Gender ON users(gender); -- Right: sequential, unique, even distribution CREATE CLUSTERED INDEX IX_UserID ON users(user_id);

Mistake: GUID primary key without sequential ordering

Random GUIDs insert into random positions in the B+ tree, forcing page splits throughout the file. Fragmentation hits 80%+ quickly. Use NEWSEQUENTIALID() in SQL Server or ULIDs in other systems.

sql
-- Wrong: random insert order, massive fragmentation id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY -- Right: sequential GUIDs, inserts always append to the last page id UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY

Mistake: SELECT * with a narrow non-clustered index on a wide table

A non-clustered index on level looks fast in EXPLAIN until the query hits 10 million rows and fetches the full TEXT column every single time.

sql
-- Slow: seeks IX_level, then fetches full row 10M times SELECT * FROM logs WHERE level = 'ERROR'; -- Fast: covering index includes message, no table access at all CREATE NONCLUSTERED INDEX IX_level_covering ON logs(level) INCLUDE (message); SELECT level, message FROM logs WHERE level = 'ERROR';

The difference on a 10M-row table is minutes vs seconds. This is the most common senior-level trap in log analytics.

Mistake: adding 15+ non-clustered indexes to a busy table

Every non-clustered index must be updated on every write. A production case documented on Reddit showed TPS drop of 80% after the 12th index was added. Query sys.dm_db_index_usage_stats to find indexes that are never read and drop them.

sql
-- Find unused indexes in SQL Server SELECT object_name(i.object_id) AS table_name, i.name AS index_name, s.user_seeks + s.user_scans + s.user_lookups AS reads, s.user_updates AS writes FROM sys.indexes i LEFT JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id AND i.index_id = s.index_id WHERE s.database_id = DB_ID() ORDER BY reads ASC;

Real-world usage

  • SQL Server / Microsoft Dynamics: clustered on OrderID; composite non-clustered on CustomerID, OrderDate for reports
  • PostgreSQL / GitLab CI: clustered on job_id (via PK); composite non-clustered on status, created_at for queue polling
  • MySQL / InnoDB (WordPress): clustered on PK always (InnoDB enforces this); non-clustered on post_title for search
  • BigQuery / Google Analytics: clustered tables on event_date; separate search indexes for secondary filters
  • High-write systems: keep non-clustered count at 3-5 max; use columnstore indexes for analytics workloads instead of B-trees

Follow-up questions

Q: Why can a table have only one clustered index?
A: The clustered index is the physical storage order of rows. You cannot store the same rows in two different physical orders at the same time. A second clustered index would require a full second copy of the table, which is essentially what a covering non-clustered index approximates.

Q: What is a bookmark lookup and how do you eliminate it?
A: When a non-clustered seek returns a row locator, the engine makes a second trip to the clustered index (or heap) to fetch the full row. Add INCLUDE columns to the non-clustered index to cover every column in the SELECT list, and the engine skips the clustered index entirely.

Q: How does a GUID primary key cause index fragmentation?
A: New rows with random GUID values insert into arbitrary positions in the B+ tree, forcing page splits across the entire file. Sequential keys like INT IDENTITY always append to the rightmost leaf page, which keeps fragmentation near zero.

Q: How do you detect and fix index fragmentation?
A: Query sys.dm_db_index_physical_stats and check avg_fragmentation_in_percent. Above 30% means REBUILD; between 10-30% means REORGANIZE. REBUILD creates a fresh index structure; REORGANIZE defragments leaf pages in place without a full lock.

Q: Senior question: you have a 1-billion-row fact table in a star schema. How do you design indexes for joins on date and dimension keys?
A: Clustered columnstore on the date column for analytics batch scans; non-clustered B-tree indexes on FK columns with INCLUDE of measure columns for point lookups. The tradeoff is batch write speed (columnstore buffers inserts in a delta store) vs query latency. Brent Ozar's tests show columnstore gives 10-100x compression and scan speed on fact tables, but random single-row inserts hit the delta store first before batch compression.

Examples

Basic: range scan vs secondary lookup

sql
CREATE TABLE Orders ( OrderID int IDENTITY PRIMARY KEY, -- clustered CustomerID int, OrderDate date, Amount decimal(10,2) ); CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID); -- Clustered range: one tree walk, reads 1000 sequential leaf pages SELECT OrderID, Amount FROM Orders WHERE OrderID BETWEEN 1000 AND 2000; -- Non-clustered seek + bookmark lookup: two tree walks SELECT OrderID, Amount FROM Orders WHERE CustomerID = 456;

The first query walks the clustered B+ tree once and reads contiguous leaf pages. The second walks IX_CustomerID, collects row locators, then visits the clustered tree again for each one. For 1000 matching rows that means roughly 2000 tree operations instead of 1000.

Intermediate: composite covering index for a dashboard query

sql
-- E-commerce sales table CREATE TABLE sales ( sale_id BIGINT PRIMARY KEY, -- clustered on sale_id user_id BIGINT, sale_date TIMESTAMP, amount DECIMAL(12,2) ); -- Composite non-clustered covers the entire dashboard query CREATE INDEX idx_user_date_amount ON sales(user_id, sale_date) INCLUDE (amount); -- Dashboard: totals per user for the last 30 days -- Uses idx_user_date_amount only, never touches clustered index SELECT user_id, SUM(amount) FROM sales WHERE user_id = 456 AND sale_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY user_id;

INCLUDE (amount) makes this a covering index. The query never leaves the non-clustered index pages. Without INCLUDE, the engine fetches every matching row from the clustered index just to get amount. On a 100M-row table, that difference is very real.

Senior: covering index to fix a bookmark lookup bottleneck

sql
-- 10M-row log table CREATE TABLE logs ( id BIGINT PRIMARY KEY, level VARCHAR(10), message TEXT ); -- Narrow index on level alone CREATE NONCLUSTERED INDEX IX_level ON logs(level); -- Looks fine in EXPLAIN, but slow at scale: -- engine finds 2M ERROR rows via IX_level, -- then fetches the full TEXT column from clustered index 2M times SELECT level, message FROM logs WHERE level = 'ERROR'; -- Fix: include message in the index itself CREATE NONCLUSTERED INDEX IX_level_covering ON logs(level) INCLUDE (message); -- All needed data is now inside the non-clustered index -- No bookmark lookup, no clustered index access SELECT level, message FROM logs WHERE level = 'ERROR';

I have seen this exact pattern kill a log analytics job in production. The original query ran for 4 minutes on a 10M-row table. Adding INCLUDE (message) brought it to 8 seconds. The index grew larger on disk, but 10 million random clustered-index fetches became one sequential non-clustered scan.

Short Answer

Interview ready
Premium

A concise answer to help you respond confidently on this topic during an interview.

Finished reading?