What are clustered and non-clustered indexes?

General Questions~7 min read

Clustered index - a B+ tree structure where leaf nodes ARE the actual data rows, sorted physically by the key; a non-clustered index is a separate B+ tree with pointers back to those rows.

Theory

TL;DR

Analogy: clustered = phone book (data physically sorted by name); non-clustered = index at the back of a book (separate list pointing to pages)
One clustered index per table maximum; up to ~999 non-clustered in SQL Server
Clustered wins on range scans because there is no extra hop; non-clustered adds a bookmark lookup unless it covers all needed columns
SQL Server creates a clustered index on the primary key automatically
Decision rule: clustered on your most-queried sort column (ID, date); non-clustered for every other filter

Quick example

sql

-- SQL Server: clustered created automatically on PRIMARY KEY
CREATE TABLE Orders (
    OrderID    int IDENTITY PRIMARY KEY,  -- clustered index here
    CustomerID int,
    OrderDate  date
);

-- Range scan on clustered key: reads contiguous pages, one tree walk
SELECT * FROM Orders WHERE OrderID BETWEEN 100 AND 200;

-- Non-clustered: separate B-tree on CustomerID
CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID);

-- Seeks index first, then jumps to clustered index for the full row
SELECT * FROM Orders WHERE CustomerID = 123;

The range query on OrderID reads a contiguous block of pages in one pass. The CustomerID query seeks IX_CustomerID first, then makes a second trip to the clustered index for each matching row. That second trip is the bookmark lookup.

Key difference

Clustered index stores the actual rows in sorted order inside the B+ tree leaf nodes. The table and the index are the same structure. There is only one physical order possible, so only one clustered index per table. Non-clustered indexes build a second B+ tree whose leaf nodes hold only the key value and a pointer (the clustered key, or a row ID on a heap). Range scans on the clustered key are fast because pages are sequential on disk. Non-clustered lookups add that extra pointer-follow step unless you make the index covering.

When to use

Range queries on dates or sequential IDs (e.g., WHERE OrderDate BETWEEN) → clustered, physical sort makes sequential reads cheap
Secondary filter columns (e.g., WHERE CustomerID = ?) → non-clustered, no reason to reorder the whole table
Read-heavy reporting tables → add composite non-clustered indexes carefully, they speed up reads
Write-heavy tables → keep non-clustered count low; every INSERT updates all indexes
Queries where SELECT only needs indexed columns → non-clustered with INCLUDE columns, skips bookmark lookup entirely

Comparison table

Aspect	Clustered index	Non-clustered index
Data storage	Rows physically sorted by key; leaf node = data	Separate B-tree; leaf node = key + pointer
Limit per table	1 (or 0 for a heap)	~999 in SQL Server
Range scan speed	Fastest; pages are sequential	Slower unless covering; needs bookmark lookup
Insert cost	Medium; may cause page splits on random keys	Higher; each insert updates all indexes
Size overhead	None (data = index)	~10-20% extra per index
Best for	Primary sort key (OrderID, event_date)	Secondary filters (CustomerID, status)

How the B+ tree stores this

Both index types use a B+ tree. The difference is what lives in the leaf nodes. For a clustered index, the leaf nodes are the data pages themselves, sorted by the clustered key. A seek walks the tree, lands on a leaf page, then scans adjacent pages for range queries without extra I/O. For a non-clustered index, the leaf nodes hold the index key plus a row locator. If the table has a clustered index, that locator is the clustered key value. The engine follows that pointer in a second lookup. If the non-clustered index includes all columns the query needs (a covering index), the engine never touches the clustered index at all.

Common mistakes

Mistake: clustered index on a low-cardinality column like gender

Only two possible values means half the table sits on a tiny group of pages. Inserts fight over those same pages constantly and cause page splits on every write.

sql

-- Wrong: two values = hotspot pages, constant page splits
CREATE CLUSTERED INDEX IX_Gender ON users(gender);

-- Right: sequential, unique, even distribution
CREATE CLUSTERED INDEX IX_UserID ON users(user_id);

Mistake: GUID primary key without sequential ordering

Random GUIDs insert into random positions in the B+ tree, forcing page splits throughout the file. Fragmentation hits 80%+ quickly. Use NEWSEQUENTIALID() in SQL Server or ULIDs in other systems.

sql

-- Wrong: random insert order, massive fragmentation
id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY

-- Right: sequential GUIDs, inserts always append to the last page
id UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY

Mistake: SELECT * with a narrow non-clustered index on a wide table

A non-clustered index on level looks fast in EXPLAIN until the query hits 10 million rows and fetches the full TEXT column every single time.

sql

-- Slow: seeks IX_level, then fetches full row 10M times
SELECT * FROM logs WHERE level = 'ERROR';

-- Fast: covering index includes message, no table access at all
CREATE NONCLUSTERED INDEX IX_level_covering
    ON logs(level) INCLUDE (message);
SELECT level, message FROM logs WHERE level = 'ERROR';

The difference on a 10M-row table is minutes vs seconds. This is the most common senior-level trap in log analytics.

Mistake: adding 15+ non-clustered indexes to a busy table

Every non-clustered index must be updated on every write. A production case documented on Reddit showed TPS drop of 80% after the 12th index was added. Query sys.dm_db_index_usage_stats to find indexes that are never read and drop them.

sql

-- Find unused indexes in SQL Server
SELECT object_name(i.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates AS writes
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY reads ASC;

Real-world usage

SQL Server / Microsoft Dynamics: clustered on OrderID; composite non-clustered on CustomerID, OrderDate for reports
PostgreSQL / GitLab CI: clustered on job_id (via PK); composite non-clustered on status, created_at for queue polling
MySQL / InnoDB (WordPress): clustered on PK always (InnoDB enforces this); non-clustered on post_title for search
BigQuery / Google Analytics: clustered tables on event_date; separate search indexes for secondary filters
High-write systems: keep non-clustered count at 3-5 max; use columnstore indexes for analytics workloads instead of B-trees

Follow-up questions

Q: Why can a table have only one clustered index?
A: The clustered index is the physical storage order of rows. You cannot store the same rows in two different physical orders at the same time. A second clustered index would require a full second copy of the table, which is essentially what a covering non-clustered index approximates.

Q: What is a bookmark lookup and how do you eliminate it?
A: When a non-clustered seek returns a row locator, the engine makes a second trip to the clustered index (or heap) to fetch the full row. Add INCLUDE columns to the non-clustered index to cover every column in the SELECT list, and the engine skips the clustered index entirely.

Q: How does a GUID primary key cause index fragmentation?
A: New rows with random GUID values insert into arbitrary positions in the B+ tree, forcing page splits across the entire file. Sequential keys like INT IDENTITY always append to the rightmost leaf page, which keeps fragmentation near zero.

Q: How do you detect and fix index fragmentation?
A: Query sys.dm_db_index_physical_stats and check avg_fragmentation_in_percent. Above 30% means REBUILD; between 10-30% means REORGANIZE. REBUILD creates a fresh index structure; REORGANIZE defragments leaf pages in place without a full lock.

Q: Senior question: you have a 1-billion-row fact table in a star schema. How do you design indexes for joins on date and dimension keys?
A: Clustered columnstore on the date column for analytics batch scans; non-clustered B-tree indexes on FK columns with INCLUDE of measure columns for point lookups. The tradeoff is batch write speed (columnstore buffers inserts in a delta store) vs query latency. Brent Ozar's tests show columnstore gives 10-100x compression and scan speed on fact tables, but random single-row inserts hit the delta store first before batch compression.

Examples

Basic: range scan vs secondary lookup

sql

CREATE TABLE Orders (
    OrderID    int IDENTITY PRIMARY KEY,  -- clustered
    CustomerID int,
    OrderDate  date,
    Amount     decimal(10,2)
);

CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID);

-- Clustered range: one tree walk, reads 1000 sequential leaf pages
SELECT OrderID, Amount
FROM Orders
WHERE OrderID BETWEEN 1000 AND 2000;

-- Non-clustered seek + bookmark lookup: two tree walks
SELECT OrderID, Amount
FROM Orders
WHERE CustomerID = 456;

The first query walks the clustered B+ tree once and reads contiguous leaf pages. The second walks IX_CustomerID, collects row locators, then visits the clustered tree again for each one. For 1000 matching rows that means roughly 2000 tree operations instead of 1000.

Intermediate: composite covering index for a dashboard query

sql

-- E-commerce sales table
CREATE TABLE sales (
    sale_id   BIGINT PRIMARY KEY,        -- clustered on sale_id
    user_id   BIGINT,
    sale_date TIMESTAMP,
    amount    DECIMAL(12,2)
);

-- Composite non-clustered covers the entire dashboard query
CREATE INDEX idx_user_date_amount
    ON sales(user_id, sale_date)
    INCLUDE (amount);

-- Dashboard: totals per user for the last 30 days
-- Uses idx_user_date_amount only, never touches clustered index
SELECT user_id, SUM(amount)
FROM sales
WHERE user_id = 456
  AND sale_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id;

INCLUDE (amount) makes this a covering index. The query never leaves the non-clustered index pages. Without INCLUDE, the engine fetches every matching row from the clustered index just to get amount. On a 100M-row table, that difference is very real.

Senior: covering index to fix a bookmark lookup bottleneck

sql

-- 10M-row log table
CREATE TABLE logs (
    id      BIGINT PRIMARY KEY,
    level   VARCHAR(10),
    message TEXT
);

-- Narrow index on level alone
CREATE NONCLUSTERED INDEX IX_level ON logs(level);

-- Looks fine in EXPLAIN, but slow at scale:
-- engine finds 2M ERROR rows via IX_level,
-- then fetches the full TEXT column from clustered index 2M times
SELECT level, message FROM logs WHERE level = 'ERROR';

-- Fix: include message in the index itself
CREATE NONCLUSTERED INDEX IX_level_covering
    ON logs(level)
    INCLUDE (message);

-- All needed data is now inside the non-clustered index
-- No bookmark lookup, no clustered index access
SELECT level, message FROM logs WHERE level = 'ERROR';

I have seen this exact pattern kill a log analytics job in production. The original query ran for 4 minutes on a 10M-row table. Adding INCLUDE (message) brought it to 8 seconds. The index grew larger on disk, but 10 million random clustered-index fetches became one sequential non-clustered scan.

Short Answer

Interview ready

Premium

A concise answer to help you respond confidently on this topic during an interview.

Finished reading?