Suggest an edit

Improve this article

Refine the answer for “What are clustered and non-clustered indexes?”. Your changes go to moderation before they’re published.

Approval required

Content

What you’re changing

Title (EN)

Short answer (EN)

Shown above the full answer for quick recall.

Answer (EN)

**Clustered index** - a B+ tree structure where leaf nodes ARE the actual data rows, sorted physically by the key; a non-clustered index is a separate B+ tree with pointers back to those rows.

## Theory

### TL;DR

- Analogy: clustered = phone book (data physically sorted by name); non-clustered = index at the back of a book (separate list pointing to pages)
- One clustered index per table maximum; up to ~999 non-clustered in SQL Server
- Clustered wins on range scans because there is no extra hop; non-clustered adds a bookmark lookup unless it covers all needed columns
- SQL Server creates a clustered index on the primary key automatically
- Decision rule: clustered on your most-queried sort column (ID, date); non-clustered for every other filter

### Quick example

```sql
-- SQL Server: clustered created automatically on PRIMARY KEY
CREATE TABLE Orders (
    OrderID    int IDENTITY PRIMARY KEY,  -- clustered index here
    CustomerID int,
    OrderDate  date
);

-- Range scan on clustered key: reads contiguous pages, one tree walk
SELECT * FROM Orders WHERE OrderID BETWEEN 100 AND 200;

-- Non-clustered: separate B-tree on CustomerID
CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID);

-- Seeks index first, then jumps to clustered index for the full row
SELECT * FROM Orders WHERE CustomerID = 123;
```

The range query on `OrderID` reads a contiguous block of pages in one pass. The `CustomerID` query seeks `IX_CustomerID` first, then makes a second trip to the clustered index for each matching row. That second trip is the bookmark lookup.

### Key difference

Clustered index stores the actual rows in sorted order inside the B+ tree leaf nodes. The table and the index are the same structure. There is only one physical order possible, so only one clustered index per table. Non-clustered indexes build a second B+ tree whose leaf nodes hold only the key value and a pointer (the clustered key, or a row ID on a heap). Range scans on the clustered key are fast because pages are sequential on disk. Non-clustered lookups add that extra pointer-follow step unless you make the index covering.

### When to use

- Range queries on dates or sequential IDs (e.g., `WHERE OrderDate BETWEEN`) → clustered, physical sort makes sequential reads cheap
- Secondary filter columns (e.g., `WHERE CustomerID = ?`) → non-clustered, no reason to reorder the whole table
- Read-heavy reporting tables → add composite non-clustered indexes carefully, they speed up reads
- Write-heavy tables → keep non-clustered count low; every INSERT updates all indexes
- Queries where SELECT only needs indexed columns → non-clustered with INCLUDE columns, skips bookmark lookup entirely

### Comparison table

| Aspect | Clustered index | Non-clustered index |
|---|---|---|
| Data storage | Rows physically sorted by key; leaf node = data | Separate B-tree; leaf node = key + pointer |
| Limit per table | 1 (or 0 for a heap) | ~999 in SQL Server |
| Range scan speed | Fastest; pages are sequential | Slower unless covering; needs bookmark lookup |
| Insert cost | Medium; may cause page splits on random keys | Higher; each insert updates all indexes |
| Size overhead | None (data = index) | ~10-20% extra per index |
| Best for | Primary sort key (OrderID, event_date) | Secondary filters (CustomerID, status) |

### How the B+ tree stores this

Both index types use a B+ tree. The difference is what lives in the leaf nodes. For a clustered index, the leaf nodes are the data pages themselves, sorted by the clustered key. A seek walks the tree, lands on a leaf page, then scans adjacent pages for range queries without extra I/O. For a non-clustered index, the leaf nodes hold the index key plus a row locator. If the table has a clustered index, that locator is the clustered key value. The engine follows that pointer in a second lookup. If the non-clustered index includes all columns the query needs (a covering index), the engine never touches the clustered index at all.

### Common mistakes

**Mistake: clustered index on a low-cardinality column like `gender`**

Only two possible values means half the table sits on a tiny group of pages. Inserts fight over those same pages constantly and cause page splits on every write.

```sql
-- Wrong: two values = hotspot pages, constant page splits
CREATE CLUSTERED INDEX IX_Gender ON users(gender);

-- Right: sequential, unique, even distribution
CREATE CLUSTERED INDEX IX_UserID ON users(user_id);
```

**Mistake: GUID primary key without sequential ordering**

Random GUIDs insert into random positions in the B+ tree, forcing page splits throughout the file. Fragmentation hits 80%+ quickly. Use `NEWSEQUENTIALID()` in SQL Server or ULIDs in other systems.

```sql
-- Wrong: random insert order, massive fragmentation
id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY

-- Right: sequential GUIDs, inserts always append to the last page
id UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY
```

**Mistake: SELECT * with a narrow non-clustered index on a wide table**

A non-clustered index on `level` looks fast in EXPLAIN until the query hits 10 million rows and fetches the full `TEXT` column every single time.

```sql
-- Slow: seeks IX_level, then fetches full row 10M times
SELECT * FROM logs WHERE level = 'ERROR';

-- Fast: covering index includes message, no table access at all
CREATE NONCLUSTERED INDEX IX_level_covering
    ON logs(level) INCLUDE (message);
SELECT level, message FROM logs WHERE level = 'ERROR';
```

The difference on a 10M-row table is minutes vs seconds. This is the most common senior-level trap in log analytics.

**Mistake: adding 15+ non-clustered indexes to a busy table**

Every non-clustered index must be updated on every write. A production case documented on Reddit showed TPS drop of 80% after the 12th index was added. Query `sys.dm_db_index_usage_stats` to find indexes that are never read and drop them.

```sql
-- Find unused indexes in SQL Server
SELECT object_name(i.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates AS writes
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY reads ASC;
```

### Real-world usage

- SQL Server / Microsoft Dynamics: clustered on `OrderID`; composite non-clustered on `CustomerID, OrderDate` for reports
- PostgreSQL / GitLab CI: clustered on `job_id` (via PK); composite non-clustered on `status, created_at` for queue polling
- MySQL / InnoDB (WordPress): clustered on PK always (InnoDB enforces this); non-clustered on `post_title` for search
- BigQuery / Google Analytics: clustered tables on `event_date`; separate search indexes for secondary filters
- High-write systems: keep non-clustered count at 3-5 max; use columnstore indexes for analytics workloads instead of B-trees

### Follow-up questions

**Q:** Why can a table have only one clustered index?
**A:** The clustered index is the physical storage order of rows. You cannot store the same rows in two different physical orders at the same time. A second clustered index would require a full second copy of the table, which is essentially what a covering non-clustered index approximates.

**Q:** What is a bookmark lookup and how do you eliminate it?
**A:** When a non-clustered seek returns a row locator, the engine makes a second trip to the clustered index (or heap) to fetch the full row. Add INCLUDE columns to the non-clustered index to cover every column in the SELECT list, and the engine skips the clustered index entirely.

**Q:** How does a GUID primary key cause index fragmentation?
**A:** New rows with random GUID values insert into arbitrary positions in the B+ tree, forcing page splits across the entire file. Sequential keys like INT IDENTITY always append to the rightmost leaf page, which keeps fragmentation near zero.

**Q:** How do you detect and fix index fragmentation?
**A:** Query `sys.dm_db_index_physical_stats` and check `avg_fragmentation_in_percent`. Above 30% means REBUILD; between 10-30% means REORGANIZE. REBUILD creates a fresh index structure; REORGANIZE defragments leaf pages in place without a full lock.

**Q:** Senior question: you have a 1-billion-row fact table in a star schema. How do you design indexes for joins on date and dimension keys?
**A:** Clustered columnstore on the date column for analytics batch scans; non-clustered B-tree indexes on FK columns with INCLUDE of measure columns for point lookups. The tradeoff is batch write speed (columnstore buffers inserts in a delta store) vs query latency. Brent Ozar's tests show columnstore gives 10-100x compression and scan speed on fact tables, but random single-row inserts hit the delta store first before batch compression.

## Examples

### Basic: range scan vs secondary lookup

```sql
CREATE TABLE Orders (
    OrderID    int IDENTITY PRIMARY KEY,  -- clustered
    CustomerID int,
    OrderDate  date,
    Amount     decimal(10,2)
);

CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID);

-- Clustered range: one tree walk, reads 1000 sequential leaf pages
SELECT OrderID, Amount
FROM Orders
WHERE OrderID BETWEEN 1000 AND 2000;

-- Non-clustered seek + bookmark lookup: two tree walks
SELECT OrderID, Amount
FROM Orders
WHERE CustomerID = 456;
```

The first query walks the clustered B+ tree once and reads contiguous leaf pages. The second walks `IX_CustomerID`, collects row locators, then visits the clustered tree again for each one. For 1000 matching rows that means roughly 2000 tree operations instead of 1000.

### Intermediate: composite covering index for a dashboard query

```sql
-- E-commerce sales table
CREATE TABLE sales (
    sale_id   BIGINT PRIMARY KEY,        -- clustered on sale_id
    user_id   BIGINT,
    sale_date TIMESTAMP,
    amount    DECIMAL(12,2)
);

-- Composite non-clustered covers the entire dashboard query
CREATE INDEX idx_user_date_amount
    ON sales(user_id, sale_date)
    INCLUDE (amount);

-- Dashboard: totals per user for the last 30 days
-- Uses idx_user_date_amount only, never touches clustered index
SELECT user_id, SUM(amount)
FROM sales
WHERE user_id = 456
  AND sale_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id;
```

`INCLUDE (amount)` makes this a covering index. The query never leaves the non-clustered index pages. Without `INCLUDE`, the engine fetches every matching row from the clustered index just to get `amount`. On a 100M-row table, that difference is very real.

### Senior: covering index to fix a bookmark lookup bottleneck

```sql
-- 10M-row log table
CREATE TABLE logs (
    id      BIGINT PRIMARY KEY,
    level   VARCHAR(10),
    message TEXT
);

-- Narrow index on level alone
CREATE NONCLUSTERED INDEX IX_level ON logs(level);

-- Looks fine in EXPLAIN, but slow at scale:
-- engine finds 2M ERROR rows via IX_level,
-- then fetches the full TEXT column from clustered index 2M times
SELECT level, message FROM logs WHERE level = 'ERROR';

-- Fix: include message in the index itself
CREATE NONCLUSTERED INDEX IX_level_covering
    ON logs(level)
    INCLUDE (message);

-- All needed data is now inside the non-clustered index
-- No bookmark lookup, no clustered index access
SELECT level, message FROM logs WHERE level = 'ERROR';
```

I have seen this exact pattern kill a log analytics job in production. The original query ran for 4 minutes on a 10M-row table. Adding `INCLUDE (message)` brought it to 8 seconds. The index grew larger on disk, but 10 million random clustered-index fetches became one sequential non-clustered scan.

Markdown · drag & drop images · ⌘B / ⌘I shortcuts1823 words

For the reviewer

Note to the moderator (optional)

Visible only to the moderator. Helps review go faster.