Suggest an editImprove this articleRefine the answer for “What are clustered and non-clustered indexes?”. Your changes go to moderation before they’re published.Approval requiredContentWhat you’re changing🇺🇸EN🇺🇦UAPreviewTitle (EN)Short answer (EN)**Clustered index** - stores actual data rows sorted physically by key inside B+ tree leaf nodes; a non-clustered index is a separate B+ tree with pointers back to those rows. ```sql CREATE TABLE Orders (OrderID int PRIMARY KEY); -- auto-creates clustered index CREATE NONCLUSTERED INDEX IX_Cust ON Orders(CustomerID); -- separate pointer tree ``` **Key point:** one clustered per table (data IS the index); non-clustered adds a bookmark lookup step unless you include all queried columns.Shown above the full answer for quick recall.Answer (EN)Image**Clustered index** - a B+ tree structure where leaf nodes ARE the actual data rows, sorted physically by the key; a non-clustered index is a separate B+ tree with pointers back to those rows. ## Theory ### TL;DR - Analogy: clustered = phone book (data physically sorted by name); non-clustered = index at the back of a book (separate list pointing to pages) - One clustered index per table maximum; up to ~999 non-clustered in SQL Server - Clustered wins on range scans because there is no extra hop; non-clustered adds a bookmark lookup unless it covers all needed columns - SQL Server creates a clustered index on the primary key automatically - Decision rule: clustered on your most-queried sort column (ID, date); non-clustered for every other filter ### Quick example ```sql -- SQL Server: clustered created automatically on PRIMARY KEY CREATE TABLE Orders ( OrderID int IDENTITY PRIMARY KEY, -- clustered index here CustomerID int, OrderDate date ); -- Range scan on clustered key: reads contiguous pages, one tree walk SELECT * FROM Orders WHERE OrderID BETWEEN 100 AND 200; -- Non-clustered: separate B-tree on CustomerID CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID); -- Seeks index first, then jumps to clustered index for the full row SELECT * FROM Orders WHERE CustomerID = 123; ``` The range query on `OrderID` reads a contiguous block of pages in one pass. The `CustomerID` query seeks `IX_CustomerID` first, then makes a second trip to the clustered index for each matching row. That second trip is the bookmark lookup. ### Key difference Clustered index stores the actual rows in sorted order inside the B+ tree leaf nodes. The table and the index are the same structure. There is only one physical order possible, so only one clustered index per table. Non-clustered indexes build a second B+ tree whose leaf nodes hold only the key value and a pointer (the clustered key, or a row ID on a heap). Range scans on the clustered key are fast because pages are sequential on disk. Non-clustered lookups add that extra pointer-follow step unless you make the index covering. ### When to use - Range queries on dates or sequential IDs (e.g., `WHERE OrderDate BETWEEN`) → clustered, physical sort makes sequential reads cheap - Secondary filter columns (e.g., `WHERE CustomerID = ?`) → non-clustered, no reason to reorder the whole table - Read-heavy reporting tables → add composite non-clustered indexes carefully, they speed up reads - Write-heavy tables → keep non-clustered count low; every INSERT updates all indexes - Queries where SELECT only needs indexed columns → non-clustered with INCLUDE columns, skips bookmark lookup entirely ### Comparison table | Aspect | Clustered index | Non-clustered index | |---|---|---| | Data storage | Rows physically sorted by key; leaf node = data | Separate B-tree; leaf node = key + pointer | | Limit per table | 1 (or 0 for a heap) | ~999 in SQL Server | | Range scan speed | Fastest; pages are sequential | Slower unless covering; needs bookmark lookup | | Insert cost | Medium; may cause page splits on random keys | Higher; each insert updates all indexes | | Size overhead | None (data = index) | ~10-20% extra per index | | Best for | Primary sort key (OrderID, event_date) | Secondary filters (CustomerID, status) | ### How the B+ tree stores this Both index types use a B+ tree. The difference is what lives in the leaf nodes. For a clustered index, the leaf nodes are the data pages themselves, sorted by the clustered key. A seek walks the tree, lands on a leaf page, then scans adjacent pages for range queries without extra I/O. For a non-clustered index, the leaf nodes hold the index key plus a row locator. If the table has a clustered index, that locator is the clustered key value. The engine follows that pointer in a second lookup. If the non-clustered index includes all columns the query needs (a covering index), the engine never touches the clustered index at all. ### Common mistakes **Mistake: clustered index on a low-cardinality column like `gender`** Only two possible values means half the table sits on a tiny group of pages. Inserts fight over those same pages constantly and cause page splits on every write. ```sql -- Wrong: two values = hotspot pages, constant page splits CREATE CLUSTERED INDEX IX_Gender ON users(gender); -- Right: sequential, unique, even distribution CREATE CLUSTERED INDEX IX_UserID ON users(user_id); ``` **Mistake: GUID primary key without sequential ordering** Random GUIDs insert into random positions in the B+ tree, forcing page splits throughout the file. Fragmentation hits 80%+ quickly. Use `NEWSEQUENTIALID()` in SQL Server or ULIDs in other systems. ```sql -- Wrong: random insert order, massive fragmentation id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY -- Right: sequential GUIDs, inserts always append to the last page id UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY ``` **Mistake: SELECT * with a narrow non-clustered index on a wide table** A non-clustered index on `level` looks fast in EXPLAIN until the query hits 10 million rows and fetches the full `TEXT` column every single time. ```sql -- Slow: seeks IX_level, then fetches full row 10M times SELECT * FROM logs WHERE level = 'ERROR'; -- Fast: covering index includes message, no table access at all CREATE NONCLUSTERED INDEX IX_level_covering ON logs(level) INCLUDE (message); SELECT level, message FROM logs WHERE level = 'ERROR'; ``` The difference on a 10M-row table is minutes vs seconds. This is the most common senior-level trap in log analytics. **Mistake: adding 15+ non-clustered indexes to a busy table** Every non-clustered index must be updated on every write. A production case documented on Reddit showed TPS drop of 80% after the 12th index was added. Query `sys.dm_db_index_usage_stats` to find indexes that are never read and drop them. ```sql -- Find unused indexes in SQL Server SELECT object_name(i.object_id) AS table_name, i.name AS index_name, s.user_seeks + s.user_scans + s.user_lookups AS reads, s.user_updates AS writes FROM sys.indexes i LEFT JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id AND i.index_id = s.index_id WHERE s.database_id = DB_ID() ORDER BY reads ASC; ``` ### Real-world usage - SQL Server / Microsoft Dynamics: clustered on `OrderID`; composite non-clustered on `CustomerID, OrderDate` for reports - PostgreSQL / GitLab CI: clustered on `job_id` (via PK); composite non-clustered on `status, created_at` for queue polling - MySQL / InnoDB (WordPress): clustered on PK always (InnoDB enforces this); non-clustered on `post_title` for search - BigQuery / Google Analytics: clustered tables on `event_date`; separate search indexes for secondary filters - High-write systems: keep non-clustered count at 3-5 max; use columnstore indexes for analytics workloads instead of B-trees ### Follow-up questions **Q:** Why can a table have only one clustered index? **A:** The clustered index is the physical storage order of rows. You cannot store the same rows in two different physical orders at the same time. A second clustered index would require a full second copy of the table, which is essentially what a covering non-clustered index approximates. **Q:** What is a bookmark lookup and how do you eliminate it? **A:** When a non-clustered seek returns a row locator, the engine makes a second trip to the clustered index (or heap) to fetch the full row. Add INCLUDE columns to the non-clustered index to cover every column in the SELECT list, and the engine skips the clustered index entirely. **Q:** How does a GUID primary key cause index fragmentation? **A:** New rows with random GUID values insert into arbitrary positions in the B+ tree, forcing page splits across the entire file. Sequential keys like INT IDENTITY always append to the rightmost leaf page, which keeps fragmentation near zero. **Q:** How do you detect and fix index fragmentation? **A:** Query `sys.dm_db_index_physical_stats` and check `avg_fragmentation_in_percent`. Above 30% means REBUILD; between 10-30% means REORGANIZE. REBUILD creates a fresh index structure; REORGANIZE defragments leaf pages in place without a full lock. **Q:** Senior question: you have a 1-billion-row fact table in a star schema. How do you design indexes for joins on date and dimension keys? **A:** Clustered columnstore on the date column for analytics batch scans; non-clustered B-tree indexes on FK columns with INCLUDE of measure columns for point lookups. The tradeoff is batch write speed (columnstore buffers inserts in a delta store) vs query latency. Brent Ozar's tests show columnstore gives 10-100x compression and scan speed on fact tables, but random single-row inserts hit the delta store first before batch compression. ## Examples ### Basic: range scan vs secondary lookup ```sql CREATE TABLE Orders ( OrderID int IDENTITY PRIMARY KEY, -- clustered CustomerID int, OrderDate date, Amount decimal(10,2) ); CREATE NONCLUSTERED INDEX IX_CustomerID ON Orders(CustomerID); -- Clustered range: one tree walk, reads 1000 sequential leaf pages SELECT OrderID, Amount FROM Orders WHERE OrderID BETWEEN 1000 AND 2000; -- Non-clustered seek + bookmark lookup: two tree walks SELECT OrderID, Amount FROM Orders WHERE CustomerID = 456; ``` The first query walks the clustered B+ tree once and reads contiguous leaf pages. The second walks `IX_CustomerID`, collects row locators, then visits the clustered tree again for each one. For 1000 matching rows that means roughly 2000 tree operations instead of 1000. ### Intermediate: composite covering index for a dashboard query ```sql -- E-commerce sales table CREATE TABLE sales ( sale_id BIGINT PRIMARY KEY, -- clustered on sale_id user_id BIGINT, sale_date TIMESTAMP, amount DECIMAL(12,2) ); -- Composite non-clustered covers the entire dashboard query CREATE INDEX idx_user_date_amount ON sales(user_id, sale_date) INCLUDE (amount); -- Dashboard: totals per user for the last 30 days -- Uses idx_user_date_amount only, never touches clustered index SELECT user_id, SUM(amount) FROM sales WHERE user_id = 456 AND sale_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY user_id; ``` `INCLUDE (amount)` makes this a covering index. The query never leaves the non-clustered index pages. Without `INCLUDE`, the engine fetches every matching row from the clustered index just to get `amount`. On a 100M-row table, that difference is very real. ### Senior: covering index to fix a bookmark lookup bottleneck ```sql -- 10M-row log table CREATE TABLE logs ( id BIGINT PRIMARY KEY, level VARCHAR(10), message TEXT ); -- Narrow index on level alone CREATE NONCLUSTERED INDEX IX_level ON logs(level); -- Looks fine in EXPLAIN, but slow at scale: -- engine finds 2M ERROR rows via IX_level, -- then fetches the full TEXT column from clustered index 2M times SELECT level, message FROM logs WHERE level = 'ERROR'; -- Fix: include message in the index itself CREATE NONCLUSTERED INDEX IX_level_covering ON logs(level) INCLUDE (message); -- All needed data is now inside the non-clustered index -- No bookmark lookup, no clustered index access SELECT level, message FROM logs WHERE level = 'ERROR'; ``` I have seen this exact pattern kill a log analytics job in production. The original query ran for 4 minutes on a 10M-row table. Adding `INCLUDE (message)` brought it to 8 seconds. The index grew larger on disk, but 10 million random clustered-index fetches became one sequential non-clustered scan.For the reviewerNote to the moderator (optional)Visible only to the moderator. Helps review go faster.