What are composite indexes?

General Questions~5 min read

Composite index - an index built on two or more columns, stored in a sorted structure the database traverses left-to-right during query execution.

Theory

TL;DR

An index on (a, b, c) helps queries that filter on a, on a + b, or on a + b + c. It does not help queries that filter only on b or c.
Column order is the single most consequential decision in composite index design.
When every column a query needs exists in the index, the engine skips the table row entirely. That is a covering index.
Put equality-filtered columns first, then range-filtered columns after.
One well-designed composite index usually replaces two or three single-column indexes without the extra write overhead.

Quick example

sql

-- users table: id, last_name, first_name, email, created_at

CREATE INDEX idx_users_name ON users (last_name, first_name);

-- Uses the index fully (both columns match)
SELECT * FROM users WHERE last_name = 'Kim' AND first_name = 'Alex';

-- Uses the index (leftmost column only)
SELECT * FROM users WHERE last_name = 'Kim';

-- Does NOT use the index (skips the first column)
SELECT * FROM users WHERE first_name = 'Alex';

The index stores rows sorted by last_name first, then by first_name within each last-name group. Without a value for last_name, the engine has no entry point into the structure. Scanning from the middle of a sorted tree is the same as a full table scan.

The leftmost prefix rule

A composite index on (col1, col2, col3) is three indexes in one: on col1, on (col1, col2), and on (col1, col2, col3). Those combinations only. It cannot serve a query on (col2) or (col2, col3) because the data is not sorted that way at the top level.

MySQL, PostgreSQL, and most relational databases implement this the same way. The query planner walks through the WHERE clause left-to-right, matches columns against the index definition, and stops as soon as it hits a column not present in the WHERE clause or encounters a range condition.

Range conditions break the chain. An equality check on col1 and a range on col2 means the index helps for both. But if you add AND col3 = 5, the engine cannot use the index for col3 because the range on col2 ended the usable sequence.

sql

CREATE INDEX idx_orders ON orders (status, created_at, customer_id);

-- Index used for all three columns (all equality)
SELECT * FROM orders
WHERE status = 'pending' AND created_at = '2024-01-15' AND customer_id = 42;

-- Index used for status and created_at range, NOT customer_id
SELECT * FROM orders
WHERE status = 'pending' AND created_at > '2024-01-01' AND customer_id = 42;

-- Index used for status only
SELECT * FROM orders WHERE status = 'pending';

Column ordering

Equality conditions should come before range conditions. Once the index encounters a range, it cannot do a direct lookup for the next column anymore.

For a query like WHERE user_id = 5 AND created_at > '2024-01-01', the index should be (user_id, created_at), not (created_at, user_id). With the reversed order, the engine can only use the index for the created_at range and must filter user_id manually across every row in that range.

Cardinality matters too. A column with thousands of distinct values (user_id) placed before a low-cardinality one (status) eliminates more rows at each step of the search. But this is secondary to the equality-before-range rule. When in doubt, put equality filters first.

Covering indexes

A covering index contains every column the query needs: filter columns, sort columns, and selected columns. When a query is fully covered, the engine reads only the index pages and never fetches the actual table row. On read-heavy workloads this can cut I/O by 80-90%.

sql

-- Without covering index: index scan + table row fetch for each match
CREATE INDEX idx_orders_status ON orders (status);
SELECT order_id, created_at FROM orders WHERE status = 'shipped';

-- With covering index: index only, no table access
CREATE INDEX idx_orders_cover ON orders (status, order_id, created_at);
SELECT order_id, created_at FROM orders WHERE status = 'shipped';
-- PostgreSQL EXPLAIN shows: "Index Only Scan" instead of "Index Scan"

The trade-off: wider indexes use more storage and slow down INSERT/UPDATE/DELETE because every write must also update the index.

Common mistakes

Wrong column order.

sql

-- Query: WHERE user_id = 5 AND status = 'active'

-- Bad: low-cardinality column first
CREATE INDEX idx_bad ON orders (status, user_id);

-- Good: high-cardinality equality column first
CREATE INDEX idx_good ON orders (user_id, status);

The first index still works, but it reads more rows before narrowing down to user_id = 5.

Filtering on a non-leftmost column.

You have an index on (a, b, c) and query WHERE b = 1. A full table scan follows. This is the most common composite index mistake in both production codebases and interview whiteboard problems.

Range condition in the middle.

sql

-- Index: (status, created_at, user_id)
-- The engine uses status (equality) and created_at (range)
-- but cannot use the index for user_id -- it filters that separately
SELECT * FROM orders
WHERE status = 'active' AND created_at > '2024-06-01' AND user_id = 99;

-- Fix: move user_id before created_at since it is an equality condition
CREATE INDEX idx_orders_fixed ON orders (status, user_id, created_at);

Adding too many columns as a precaution.

Indexes with 5+ columns rarely outperform a 3-column index, but every extra column increases index size and slows writes. Add columns only when you have a measured query to justify it.

Forgetting that NULL values are indexed. In PostgreSQL and MySQL, NULL is stored in the index. Queries with WHERE col IS NULL can use the index, and that is often unexpected behavior for developers who assume NULL skips indexing.

Real-world usage

E-commerce: (user_id, status) on an orders table for fetching all pending orders of a specific user.
Analytics dashboards: (event_type, created_at) to filter by event and sort by time in one index scan.
Auth systems: (email, is_active) for login queries that must check both conditions.
Multi-tenant SaaS: (tenant_id, resource_type, created_at) is often the first index added to almost every table.

Follow-up questions

Q: Can you use a composite index if your WHERE clause includes only the second column?
A: No. The leftmost prefix rule means the engine needs the first column to enter the index. A query on the second column alone results in a full table scan or a full index scan depending on the planner's cost estimate.

Q: What is the difference between a composite index and a covering index?
A: A composite index is any index on multiple columns. A covering index is a composite index that contains every column a specific query needs, so the engine never reads the actual table row. All covering indexes are composite, but not all composite indexes cover a given query.

Q: Does column order in SELECT affect index usage?
A: No. Column order in WHERE and in the index definition determines whether the index is used. The SELECT list matters only for covering indexes, where all selected columns must be present in the index. The order you list them in SELECT is irrelevant.

Q: How does the query planner decide whether to use a composite index?
A: It estimates the cost of each plan: index scan versus sequential scan. If the query returns a large fraction of rows, sequential scan is often cheaper because sequential I/O is faster than random I/O. Use EXPLAIN ANALYZE in PostgreSQL or EXPLAIN in MySQL to see the decision.

Q: You have queries on (a, b) and also on (a) separately. Do you need two indexes?
A: No. An index on (a, b) already serves queries on a alone via the leftmost prefix. A separate index on (a) wastes storage and adds write overhead. This is a classic over-indexing mistake in legacy codebases.

Examples

Filtering active user sessions

sql

-- sessions table: id, user_id, token, status, expires_at

CREATE INDEX idx_sessions_lookup ON sessions (user_id, status);

-- Fast: both columns indexed, user_id is leftmost
SELECT token, expires_at
FROM sessions
WHERE user_id = 1042 AND status = 'active';

-- Also uses the index (leftmost prefix rule)
SELECT COUNT(*) FROM sessions WHERE user_id = 1042;

-- Does NOT use the index
SELECT * FROM sessions WHERE status = 'active';
-- A separate index on (status) is needed for this query

I saw this exact pattern break in production when a team added a search feature that filtered only on status, not realizing the existing index was ordered the wrong way for that query pattern.

Covering index for a dashboard query

sql

-- orders table: id, tenant_id, status, total_amount, created_at

-- Without covering index: index scan + heap fetch for each matching row
CREATE INDEX idx_orders_tenant ON orders (tenant_id, status);

SELECT id, total_amount, created_at
FROM orders
WHERE tenant_id = 5 AND status = 'completed';

-- With covering index: engine reads only the index, never the table
CREATE INDEX idx_orders_cover
  ON orders (tenant_id, status, id, total_amount, created_at);

-- EXPLAIN ANALYZE shows "Index Only Scan" instead of "Index Scan"
-- Typical result on large tables: 3-5x faster read latency

The index grew from 2 to 5 columns. On a table with 10M rows, running this query 1,000 times per minute, the read gains outweighed the write overhead.

Range query column ordering

sql

-- logs table: id, service_name, severity, created_at

-- Good: equality columns first, range column last
CREATE INDEX idx_logs_good ON logs (service_name, severity, created_at);

SELECT * FROM logs
WHERE service_name = 'api-gateway'
  AND severity = 'error'
  AND created_at > NOW() - INTERVAL '1 hour';

-- Bad: range column in the middle breaks the chain for severity
CREATE INDEX idx_logs_bad ON logs (service_name, created_at, severity);

-- With idx_logs_bad: engine uses service_name + created_at range,
-- then filters severity = 'error' separately after the index scan

Short Answer

Interview ready

Premium

A concise answer to help you respond confidently on this topic during an interview.

Finished reading?