Skip to main content

What is the CAP theorem?

CAP theorem states that a distributed system replicating data across multiple nodes cannot simultaneously guarantee Consistency, Availability, and Partition tolerance when a network partition occurs.

Theory

TL;DR

  • The trade-off activates only during partitions; in normal operation, target all three properties
  • Partition tolerance is non-negotiable in real networks, so the real choice is CP or AP
  • CP (ZooKeeper, CockroachDB): blocks or rejects writes during partition to stay consistent
  • AP (Cassandra, DynamoDB): always responds, serves stale local data during partition
  • CA is a single-node illusion, not a real distributed option

Quick Example

javascript
class CAPDemo { constructor() { this.nodes = [{ balance: 100 }, { balance: 100 }]; } // CP: rejects write if replicas have diverged (partition simulation) cpWrite(amount) { if (this.nodes[0].balance !== this.nodes[1].balance) { throw new Error('Partition detected: rejecting write to stay consistent'); } this.nodes.forEach(n => (n.balance += amount)); return this.nodes[0].balance; } // AP: writes to local node, responds regardless of replica state apWrite(amount, nodeId) { this.nodes[nodeId].balance += amount; return this.nodes[nodeId].balance; } } const demo = new CAPDemo(); console.log(demo.cpWrite(50)); // 150 - both nodes updated demo.nodes[1].balance = 200; // simulate partition: node 1 drifts console.log(demo.apWrite(10, 0)); // 160 - node 0 responds, but inconsistent with node 1

CP refuses to write when replicas disagree. AP writes locally and returns fast. Both are correct behaviors for their respective trade-offs.

CP vs AP in Practice

Network partitions are not edge cases. AWS has documented outages, switches drop packets, cross-datacenter links fail. Since partitions happen, you take Partition tolerance as a given and choose what to give up.

CP systems block or reject operations during a partition rather than risk inconsistency. ZooKeeper stops accepting writes if it loses quorum. CockroachDB uses Raft: no quorum, no commit. This is exactly what distributed locks and financial ledgers need.

AP systems keep responding. Cassandra accepts writes to the local node and propagates them asynchronously. Netflix runs Cassandra for over 100 PB of user event data. A stale recommendation is a minor UX issue. A service timeout is user-facing downtime.

When to Use

  • Bank transfers, account balances: CP. A rejected write beats a double-spend.
  • Leader election, distributed locks: CP. Kafka uses ZooKeeper, Kubernetes uses etcd, both built on Raft quorums.
  • Social feeds, timelines, notifications: AP. Posts appearing two seconds late is acceptable.
  • Analytics dashboards, event tracking: AP. Approximate freshness beats downtime.
  • Shopping carts with conflict resolution: AP with CRDTs or last-write-wins, which is how Amazon's original Dynamo paper designed it.

Comparison Table

CPAPCA
ConsistencyYes, waits for quorumEventual, stale reads possibleYes, but assumes no partition
AvailabilityRejects during partitionAlways responds locallyYes, single-node assumption
Partition toleranceYesYesNo
Use caseLocks, ledgers, configsFeeds, caches, high-volume eventsSingle-node dev/test only
ExamplesZooKeeper, CockroachDB, etcdCassandra, DynamoDB (default)Unpartitioned single Postgres

How It Works Internally

CP protocols (Raft, Paxos) require a majority quorum before committing any write. The leader sends a proposal to followers, waits for acknowledgment from a majority, then commits. If a partition isolates nodes below majority count, those nodes stop accepting writes. The math: N=3 nodes, quorum=2. One node isolated means two nodes can still commit. Two nodes isolated means the group halts.

AP protocols use gossip-based replication. Each node accepts writes locally and shares state with peers periodically. Version tracking via vector clocks or timestamps handles conflicts after the partition heals. Cassandra's consistency level is tunable: CONSISTENCY QUORUM with RF=3, R=2, W=2 gives you W+R > N, a CP-like guarantee while keeping AP flexibility for lower-priority reads.

I've watched teams deploy Cassandra as the default "scalable database" without thinking through the CAP choice, then spend sprint after sprint debugging stale data in payment flows. Choosing AP is not wrong. Choosing it by accident is.

Common Mistakes

Mistake: Treating CAP as a constant constraint

CAP forces a trade-off only during a partition. Outside of that, run with consistency and availability both. The theorem does not require you to sacrifice consistency on every read.

javascript
// Wrong: always block until all replicas confirm await Promise.all(nodes.map(n => n.write(value))); // one slow node kills latency // Better: quorum write - majority is enough const quorum = Math.ceil(nodes.length / 2) + 1; await Promise.all(nodes.slice(0, quorum).map(n => n.write(value)));

Mistake: Selecting CA as a real system design option

CA is theoretical. Real networks partition. MongoDB was labeled CA for years while users observed partition-triggered availability failures. Every production distributed database is CP or AP.

Mistake: Assuming AP requires no conflict handling

AP does not mean "write anywhere, read anywhere, done." Two nodes accepting conflicting writes during a partition produce diverged state. Without last-write-wins, CRDTs, or application-level merge logic, you get data loss or double-spend bugs in fintech.

Mistake: Equating CAP consistency with ACID consistency

CAP's C is linearizability: reads always reflect the latest committed write, as if the system were a single machine. ACID's C means transaction constraints remain valid after a commit. These are different properties. A system can be CAP-consistent without full ACID serializability.

Mistake: Scaling replicas without deciding on CP or AP first

More replicas without a deliberate choice defaults to whatever the database ships with. Cassandra ships as AP. MongoDB replica sets default to CP on primary reads. Know which one you need before you scale.

Real-World Usage

  • Cassandra (AP): Netflix event pipeline, 100+ PB. CONSISTENCY QUORUM for sensitive reads, ONE for high-volume writes
  • ZooKeeper (CP): Kafka broker coordination, distributed lock service
  • CockroachDB (CP): serializable isolation for banking clients
  • DynamoDB (AP default): AWS e-commerce. ConsistentRead=true shifts to CP-like behavior at a latency cost
  • etcd (CP): Kubernetes cluster state, Raft-based quorum
  • MongoDB replica sets (CP primary): Uber trip history; secondary reads trade freshness for throughput

Follow-up Questions

Q: How does PACELC extend CAP?
A: PACELC adds the latency dimension. Even with no partition (E = else), there is a trade-off between latency (L) and consistency (C). DynamoDB optimizes for L during normal operation, not just during failures.

Q: What is the difference between linearizability and eventual consistency?
A: Linearizability (CAP's C) means any read returns the result of the latest write, globally ordered. Eventual consistency means all replicas converge over time, but reads during propagation may return stale data. The gap between them is the AP/CP decision.

Q: How do W, R, and N relate to CAP in quorum systems?
A: With N replicas, setting W+R > N guarantees any read set overlaps with any write set, which gives linearizable reads. Cassandra with RF=3, W=2, R=2: W+R=4 > N=3. Setting W=1, R=1 is pure AP: fast, but stale reads are possible.

Q: How would you design a system that needs both 99.999% availability and consistent money transfers?
A: Use AP as the base layer with async multi-region replication for availability. For the transfer code path only, apply a synchronous quorum write (CP subset). The rest of the system stays available while consistency is enforced exactly where data loss is unacceptable. The Calvin protocol and dependent transactions with prepare-commit phases are production patterns for this.

Q: How do you test CAP properties in CI?
A: Jepsen is the standard tool. It injects network partitions and asserts consistency properties on the final state. It has found real bugs in etcd, Riak, and CockroachDB. For lighter integration testing, Toxiproxy simulates partition and latency between services.

Examples

Basic: CP vs AP during a simulated partition

javascript
class ReplicaSet { constructor() { this.nodes = [ { id: 0, value: 0, version: 0 }, { id: 1, value: 0, version: 0 }, { id: 2, value: 0, version: 0 }, ]; this.partition = false; } cpWrite(value) { if (this.partition) { throw new Error('CP: partition active, write rejected'); } this.nodes.forEach(n => { n.value = value; n.version++; }); return value; } apWrite(value, nodeId = 0) { const node = this.nodes[nodeId]; node.value = value; node.version++; // propagate to others only when partition is healed if (!this.partition) { this.nodes.filter(n => n.id !== nodeId).forEach(n => { n.value = value; n.version = node.version; }); } return { value, nodeId }; } read(nodeId = 0) { return this.nodes[nodeId].value; } } const rs = new ReplicaSet(); rs.cpWrite(100); // all nodes: 100 rs.partition = true; try { rs.cpWrite(200); // throws } catch (e) { console.log(e.message); // CP: partition active, write rejected } rs.apWrite(200, 0); console.log(rs.read(0)); // 200 console.log(rs.read(1)); // 100 - stale, but system stayed available

After the partition heals, AP systems need a reconciliation step. Cassandra uses last-write-wins by timestamp. More complex cases use CRDTs for conflict-free merges.

Intermediate: Tunable quorum (Cassandra-style)

javascript
class TunableStore { constructor(rf = 3) { this.replicas = new Array(rf).fill(null).map((_, i) => ({ id: i, data: {}, online: true, })); } async write(key, value, w = 2) { const available = this.replicas.filter(r => r.online); available.forEach(r => { r.data[key] = { value, ts: Date.now() }; }); if (available.length < w) { throw new Error(`Write quorum not met: ${available.length} of ${w} required`); } return value; } async read(key, r = 2) { const available = this.replicas.filter(n => n.online && n.data[key]); if (available.length < r) { throw new Error(`Read quorum not met: ${available.length} of ${r} required`); } return available .map(n => n.data[key]) .sort((a, b) => b.ts - a.ts)[0].value; } } const store = new TunableStore(3); await store.write('balance', 500, 2); store.replicas[2].online = false; // one node goes down console.log(await store.read('balance', 2)); // 500 - quorum of 2 still works store.replicas[1].online = false; // second node goes down try { await store.read('balance', 2); } catch (e) { console.log(e.message); // Read quorum not met: 1 of 2 required }

This is the W+R > N guarantee in action. RF=3, W=2, R=2: any read set overlaps with any write set. Drop to W=1, R=1 and you get pure AP with stale read risk. Choose based on what is worse for your use case: a failed read or a stale one.

Advanced: Partition handling with compensation (fintech pattern)

javascript
// Pattern: AP base with CP enforcement for critical operations only class HybridStore { constructor() { this.regions = { us: { balance: 1000, version: 1, online: true }, eu: { balance: 1000, version: 1, online: true }, }; this.pendingTransfers = []; } // AP read: local region, fast, may be slightly behind read(region) { return this.regions[region]; } // CP write for transfers: require all regions, reject if any unreachable transfer(fromRegion, amount) { const allOnline = Object.values(this.regions).every(r => r.online); if (!allOnline) { this.pendingTransfers.push({ fromRegion, amount, ts: Date.now() }); throw new Error('Partition active: transfer queued for post-heal commit'); } if (this.regions[fromRegion].balance < amount) { throw new Error('Insufficient funds'); } Object.values(this.regions).forEach(r => { r.balance -= amount; r.version++; }); return this.regions[fromRegion].balance; } // Heal partition and replay pending transfers heal() { Object.values(this.regions).forEach(r => (r.online = true)); const pending = [...this.pendingTransfers]; this.pendingTransfers = []; return pending.map(t => { try { return { ...t, result: this.transfer(t.fromRegion, t.amount) }; } catch (e) { return { ...t, error: e.message }; } }); } } const store = new HybridStore(); store.regions.eu.online = false; // EU region partitioned try { store.transfer('us', 200); } catch (e) { console.log(e.message); // Partition active: transfer queued for post-heal commit } console.log(store.read('us').balance); // 1000 - AP read still works const resolved = store.heal(); console.log(resolved); // transfer replayed after partition healed console.log(store.read('us').balance); // 800

This is the pattern behind multi-region banking: AP for reads everywhere, CP enforcement only on the transfer code path. The pending queue with post-heal replay keeps the system available without risking financial data loss during a partition.

Short Answer

Interview ready
Premium

A concise answer to help you respond confidently on this topic during an interview.

Finished reading?