068 · CAP · CONSISTENCY · AVAILABILITY

CAP Theorem

In a partition, you must choose between consistency and availability.

If you are new here: CAP is a mental model for distributed databases, not a law you “implement” like a library. It says that when the network splits and replicas cannot coordinate, you must choose what to sacrifice: perfect agreement on reads/writes (consistency) or always answering with a non-error response (availability). Partition tolerance means “the system still does something defined when the network misbehaves”, and in real multi-node systems you do not get to pretend partitions never happen.

Letter | Stands for | Beginner-friendly meaning
C | Consistency (linearizable-style in CAP talks) | Reads do not return stale nonsense relative to what a single system image would show, or the system errors instead of lying
A | Availability | Every request gets a non-error response (in the CAP proof sense), not “try again later” because replicas could not agree
P | Partition tolerance | The system keeps operating across a network split with a defined policy, not “assume the network is always perfect”

The Problem

You're building a distributed system. Your app stores data across three servers in different data centers — one in Virginia, one in Oregon, one in Dublin — because you want scale, redundancy, and low latency for users worldwide.

Now a fiber cable between Virginia and Oregon gets cut. Both data centers are up, users can still reach them, but the two sides can't talk to each other. A user in New York writes a new record to Virginia. A user in San Francisco reads from Oregon. What should Oregon return?

You can't have it all. This is the core tension in every distributed system, and CAP Theorem gives it a name.

In plain terms: CAP is the reminder that networks fail, and when they do, “always correct” and “always answers OK” can collide. Adults pick which pain they want for which data — money-shaped data usually chooses differently than a social feed.

Analogy: Two siblings share a bank ledger, but their phones lose signal. Either they stop letting anyone spend until they reconcile (CP-flavored), or they keep letting purchases happen knowing the two ledgers might disagree until signal returns (AP-flavored). There is no magic third option where the network is broken but both ledgers are instantly perfect forever.

CAP stands for three properties a distributed data store can have:

  • Consistency: every read returns the most recent write (or an error)
  • Availability: every request to a working node gets a non-error response, though not necessarily the latest data
  • Partition Tolerance: the system keeps running even when nodes can't communicate

Eric Brewer proposed in 2000, as a conjecture, that you can guarantee at most two of these three simultaneously; Seth Gilbert and Nancy Lynch proved it formally in 2002. And since network partitions are an unavoidable reality of running servers, you're really choosing between Consistency and Availability when things go wrong.

Consistency

In plain terms: after any write, every read from any node returns that value — or the system refuses to answer.

Analogy: Think of your bank balance. You transfer $500 to pay rent. Seconds later your partner opens the same account on their phone. They must see the updated balance. Showing the old balance — even for a millisecond — is wrong. The bank cannot be "eventually right" about money.

Consistent systems achieve this by making nodes agree before confirming a write. Under the hood: the write goes to Node A, Node A forwards it to B and C, waits for acknowledgements from a majority, then tells the client "done." Reads see a single agreed value because no write is visible until a majority has accepted it.

The cost: when nodes can't reach each other, any node that cannot reach a majority refuses writes rather than risk divergence. During a partition, that part of a CP system returns errors rather than serve wrong data.

// CP system under partition:
write("balance", 900)
→ ERROR: cannot reach quorum — try again later
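
The quorum behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular database implements it: `Replica`, `CPStore`, and `QuorumUnavailable` are hypothetical names, the `reachable` flag stands in for real network failures, and production systems use a consensus protocol such as Raft or Paxos rather than a simple reachability count.

```python
class QuorumUnavailable(Exception):
    """Raised when a write cannot be acknowledged by a majority of replicas."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True  # set to False to simulate a partition

    def apply(self, key, value):
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class CPStore:
    """Hypothetical coordinator that only confirms majority-acknowledged writes."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.quorum = len(replicas) // 2 + 1  # strict majority, e.g. 2 of 3

    def write(self, key, value):
        # Check reachability first so a rejected write leaves no partial state.
        reachable = [r for r in self.replicas if r.reachable]
        if len(reachable) < self.quorum:
            raise QuorumUnavailable(
                f"cannot reach quorum: {len(reachable)} of {len(self.replicas)} "
                f"replicas reachable, need {self.quorum}"
            )
        for replica in reachable:
            replica.apply(key, value)
        return "OK"


nodes = [Replica("virginia"), Replica("oregon"), Replica("dublin")]
store = CPStore(nodes)
print(store.write("balance", 1400))  # all replicas reachable, so the write is accepted

# Simulate a partition that cuts the coordinator off from two of three replicas.
nodes[1].reachable = False
nodes[2].reachable = False
try:
    store.write("balance", 900)
except QuorumUnavailable as err:
    print("ERROR:", err)  # the write is rejected and no replica changes
```

With only one replica unreachable, the same write would still succeed, because two of three is a majority. The store only gives up availability when a majority cannot be reached.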

Availability

In plain terms: every request gets a response. No errors, no timeouts — but the data might be a little stale.

Analogy: Think of your Twitter (now X) feed. When you open it, you always see something — even if that tweet from 3 seconds ago hasn't appeared yet. Nobody expects a social feed to be perfectly up-to-the-millisecond. Showing you content from 5 seconds ago is fine. Refusing to load at all is not.

Available systems achieve this by letting each node respond independently. No coordination required — every node serves reads from its local state and accepts writes immediately. Under a partition, both sides keep running independently and sync up when connectivity is restored.

The cost: the two sides can diverge. Node A might accept a write that Node B doesn't know about. When the partition heals, someone has to decide which write wins. Systems handle this with Last Write Wins, vector clocks, or CRDTs — but conflict resolution is never free.

// AP system under partition:
write("counter", 42)  → OK  (Node A accepts immediately)
read("counter")       → 41  (Node B returns stale — not synced yet)
// ...partition heals → both nodes converge to 42
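
The same trace can be written as a small Python sketch with Last Write Wins as the merge rule. Everything here is illustrative: `APReplica` and `heal` are hypothetical names, timestamps are supplied by the caller as a simple logical clock, and the sketch assumes no two writes share a timestamp (a real system would need a deterministic tiebreaker).

```python
from dataclasses import dataclass, field


@dataclass
class APReplica:
    name: str
    # key -> (timestamp, value); the timestamp is a caller-supplied logical clock
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Accept immediately, with no coordination with other replicas.
        self.data[key] = (timestamp, value)
        return "OK"

    def read(self, key):
        # Serve whatever the local state says, even if it may be stale.
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Last Write Wins: keep the entry with the higher timestamp.
        for key, entry in other.data.items():
            if key not in self.data or entry[0] > self.data[key][0]:
                self.data[key] = entry


def heal(a, b):
    """Exchange state in both directions once the partition is over."""
    a.merge_from(b)
    b.merge_from(a)


node_a, node_b = APReplica("A"), APReplica("B")
node_a.write("counter", 41, timestamp=1)
heal(node_a, node_b)                      # both replicas start in sync

# Partition: A accepts a write that B never hears about.
node_a.write("counter", 42, timestamp=2)
print(node_b.read("counter"))             # 41, stale but available
heal(node_a, node_b)
print(node_b.read("counter"))             # 42 after the partition heals
```

Note what Last Write Wins gives up: if both sides had written to the same key during the partition, the write with the lower timestamp would be silently discarded on merge. That is the price of never refusing a write.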

Partition Tolerance

A network partition is when two or more nodes in your cluster can't reach each other — dropped packets, failed switches, a backhoe cutting a fiber line, or a misconfigured firewall. The nodes themselves are fine. The network between them is not.

Partition Tolerance means your system has a defined, correct behavior when this happens. It does not mean partitions are prevented — it means the system doesn't break silently when they occur.
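
One way to make "defined behavior" concrete is to ask which side of a split still holds a majority of nodes. The sketch below is a simplified model, not a real failure detector: `partition_groups` and `majority_side` are hypothetical helpers, links are assumed to be symmetric, and reachability is treated as transitive (if A reaches B and B reaches C, A counts as reaching C).

```python
from collections import deque


def partition_groups(nodes, links):
    """Return the groups of nodes that can still reach each other.

    nodes: iterable of node names.
    links: iterable of (a, b) pairs for the network links that still work.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for peer in neighbours[node] - seen:
                seen.add(peer)
                queue.append(peer)
        groups.append(group)
    return groups


def majority_side(nodes, links):
    """Return the group holding a strict majority of nodes, or None."""
    nodes = list(nodes)
    for group in partition_groups(nodes, links):
        if len(group) > len(nodes) // 2:
            return group
    return None


nodes = ["virginia", "oregon", "dublin"]
# Hypothetical failure: Oregon has lost its links to both other regions.
working_links = [("virginia", "dublin")]
print(majority_side(nodes, working_links))  # the group containing virginia and dublin
```

A CP-style policy would keep serving from the majority group and reject requests on the minority side. An AP-style policy would keep serving from every group and reconcile later. With an even split, such as two nodes that cannot reach each other, no side has a majority and `majority_side` returns `None`.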

Here's the thing: you cannot opt out of Partition Tolerance. Any system running on more than one machine will eventually experience a partition. Multi-region deployments, Kubernetes pods across availability zones, even two servers in the same rack — they all communicate over a network, and networks fail.

Designing for "no partitions" means designing a system that behaves incorrectly when the inevitable happens. This is why CAP's real choice is CP vs AP: giving up Partition Tolerance is not an option.

The Trade-offs

Since P is non-negotiable, every distributed database picks a side. Here's how that plays out with real systems you'll use as a cloud engineer:

CP systems — choose consistency over availability under a partition:

System | Why it's CP
MongoDB (majority writes) | Primary-only reads by default; quorum writes block during partition
ZooKeeper | Built for coordination; stale state would break leader election
HBase | Each region is served by a single RegionServer, coordinated via ZooKeeper; prefers error over wrong answer

AP systems — choose availability over consistency under a partition:

System | Why it's AP
Cassandra | Leaderless; every node accepts writes; tunable consistency per-query
DynamoDB | Eventually consistent by default; strongly-consistent reads cost extra
CouchDB | Multi-master; optimistic replication; merges conflicts on sync

The nuance: most real systems aren't binary. Cassandra lets you set the consistency level to QUORUM per query, so that individual operation behaves more like a CP system. DynamoDB's strongly-consistent reads let you pay extra for correctness when you need it. You often tune the trade-off at the operation level.
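
A common rule of thumb behind per-operation tuning in quorum-replicated stores: with N replicas, a read that contacts R of them is guaranteed to overlap with a write acknowledged by W of them whenever R + W > N. The sketch below only checks that arithmetic. The function names are hypothetical, and overlap alone does not guarantee full linearizability in every system, so treat it as a first-order check rather than a proof.

```python
def quorum_size(replication_factor):
    """Smallest strict majority of the replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, r, w):
    """True when every read set of size r overlaps every write set of size w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


n = 3
print(quorum_size(n))                       # 2
print(reads_see_latest_write(n, r=2, w=2))  # True: quorum reads plus quorum writes
print(reads_see_latest_write(n, r=1, w=1))  # False: a read can miss the write
print(reads_see_latest_write(n, r=1, w=3))  # True: write to all, read from one
```

Raising R or W buys consistency at the cost of availability: the more replicas an operation must reach, the more likely a partition makes it fail.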

Why this matters for you

When you design a system, ask one question: what's worse — a stale read, or a failed write?

For a payment ledger, a stale read can lead to double-spending, so choose CP. For a shopping cart, a missing item or a duplicate that gets deduped at checkout is tolerable, so choose AP and scale horizontally. For a leaderboard, rankings can lag by a few seconds without harm, so AP is fine.

CAP Theorem doesn't tell you what to build. It tells you that the choice exists, it's unavoidable, and the best engineers make it deliberately rather than discovering it at 2am during an incident.

Diagram (frame 1 of 5)

Normal operation: client writes to the primary database, which replicates to two replicas. All three CAP properties hold — until a network partition breaks that assumption.