Deep Dive into CAP Theorem

Introduction

Eric Brewer proposed the famous CAP theorem: it is impossible for a web service to simultaneously maintain all three of the following properties: Consistency, Availability, and Partition tolerance. The paper "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" provides a proof for this theorem.

As the theory spread, it seemingly morphed into a dogmatic conclusion, deviating from Eric Brewer's original intent. Later, he authored the article "CAP Twelve Years Later: How the 'Rules' Have Changed," which clarifies the misconceptions, applications, and handling of partition related to the CAP theorem.

I read these two papers earlier and gained significant insights. Recently, with continuous learning and practice, I found that these papers resolved many of my questions (some old, some new). Therefore, I wanted to promptly document these reflections. This article will primarily cover the proof of the CAP theorem, CP vs. AP, and the process for handling network partitions.

Proof of the CAP Theorem

Definitions

First, let's look at the definitions of the three properties in CAP:

  • C, Consistency: Writes to a single value exhibit linearizability. This means that read and write operations by clients on a set of nodes appear as if they are operating on a single logical node. It also implies that "read-after-write" semantics are always correct.
  • A, Availability: Every request receives a response within a finite time.
  • P, Partition Tolerance: The system can handle network partitions. In other words, the system can tolerate the loss of any number of messages for any duration.

Proof

The proof of this theorem is straightforward using proof by contradiction:

Assume a system possesses all three CAP properties and has two nodes, S1 and S2. When a network partition occurs between the two nodes, they cannot communicate at all. At this point, a client first writes to S1 and then reads from S2. The value it reads cannot be the one it just wrote because S1 cannot synchronize the data to S2. At this moment, the system no longer maintains the Consistency property, leading to a contradiction. QED.

Application

After learning about Raft, I had this question for a long time: Raft is strongly consistent, highly available, and partition-tolerant. Doesn't it seem to possess all three CAP properties simultaneously? Because when a network partition occurs, as long as a majority of nodes are in one partition, the system can continue operating normally. Furthermore, Raft, as a consensus algorithm, remains strongly consistent regardless of network partitions.

However, revisiting the proof process, it's clear that Raft is CP. In a two-node system experiencing a network partition, Raft will reject client requests to avoid breaking strong consistency.

CP vs. AP

Things are constantly changing. A mistaken view is that a system can only ever possess one of the three properties: CP, AP, or CA. When there is no network partition, the vast majority of systems are CA. Only when a network partition occurs is there a trade-off between C and A.

Things are not black and white. Trading off between C and A does not mean completely choosing one and abandoning the other. What is considered available? A client receiving a response within a "reasonable time", which is clearly a human-defined threshold. Simultaneously, there are various levels of consistency to choose from. We can fully maximize both properties where they don't conflict, just to different degrees.

The true purpose of the CAP theorem is to inform people to rationally choose the level of consistency and availability based on their business scenarios. Brewer realized this in the late 1990s. But at that time, most people only accepted the ACID transaction model and were unwilling to sacrifice consistency for availability. Brewer first proposed the BASE theory, but it received little attention. Subsequently, he proposed the CAP theorem, which sparked widespread discussion, leading many to mistakenly believe that consistency and availability are mutually exclusive. Twelve years after the birth of the CAP theorem, Brewer wrote an article combining industry developments to express his original views. Therefore, the focus of that article was on AP.

This background story carries a flavor of "people prefer compromises" and "achieving goals by indirect means". It's evident that many programmers are unaware of industry trends, focusing only on their immediate work and relying on hearsay during their spare time. This isn't meant as criticism, but a lack of continuous, systematic learning habits can hinder personal technical progress. Although information access today is far superior to back then, this story still makes me wary because many articles in the domestic Chinese internet are still about CP databases, with little discussion of AP databases and the C/A trade-off, which represents the trend of industry development.

The Process for Handling Network Partitions

Brewer outlines a general three-step process for handling network partitions:

  1. Detect the network partition.
  2. Enter partition mode, restricting certain operations.
  3. Recover from the network partition, restoring consistency and compensating for mistakes.

Detecting Partitions

The definition of a network partition can be understood from a graph theory perspective: When there is no partition, the entire system is connected. When a partition occurs, the system splits into multiple connected components. Nodes within each component can communicate, but communication between components is impossible.

However, this definition isn't suitable for detecting partitions in practice because each node communicates with others over the network and cannot know the true state of the entire system. We typically use artificially-set timeout periods to judge node reachability and thus detect network partitions. When a node's request times out, it assumes a network partition has occurred and (explicitly) enters partition mode.

Partition Mode

In partition mode, the system must trade off between C and A, specifically by restricting certain operations. Which operations are restricted depends on the invariants the system must maintain. Invariants can be understood as constraints, such as the uniqueness of an ID or the non-negativity of a balance. Therefore, for each invariant, the system needs to decide:

  • Choosing Consistency means ensuring the invariant is not violated. This requires restricting all operations that could potentially break the invariant, but this significantly degrades the user experience.
  • Choosing Availability means prioritizing the user experience, continuing to operate even if it might break the invariant. The difficulty here lies in the subsequent recovery work. The system also needs to record the operations performed for use during the recovery phase.

Brewer points out that an advantage of a CP system is that we don't need to consider invariants as much, reducing the design burden. Designing an AP system requires constantly considering how to maintain all invariants, which is error-prone.

Recovering from a Network Partition

When communication between nodes is restored, the network partition ends. The system exits partition mode and needs to explicitly initiate the recovery process.

The first step is to restore data consistency because data differs across partitions. The common approach is to merge the conflicting operation histories from different partitions based on business logic. Some conflicts might be unmergeable or extremely difficult to resolve comprehensively. In such cases, either manual intervention is required, or those operations should have been disallowed during partition mode (choosing C over A from the start).

Simultaneously, system invariants must be restored, and mistakes compensated. Taking airline overbooking as an example, restoring the invariant means ensuring the ticket number is non-negative. Based on the operation history, a first-come-first-served strategy could be used, canceling subsequently overbooked orders. Compensating affected customers involves the company's business departments.

Clearly, to achieve availability, the operations allowed during partition mode are strongly related to the recovery process. This involves not only coding but also the company's business departments and user experience.

Brewer's suggestion is to defer operations that break invariants during partition mode to avoid later compensation. Upon recovery, execute these deferred operations uniformly based on the operation history. Using the airline ticket example again, if during a network partition, the system shows "Processing your request" instead of "Purchase Successful," then when the partition recovers, the system can uniformly determine which users successfully purchased tickets, thus avoiding overbooking.

This approach essentially chooses C over A, but we handle the network partition in a more user-friendly way. However, not all systems are suited to display a "processing" state; it requires rational design based on specific business scenarios.

Example: An AP ATM

Typically, for services involving money, we tend to prefer CP systems, mainly due to compliance complexities. However, AP often means a better user experience and, (in the context of free markets) can lead to greater market share and more revenue. Brewer gives an interesting example of an AP ATM.

The operations supported by an ATM are withdrawal, deposit, and check balance. The invariant is a non-negative balance. During a network partition, only withdrawals can potentially break this invariant. The ATM sets a withdrawal limit: withdrawals below this limit are allowed; those exceeding it are rejected. It follows that a higher limit means higher risk but better user experience. Obviously, the algorithm for determining this limit is complex and involves many factors.

When recovering from the network partition, it's easy to identify which users have negative balances based on the operation history. The compensation measure in this case could be legal actions to recover the funds.

Writing this, I suddenly recall two instances where WeChat Pay covered payments for me. Once was when bank system maintenance prevented WeChat Pay from deducting funds from my bank card, and another was due to a network failure causing a payment to fail. In both cases, WeChat notified me of the successful payment afterward through different means. Clearly, this is an AP design and could also be seen as a form of financial innovation?

Postscript

This is already the second blog post about a paper, but I still find writing them quite challenging. The reasons might be:

  • Papers are rigorous, and their content cannot be expressed casually.
  • I need to convey the ideas from the article in my own words.
  • English and Chinese have different modes of expression, so direct translation doesn't work.

I plan to document every interesting paper I read, so there will be more blog posts. I hope that writing more will hone my writing skills and make the process less strenuous.

References

  • Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
  • CAP Twelve Years Later: How the “Rules” Have Changed