Understanding Raft Safety: Why Committing Log Entries from Previous Terms is Forbidden

Introduction

It has been over two months since I completed the Raft Lab of MIT 6.824, and I have now reread the Raft paper. Compared to my first reading, I have a deeper understanding, especially of the safety implications of "Committing entries from previous terms", discussed in section 5.4.2. In this post, I will elaborate on my understanding of this safety constraint.

The Definition of Commit

First, it's essential to clarify the definition of commit; otherwise, it's easy to confuse the subject of our discussion. The authors first describe commit in section 5.3 (Log Replication). Let's examine it sentence by sentence.

The leader decides when it is safe to apply a log entry to the state machines; such an entry is called committed.

The first sentence defines a commit: the Leader actively commits a log entry, after which that entry can be applied to the state machine.

Raft guarantees that committed entries are durable and will eventually be executed by all of the available state machines.

The second sentence states the durability guarantee for committed log entries. Durability is a necessary condition of being committed: an entry that can still be overwritten cannot be considered committed.

A log entry is committed once the leader that created the entry has replicated it on a majority of the servers (e.g., entry 7 in Figure 6).

The third sentence is easily mistaken for a sufficient condition: once a log entry is replicated to a majority of servers, it is committed. In reality, this sentence elaborates on the first sentence, providing a concrete description of "when it is safe". If we treat it as a sufficient condition, we overlook the fact that the Leader is the agent of the commit action. Only when the Leader decides to commit can an entry be committed; a log entry cannot commit itself.

Understanding the decisive role of the Leader in commits allows us to begin discussing why, in section 5.4.2, the Leader is forbidden from committing log entries from previous terms.

5.4.2 Committing entries from previous terms

Figure 8 illustrates the hazard that would arise if a Leader were allowed to commit log entries from previous terms: the log entry with term 2, committed in (c), would be overwritten in (d), violating the durability guarantee of commits.

[Figure 8 from the Raft paper: a time sequence of the logs on servers S1 to S5, panels (a) through (e)]

Why does this happen? Because a Leader has the authority to modify the log entries of Followers. Of course, a Leader cannot modify them arbitrarily; it only forces Followers to synchronize their logs with its own when inconsistencies are detected. Due to the Election Restriction in section 5.4.1, an elected Leader must have a log that is at least as up-to-date as the logs of a majority of nodes in the cluster. The Followers that voted for it therefore had logs no more up-to-date than the Leader's, and any conflicting entries they hold are overwritten. As shown in Figure 8, S5 in (d) holds a log entry with term 3, while S2, S3, and S4 hold nothing newer than term 2. Once S1 crashes, S5 can be elected Leader with their votes and begin log synchronization. Consequently, if Leader S1 in (c) had committed the log entry with term 2, that entry would be overwritten, violating the durability guarantee of commits. More generally, when a Leader wishes to commit a log entry with a past term $x$, it cannot guarantee that no other node possesses a log entry with a later term such as $x + 1$, nor can it prevent such a node from becoming the Leader in a later term.

Why, then, does committing a log entry with the current term not violate the durability guarantee? Because a committed log entry must exist on a majority of nodes, and any candidate needs a vote from at least one node in that majority. Due to the Election Restriction, the Leader of the next term must therefore also possess this log entry, thus ensuring durability. Meanwhile, a node that lacks this entry cannot hold an entry from a later term, so it can never appear more up-to-date than the nodes that hold it.
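The two cases can be checked mechanically. The following Go snippet is a minimal sketch, not code from the paper or the lab: it models each log as a slice of entry terms, using the logs from panels (c) and (e) of Figure 8, and counts how many servers would grant S5 a vote under the up-to-date comparison of section 5.4.1. The names `atLeastAsUpToDate` and `votesFor` are hypothetical, and the sketch deliberately ignores the `currentTerm` and `votedFor` checks that a real RequestVote handler also performs.

```go
package main

import "fmt"

// lastTermAndIndex returns the term and the 1-based index of the last entry.
// A log is modeled as a slice of entry terms (an assumption of this sketch).
func lastTermAndIndex(log []int) (term, index int) {
	if len(log) == 0 {
		return 0, 0
	}
	return log[len(log)-1], len(log)
}

// atLeastAsUpToDate implements the comparison from section 5.4.1:
// the later last term wins; with equal last terms, the longer log wins.
func atLeastAsUpToDate(candidate, voter []int) bool {
	ct, ci := lastTermAndIndex(candidate)
	vt, vi := lastTermAndIndex(voter)
	if ct != vt {
		return ct > vt
	}
	return ci >= vi
}

// votesFor counts the servers (the candidate included) whose logs would
// allow them to vote for the candidate.
func votesFor(candidate int, logs [][]int) int {
	votes := 0
	for _, voterLog := range logs {
		if atLeastAsUpToDate(logs[candidate], voterLog) {
			votes++
		}
	}
	return votes
}

func main() {
	const s5 = 4 // S5 is the fifth server, at slice index 4

	// Figure 8 (c): the term-2 entry is on S1, S2, S3, but no term-4 entry is replicated.
	figC := [][]int{{1, 2, 4}, {1, 2}, {1, 2}, {1}, {1, 3}}
	// Figure 8 (e): S1 has replicated its term-4 entry on a majority.
	figE := [][]int{{1, 2, 4}, {1, 2, 4}, {1, 2, 4}, {1}, {1, 3}}

	fmt.Println(votesFor(s5, figC)) // 4: S2, S3, S4, and S5 itself, a majority of 5
	fmt.Println(votesFor(s5, figE)) // 2: only S4 and S5 itself, not a majority
}
```

In the (c) configuration S5 can win and overwrite the term-2 entry, whereas in the (e) configuration it cannot gather a majority, which matches the argument above.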

Therefore, Raft never commits log entries from previous terms by counting replicas. Instead, the Leader only commits log entries from its current term this way and relies on the Log Matching Property to indirectly commit all preceding entries, including those from previous terms. This rule supplements the "when it is safe" condition mentioned earlier and is one of the criteria the Leader uses to make commit decisions.
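As an illustration of this rule, here is a minimal Go sketch of how a Leader might advance its commit index. It is an assumption-laden example rather than the lab's reference solution: `Entry` and `advanceCommitIndex` are hypothetical names, log indices are 1-based as in the paper, and `matchIndex` is assumed to contain one value per server, including the Leader's own last index.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical log entry; only the term matters for this sketch.
type Entry struct {
	Term int
}

// advanceCommitIndex returns the Leader's new commit index.
// Entry i of the log (1-based) is stored at log[i-1].
func advanceCommitIndex(log []Entry, matchIndex []int, currentTerm, commitIndex int) int {
	// Sort a copy in descending order; the value at position len/2 is the
	// highest index known to be replicated on a majority of servers.
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	majorityIndex := sorted[len(sorted)/2]

	// Only an entry from the current term may be committed by counting
	// replicas; earlier entries are then committed indirectly along with it.
	for n := majorityIndex; n > commitIndex; n-- {
		if log[n-1].Term == currentTerm {
			return n
		}
	}
	return commitIndex
}

func main() {
	// Leader S1 in term 4 with the log from Figure 8: terms 1, 2, 4.
	log := []Entry{{Term: 1}, {Term: 2}, {Term: 4}}

	// Figure 8 (c): index 2 (term 2) is on a majority, index 3 is only on S1.
	fmt.Println(advanceCommitIndex(log, []int{3, 2, 2, 1, 1}, 4, 1)) // 1: nothing new is committed

	// Figure 8 (e): index 3 (term 4) is on a majority.
	fmt.Println(advanceCommitIndex(log, []int{3, 3, 3, 1, 1}, 4, 1)) // 3: index 2 is committed indirectly
}
```

Scanning downward from the majority index and stopping at the first current-term entry is one way to express the rule; because terms never decrease along a log, that entry is also the highest one the Leader is allowed to commit.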

It's important to note that under this restriction, a log entry with a past term that has been replicated to a majority of nodes is not necessarily committed. Suppose in (c), Leader S1 goes offline before it can indirectly commit the log entry with term 2. In that case, the entry would still be overwritten in (d). This is correct behavior because uncommitted log entries do not carry a durability guarantee.

References

  • Raft: In Search of an Understandable Consensus Algorithm (Extended Version)