Distributed Consensus in Practice

2025-11-15

Implementing a consensus algorithm like Raft is a rite of passage for distributed systems engineers. The paper makes the algorithm sound straightforward, but the real complexity lies in the edge cases. Here are some notes from my experience implementing it.

The Heartbeat and Election Cycle

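At its core, Raft relies on a heartbeat mechanism: the leader periodically sends heartbeats (empty AppendEntries RPCs) to every follower. If a follower hears nothing from a leader within its election timeout, it assumes the leader has failed, becomes a candidate, and starts an election.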

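The most critical detail here is randomized election timeouts. If all nodes had the same timeout, they would all start elections at the same time, leading to repeated split votes where no one gains a majority. In practice, a range like 150ms–300ms works well for small clusters on a low-latency network; whatever range you pick should sit well above both the heartbeat interval and the typical network round-trip time.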
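Here is a minimal sketch of a randomized election timer. The names (ElectionTimer, next_election_timeout_ms) and the millisecond clock passed in by the caller are my own assumptions for illustration, not part of any particular Raft library.

```python
import random

# Assumed range, taken from the 150ms-300ms figure above.
ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def next_election_timeout_ms(rng=random):
    """Pick a fresh timeout uniformly from the configured range."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)


class ElectionTimer:
    """Tracks the deadline after which a follower starts an election."""

    def __init__(self, now_ms, rng=random):
        self._rng = rng
        self.reset(now_ms)

    def reset(self, now_ms):
        # Called whenever a valid heartbeat arrives. The timeout is
        # re-randomized on every reset so nodes do not stay in lockstep.
        self.deadline_ms = now_ms + next_election_timeout_ms(self._rng)

    def expired(self, now_ms):
        return now_ms >= self.deadline_ms
```

Passing the clock in as an argument, rather than reading the system time inside the class, makes the timer easy to drive from a test or a simulator.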

The Log Replication Flow

Consensus isn't just about picking a leader; it's about agreeing on a sequence of events.

  1. Proposal: A client sends a command to the leader.
  2. Append: The leader appends the command to its local log.
  3. Replication: The leader sends AppendEntries RPCs to all followers.
  4. Commitment: Once a majority of nodes (counting the leader itself) have stored the entry, the leader marks it as "committed."
  5. Application: Finally, the entry is applied to the state machine (the actual database or service).
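The commitment step can be sketched as below. This is a rough illustration under my own assumptions: match_index maps each follower to the highest log index known to be stored there, log_terms holds the term of each entry with log index 1 at position 0, and the function names are hypothetical. It also includes the rule from the Raft paper that a leader only commits an entry by counting replicas if that entry is from its current term.

```python
def majority_match_index(match_index, leader_last_index):
    """Return the highest log index stored on a majority of the cluster."""
    # The leader counts itself alongside its followers.
    indexes = sorted(list(match_index.values()) + [leader_last_index],
                     reverse=True)
    majority = len(indexes) // 2 + 1
    # The majority-th largest value is held by at least `majority` nodes.
    return indexes[majority - 1]


def advance_commit_index(commit_index, log_terms, current_term, match_index):
    """Return the new commit index for the leader (never moves backwards)."""
    candidate = majority_match_index(match_index, len(log_terms))
    # Only entries from the leader's current term are committed by
    # counting replicas; earlier entries become committed indirectly.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index
```

For example, in a five-node cluster where the leader holds 5 entries and its followers hold 5, 4, 1, and 0, index 4 is the highest entry present on three nodes, so that is as far as the commit index can advance.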

Why "Commitment" Matters

A common pitfall is applying a log entry too early. You must only apply entries to your state machine after they are committed, because an uncommitted entry can still be overwritten by a new leader. Raft's election rules do the rest: a candidate can only win if its log is at least as up-to-date as the logs of a majority, so any new leader already holds every committed entry. Together these maintain the "Safety" property: the guarantee that all nodes apply the same commands in the same order.
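A minimal sketch of the apply loop, assuming a toy key-value state machine and a hypothetical Node class of my own invention. The point is only that last_applied never runs ahead of commit_index.

```python
class Node:
    """Toy node whose state machine is a plain dict (an assumption)."""

    def __init__(self):
        self.log = []          # (key, value) commands; log index 1 is log[0]
        self.commit_index = 0  # highest index known to be committed
        self.last_applied = 0  # highest index applied to the state machine
        self.state = {}

    def apply_committed(self):
        # Apply entries strictly in log order, and stop at commit_index:
        # anything beyond it may still be replaced by a future leader.
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            key, value = self.log[self.last_applied - 1]
            self.state[key] = value
```

Entries past commit_index simply sit in the log until the leader reports that they are committed, at which point the next call to apply_committed picks them up.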

Practical Challenges: Network Partitioning

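What happens if a node is partitioned from the majority? Its election timer keeps expiring, so it stays a candidate and keeps incrementing its "Term" number without ever winning. When the partition heals, its higher term forces the current leader to step down and triggers a new election, even though the cluster was healthy. Modern implementations use a "Pre-Vote" phase to prevent this: before incrementing its term, a would-be candidate first asks its peers whether they would vote for it, and only starts a real election if a majority say yes.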
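The Pre-Vote check might look roughly like this. It is a simplified sketch, not how any specific implementation does it: the Peer record, the heard_from_leader_recently flag, and both function names are assumptions made for illustration, and the peers are evaluated locally instead of over RPC.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    current_term: int
    last_log_term: int
    last_log_index: int
    heard_from_leader_recently: bool


def grants_pre_vote(peer, proposed_term, cand_last_log_term, cand_last_log_index):
    """Would this peer vote for the candidate in a real election?"""
    # A peer that still hears from a live leader refuses to help depose it.
    if peer.heard_from_leader_recently:
        return False
    if proposed_term < peer.current_term:
        return False
    # The candidate's log must be at least as up-to-date as the peer's:
    # compare last terms first, then log length.
    return (cand_last_log_term, cand_last_log_index) >= (
        peer.last_log_term, peer.last_log_index)


def should_start_election(current_term, last_log_term, last_log_index, peers):
    """Run the pre-vote round; the candidate's own term is NOT changed."""
    proposed_term = current_term + 1
    votes = 1 + sum(  # the candidate counts its own vote
        grants_pre_vote(p, proposed_term, last_log_term, last_log_index)
        for p in peers)
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2
```

A node returning from a partition finds that its peers are all hearing from the leader, so the pre-vote round fails and its term stays where it was.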