Trading Fish The website of Hector Castro

Raft Leader Election in Consul

A small paper reading group has assembled at work. We give ourselves two to three weeks to read a paper, meet up after hours, eat pizza, and discuss it. Our last paper focused on the Raft consensus algorithm, and I was chosen to lead the discussion.

To help Raft hit closer to home, I put together a small demo of Raft’s leader election process using Consul. The demo spins up a three-node Consul cluster using containers, then interleaves the debug log output from all three nodes, filtered with grep for raft. Reading through parts of the Raft paper, you can see how the logging output of HashiCorp’s implementation lines up.

Reading Along

Section 5.2 of the Raft paper focuses on leader election, and starts off with:

When servers start up, they begin as followers.

Sure enough, the first raft-filtered log lines are:

$ docker-compose up | grep raft
consul1 | [INFO] raft: Node at 172.17.0.45:8300 [Follower] entering Follower state
consul2 | [INFO] raft: Node at 172.17.0.44:8300 [Follower] entering Follower state
consul3 | [INFO] raft: Node at 172.17.0.43:8300 [Follower] entering Follower state

Next is the beginning of an election:

If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.

That corresponds with:

consul1 | [WARN] raft: Heartbeat timeout reached, starting election
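The timeout check described in that passage can be sketched in a few lines of Python. This is a toy model, not HashiCorp's implementation: the `Follower` class and its method names are hypothetical, and the 150 to 300 ms range is the example interval given in the Raft paper rather than anything Consul is known to use here.

```python
import random


class Follower:
    """Toy follower that tracks when it last heard from a leader (hypothetical)."""

    def __init__(self, now, rng=random):
        self.last_contact = now
        # Randomized timeout in seconds; 150-300 ms is the Raft paper's example
        # range, assumed here for illustration only.
        self.election_timeout = rng.uniform(0.150, 0.300)

    def on_heartbeat(self, now):
        # Any communication from a viable leader resets the clock.
        self.last_contact = now

    def should_start_election(self, now):
        # No contact for a full election timeout: assume there is no leader.
        return now - self.last_contact >= self.election_timeout
```

Randomizing the timeout per node makes it less likely that several followers time out together and split the vote, which fits with only consul1 logging the timeout above.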

Now that the election has started, there needs to be a winner:

A candidate wins an election if it receives votes from a majority of the servers in the full cluster for the same term.

Which goes with:

consul1 | [DEBUG] raft: Votes needed: 2
consul1 | [DEBUG] raft: Vote granted. Tally: 1
consul1 | [DEBUG] raft: Vote granted. Tally: 2
consul1 | [INFO] raft: Election won. Tally: 2
consul1 | [INFO] raft: Node at 172.17.0.45:8300 [Leader] entering Leader state
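The "Votes needed: 2" line is just the majority of a three-node cluster. A minimal sketch of that arithmetic and the tally follows. The function names are made up for illustration, and the assumption that the first granted vote is the candidate's own comes from the Raft paper, not from these logs.

```python
def votes_needed(cluster_size):
    """Smallest number of votes that is a strict majority of the cluster."""
    return cluster_size // 2 + 1


def run_election(cluster_size, granted_votes):
    """Tally votes in arrival order.

    Returns the tally at which the election is won, or None if the
    candidate never reaches a majority.
    """
    needed = votes_needed(cluster_size)
    tally = 0
    for granted in granted_votes:
        if granted:
            tally += 1
        if tally >= needed:
            return tally
    return None
```

For the cluster in the demo, `votes_needed(3)` is 2, so the candidate's own vote plus one more is enough to win.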

Lastly, AppendEntries is used to communicate the new leader to all other servers:

While waiting for votes, a candidate may receive an AppendEntries RPC from another server claiming to be leader.

Logs from consul1 show that it is replicating to consul2 and consul3:

consul1 | [INFO] raft: pipelining replication to peer 172.17.0.44:8300
consul1 | [INFO] raft: pipelining replication to peer 172.17.0.43:8300
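The other side of that exchange, a candidate receiving AppendEntries, comes down to a term comparison in the Raft paper: the candidate steps down if the sender's term is at least as large as its own, and otherwise rejects the RPC and stays a candidate. A rough sketch of that rule, with a hypothetical function name and return shape:

```python
def candidate_on_append_entries(current_term, rpc_term):
    """Decide how a candidate reacts to an AppendEntries RPC.

    Returns (new_state, new_term, accepted). Hypothetical helper, not
    part of any real Raft library.
    """
    if rpc_term < current_term:
        # Stale leader: reject the RPC and keep campaigning.
        return "candidate", current_term, False
    # Legitimate leader: adopt its term and return to follower state.
    return "follower", rpc_term, True
```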