Unsolvable Distributed Consensus and The Two Generals' Problem

399 views Distributed Systems

Reaching consensus is extremely important in any distributed network. For example, we cannot have two data nodes in a cluster where one thinks that price is $1000 while the other thinks it is $2000.

So, we need the nodes to talk to each other and reach a consensus and converge on the true value of price. Reaching consensus is easy when there are no failures - no network failures, or process failures.

But, reaching a consensus becomes impossible when we cannot guarantee message delivery.

Two Generals’ Problem

Say, there are two generals - A and B - and they want to attack the enemy from two different directions. The only way to conquer the enemy is when both generals attack simultaneously. If only one attacks, then the enemy wins.

The generals communicate via foot soldiers. These foot soldiers can be captured by the enemy and hence the message that generals wanted to send to each other can be lost. So, how would generals coordinate the attack?

When no messages are lost?

If the communication channel is reliable, then the generals all send each other messages to agree to attack and everyone responds/ack to everyone else, thus coordinating the attack.

Real World Analogy

Committing to a distributed database. The commit should succeed when all the nodes of the database agree to commit. If anyone cannot commit then the commit cannot go through.

When messages are lost

When general A sent a message to general B, what if B’s response got lost? then general A would not know if it should attack or not.

Also, since B did not receive an ack from A, then it cannot decide if it should attack or not either.

This is where we see both generals will keep on waiting for an acknowledgment of an acknowledgment, purely because the communication channel is unreliable.

This is the class Two Generals’ Problem where it is impossible to reach a consensus when the underlying communication channel is unreliable.

How should the generals decide?

Generals, instead of sending just a foot soldier, can send multiple foot soldiers increasing the probability that at least one of them would go through.

This is like we are flooding an unreliable network to get our message delivered.

But how do we do this in the Real World?

In the real world, hence we do not assume a completely unreliable network. Instead, we assume a certain fraction of messages will be lost - eg: 1 in 2. Hence to overcompensate we send 2 messages instead of one.

Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.

Other essays that you might like

Two Phase Commit to power Distributed Transactions in a Distributed System

618 views 28 likes 2022-09-16

Distributed Transactions are the heart and soul of Distributed Systems and getting all the participating nodes to agree ...

Exponential Information Gathering (EIG) Algorithm for Byzantine Agreement

379 views 16 likes 2022-09-14

Byzantine Agreement is an important problem to address in a Distributed Network. It is all about being tolerant of the n...

Exponential Information Gathering (EIG) Algorithm - Distributed Consensus even when processes crash

245 views 6 likes 2022-09-12

Exponential Algorithms have to be the worst possible way to solve Distributed Consensus; but are they really that bad? ...

FloodSet Algorithm - Distributed Consensus even when processes crash

432 views 14 likes 2022-09-09

Reaching a consensus is extremely critical in a Distributed System as we would have situations day-in and day-out where ...

Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.

System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

800+ learners

Details →

Designing Microservices

A free playlist to help you understand Microservices and their high-level patterns in depth.

17+ learners

Details →

GitHub Outage Dissections

A free playlist to help you learn core engineering from outages that happened at GitHub.

67+ learners

Details →

Hash Table Internals

A free playlist to help you understand the internal workings and construction of Hash Tables.

25+ learners

Details →

BitTorrent Internals

A free playlist to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.

Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.

  • v12.7.8
  • © Arpit Bhayani, 2022

Powered by this tech stack.