Replication Strategies

In a distributed system, when replication is set up between data nodes, there are typically three replication strategies - Synchronous, Asynchronous, and Semi-synchronous. Depending on the criticality of data, its consistency, and the use-case at hand, the system chooses to apply one over another. In this essay, we take a quick yet verbose look into these strategies and understand their implications.

Before we jump into the replication strategies, let's first understand the need for them. When the data is replicated, multiple copies of the same data are created and placed on multiple machines (nodes). A system replicates the data to

  • improve availability, or
  • prepare for disaster recovery, or
  • improve performance by leveraging parallel reads across replicated data, or
  • keep the data geographically close to the user, e.g., CDN

The Replication comes into action when the Client initiates the write on the Master node. Once the Master updates its copy of the data, the replication strategy dictates how the data would be replicated across the Replicas.

Synchronous Replication

In Synchronous Replication, once the Master node updates its own copy of the data, it initiates the write operation on its replicas. Once the Replicas receive the update, they apply the change on their copy of data and then send the confirmation to the Master. Once the Master receives the confirmation from all the Replicas, it responds to the client and completes the operation.

synchronous replication

If there is more than one Replica in the setup, the Master node can propagate the write sequentially or parallelly. Still, in either case, it will continue to wait until it gets a confirmation, which will continue to keep the client blocked. Thus having a large number of Replicas means a longer block for the Client, affecting its throughput.

Synchronous Replication ensures that the Replicas are always in sync and consistent with the Master; hence, this setup is fault-tolerant by default. Even if the Master crashes, the entire data is still available on the Replicas, so the system can easily promote any one of the Replicas as the new Master and continue to function as usual.

A major disadvantage of this strategy is that the Client and the Master can remain blocked if a Replica becomes non-responsive due to a crash or network partition. Due to the strong consistency check, the Master will continue to block all the writes until the affected Replica becomes available again, thus bringing the entire system to a halt.

Asynchronous Replication

In Asynchronous Replication, once the Master node updates its own copy of the data, it immediately completes the operation by responding to the Client. It does not wait for the changes to be propagated to the Replicas, thus minimizing the block for the Client and maximizing the throughput.

The Master, after responding to the client, asynchronously propagates the changes to the Replicas, allowing them to catch up eventually. This replication strategy is most common and is the default configuration of most distributed data stores out there.

asynchronous replication

The key advantage of using a fully Asynchronous Replication is that the client will be blocked only for the duration that the write happens on the Master, post which the Client can continue to function as before, thus elevating the system's throughput.

One major disadvantage of having a fully Asynchronous Replication is the possibility of data loss. What if the write happened on the Master node, and it crashed before the changes could propagate to any of the Replicas. The changes in data that are not propagated are lost permanently, defeating durability. Although Durability is the most important property of any data store,

Asynchronous Replication is the default strategy for most data stores because it maximizes the throughput. The third type of replication strategy addresses durability without severely affecting throughput, and it is called Semi-synchronous Replication.

Semi-synchronous Replication

In Semi-synchronous Replication, which sits right between the Synchronous and Asynchronous Replication strategies, once the Master node updates its own copy of the data, it synchronously replicates the data to a subset of Replicas and asynchronously to others.

semi-synchronous replication

The Semi-synchronous Replication thus addresses the durability of data, in case of Master crash, at the cost of degrading the Client's throughput by a marginal factor.

Most of the distributed data stores available have configurable replication strategies. Depending on the problem at hand and the criticality of the data, we can choose one over the other.

Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.

Other essays that you might like

Multi-Master Replication

497 reads 2021-11-03

In this essay, we look at what Multi-Master Replication is, the core challenge it addresses, use-cases we can find this ...

Handling outages in a Master-Replica setup

375 reads 2021-09-07

This essay talks about the worse - nodes going down - impact, recovery, and real-world practices....

The New Replica

293 reads 2021-08-23

In this one, we take a look into how these Replicas are set up and understand some quirky nuances about Replication....

Replication Formats

334 reads 2021-08-15

When we are employing a Master-Replica pattern to improve availability, throughput, and fault-tolerance, the big questio...

Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.

System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

800+ learners

Details →

Designing Microservices

A free playlist to help you understand Microservices and their high-level patterns in depth.

17+ learners

Details →

GitHub Outage Dissections

A free playlist to help you learn core engineering from outages that happened at GitHub.

67+ learners

Details →

Hash Table Internals

A free playlist to help you understand the internal workings and construction of Hash Tables.

25+ learners

Details →

BitTorrent Internals

A free playlist to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.

Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.

  • v12.7.8
  • © Arpit Bhayani, 2022

Powered by this tech stack.