September 2021 cohort is finalized with 45 amazing folks. For future cohorts Join the waitlist

Replication Strategies

Published on 10th Aug 2021

3 min read

Share this article on

In a distributed system, when replication is set up between data nodes, there are typically three replication strategies - Synchronous, Asynchronous, and Semi-synchronous. Depending on the criticality of data, its consistency, and the use-case at hand, the system chooses to apply one over another. In this essay, we take a quick yet verbose look into these strategies and understand their implications.

Before we jump into the replication strategies, let's first understand the need for them. When the data is replicated, multiple copies of the same data are created and placed on multiple machines (nodes). A system replicates the data to

  • improve availability, or
  • prepare for disaster recovery, or
  • improve performance by leveraging parallel reads across replicated data, or
  • keep the data geographically close to the user, e.g., CDN

The Replication comes into action when the Client initiates the write on the Master node. Once the Master updates its copy of the data, the replication strategy dictates how the data would be replicated across the Replicas.

Synchronous Replication

In Synchronous Replication, once the Master node updates its own copy of the data, it initiates the write operation on its replicas. Once the Replicas receive the update, they apply the change on their copy of data and then send the confirmation to the Master. Once the Master receives the confirmation from all the Replicas, it responds to the client and completes the operation.

synchronous replication

If there is more than one Replica in the setup, the Master node can propagate the write sequentially or parallelly. Still, in either case, it will continue to wait until it gets a confirmation, which will continue to keep the client blocked. Thus having a large number of Replicas means a longer block for the Client, affecting its throughput.

Synchronous Replication ensures that the Replicas are always in sync and consistent with the Master; hence, this setup is fault-tolerant by default. Even if the Master crashes, the entire data is still available on the Replicas, so the system can easily promote any one of the Replicas as the new Master and continue to function as usual.

A major disadvantage of this strategy is that the Client and the Master can remain blocked if a Replica becomes non-responsive due to a crash or network partition. Due to the strong consistency check, the Master will continue to block all the writes until the affected Replica becomes available again, thus bringing the entire system to a halt.

Asynchronous Replication

In Asynchronous Replication, once the Master node updates its own copy of the data, it immediately completes the operation by responding to the Client. It does not wait for the changes to be propagated to the Replicas, thus minimizing the block for the Client and maximizing the throughput.

The Master, after responding to the client, asynchronously propagates the changes to the Replicas, allowing them to catch up eventually. This replication strategy is most common and is the default configuration of most distributed data stores out there.

asynchronous replication

The key advantage of using a fully Asynchronous Replication is that the client will be blocked only for the duration that the write happens on the Master, post which the Client can continue to function as before, thus elevating the system's throughput.

One major disadvantage of having a fully Asynchronous Replication is the possibility of data loss. What if the write happened on the Master node, and it crashed before the changes could propagate to any of the Replicas. The changes in data that are not propagated are lost permanently, defeating durability. Although Durability is the most important property of any data store,

Asynchronous Replication is the default strategy for most data stores because it maximizes the throughput. The third type of replication strategy addresses durability without severely affecting throughput, and it is called Semi-synchronous Replication.

Semi-synchronous Replication

In Semi-synchronous Replication, which sits right between the Synchronous and Asynchronous Replication strategies, once the Master node updates its own copy of the data, it synchronously replicates the data to a subset of Replicas and asynchronously to others.

semi-synchronous replication

The Semi-synchronous Replication thus addresses the durability of data, in case of Master crash, at the cost of degrading the Client's throughput by a marginal factor.

Most of the distributed data stores available have configurable replication strategies. Depending on the problem at hand and the criticality of the data, we can choose one over the other.

If my work adds value, consider supporting me

Buy Me A Coffee

Arpit's Newsletter

2000+ Subscribers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter 👇

Other articles that you might like

Handling outages in a Master-Replica setup

Handling outages in a Master-Replica setup

This essay talks about the worse - nodes going down - impact, recovery, and real-world practices....

7th Sep
The New Replica

The New Replica

In this one, we take a look into how these Replicas are set up and understand some quirky nuances ab...

23rd Aug
Replication Formats

Replication Formats

When we are employing a Master-Replica pattern to improve availability, throughput, and fault-tolera...

15th Aug
The RUM Conjecture

The RUM Conjecture

While designing any storage system the three main aspects we optimize for are Reads, Updates, and au...

31st May