How JunoDB is designed to achieve six 9's of availability

Watch the video explanation ➔

Write an engineering article with a concrete timeline in Markdown format from the following transcript.

so paper recently opened Source their key value database named junodb and I spent a few days going through it to understand its features and guarantees in the series of videos I will be going through the database and talking about the key details and design decisions they took while building this in the process we will understand how a production grade key value store is spent this is the fourth video of the series and in this one I will be talking about how junodb uses data redundancy to achieve six nines of availability it will be pretty interesting Deep dive onto designing highly available distributed data stores so let's jump right into it so becoming a better engineer is the need of the I have something that you will find amusing eye conduct s No Nonsense approach these these courses are designed to help you become a better engineer and Ace your career the courses will definitely regulate your love for engineering and Spark the much needed engineering curiosity some of my most popular courses are of system design and database internals because I operated the no fluff approach my courses are enrolled by folks across all levels from SD ones to tech leads to staff to EMS to VP engineering of some of the most prominent companies out there all the details about the courses like curriculum prerequisites testimonials epic use can be found on the course pages and I have linked them in the description down below so do check them out and I cannot wait to see you all become better engineers and as your engineering career now let's go back to the video now maintaining High availability is really essential for a payments platform like PayPal because it because it deals with financial data it deals with real-time Financial transactions if it goes down it could lead to a business Havoc a loss of Revenue you cannot have that poor user experience and whatnot because it deals with money so Juno DB needs to withstand unforeseen outages like software bugs Hardware failures power outages natural calamities and all because failures are costly for a financial platform like you worst that could happen is data loss it needs to ensure that there is no data loss no matter what slow response number kind of okay because of retries and all but you still have a degraded performance you don't want that either so a highly available system is a must for a company like Payphone so to ensure High availability understand this understand this rule of thumb to ensure High availability you only need to ensure redundancy simple if you are ensuring redundancy you are building a highly available system so redundancy with respect to database can be what storage and compute so let's go through the architecture and see what all is redundant enough so load balancer is implicitly redundant it can scale up scale down if one node goes down within the load balancer another one takes its place that's what load balancer guarantees so that's taken care of Juno proxy in the previous video we saw how it was stateless and equal so there is no problem even if one of them goes down there are others which are equally capable of handling the request and then you have storage this is where things become interesting is our storage redundant not yet but we would make a redundant to make it highly available so let's jump into it what about storage now the first thing to make storage redundant like to sorry to make data redundant what we do is we or other Juno DB it's a very beautiful way to visualize it fii like this is the first time I saw it it's very beautiful so what Juno DB does is uh it arranges logically it arranges the storage nodes in a grid in a grid this is the grid that I'm talking about and again these are the stories and these are not independent Juno DB clusters this is storage nodes and these are shards okay so the rows of this grid are called storage groups these are storage groups so these four storage servers are part of a storage group called sg1 and the column is a Zone right now I'll talk about zones so these are part of a single zone these four are these three are part of single zone these three are part of zone three these three are part of zone four now what is a Zone Zone in terms of cloud it could be availability Zone in terms of bare metal infrastructure it could be different so it could be different racks that you have in your data center right so think of Zone like that and row is a storage Group which is a logical grouping of the storage servers right now this red and yellow will will focus on this SG3 the storage group 3. this red and yellow these are shards that we talk about one zero two for fixed number you can change it FYI but this shards these are the shards that are owned by this storage servers right you may have like in this grid I have four zones and three storage group so the your data the key thing is this very this is where your data redundancy part comes in your shards your red shirt is replicated synchronously to all the four zones similarly your yellow Shard is replicated to all the four storage servers on all four zones over there because they're part of the same storage group so these four storage servers owns these two shards having exact same copy of data exact same right okay now the keyword over here is synchronously so when the rights and the reads go to they go to a storage group on all of those storage servers part of the storage group the rights and the reads go synchronously and you get a quorum the read and the right Quorum we'll talk about that in detail so now let's take a concrete example to understand how reads and rights work this is very important when you are building a distributed data store that needs to be distributed and highly available you will find it across all distributed data stores out there so how reads and writes work write for a particular let's say write for a particular key K1 comes in what do we do first write for a particular key K1 comes in that key K1 pass through the murmur hash if we compute the murmur hash in the previous one we saw that from that murmur hash we mod it by the number of shards that we have we know which Shard this key belongs to the number of shards I picked so we do directly mod of it we know which chart it belongs to from this Shard we figure out which storage server group this chart belongs to right from this chart we know because we know this chart belongs to which storage group we know that now once we know which storage group it belongs to when we do either read or write of it it goes to all the nodes of the storage group so when my Juno proxy issues the right request instead of issuing the right request to one storage server IT issues the right request to all four storage servers or which belong to the same storage group let's say my chart belongs to storage group 3 the right request will go to all four storage servers let's say A B C D are my storage servers it goes to all four of them actually it will be an odd number I'm just using 4 because of lack of space over here but in reality it is it is five actually for PayPal production system so it will go to all five of them and your Juno proxy will wait for acknowledgment from Maximum of them let's say three of them right so it's rating for right quorum to happen so when it gets acknowledgment from majority of the loads then it acknowledges the right back to the calendar Day write is successful similarly when the read happens the read comes from Juno proxy 0 proxy knows that hey this storage group is what will be owning this the reads will go to all of them and now what would happen it would wait for the response from majority of them once it receives the response from majority of them it would pick the latest value out of that and respond back to the client this is how it would ensure that it is always written it is always storing data in a redundant fashion across them and even if a node goes down there is no data loss because other nodes are capable of handling it in case of reads because it is waiting for the response from the majority it ensures that it would always return the value which is latest one because that would be an overlap between your right majority and your read majority if you are going more than 50 percent which means if I have five zones which means on five storage servers if I'm writing the data if I wait for writing on at least three of them and if I'm waiting and if I'm waiting to read from at least three of them there will be at least one server on which I will have the latest copy of data which is red for my read and write would ensure that it is distributed among them right so this ensures we handle failovers this is beautiful this is such an elegant way to visualize it this grid is such an elegant way to visualize your redundancy part that this is how your data is making redundant across multiple storage servers and how you are guaranteeing your read read Quorum and write quorum such a beautiful implementation I have such a beautiful visualization of it right but you would think my God you have to do it you are writing to five instances in one shot one didn't be slow see if availability is important you have to do it right there is no way out because now what would happen is even if one of the node goes down you still have other nodes to solve the data and with still that same high level of consistency read your own rights and everything all guarantees over there you have to do it if you want High availability because if you're writing at just one place what if that note goes down then the rights are lost right you cannot have that you PayPal cannot afford that right so that's why you have to do it and not the best part of this the best part of this is you can another PayPal can actually take down an entire zone for regular maintenance purposes without hampering the system at all because the storage node operating system needs to be updated deployment needs to happen routine health checks of infrastructure needs to happen and whatnot it can seamlessly do that without hampering any live production traffic such a beautiful such a beautiful implementation of it right okay this was intra cluster uh intra cluster replication application now because it's a payments platform what if an entire data center is chopped off what if there is a natural Calamity in our data in the region where the data center is located PayPal cannot say we are down because of an issue right because you cannot have data losses and all so what PayPal does is it synchronizes or it asynchronously replicates the operations and the data across a similar cluster present in different data center of the same region so for example if they have data center in Singapore they would have two data centers in Singapore so across data center they are synchronously replicate the data and the operation who does that the Juno proxy does that so let's say I have in data center one cluster 1 in data center 2 cluster 2 your clients are talking to Data Center one because why not assume that they are doing it they have your Juno proxies you have your storage servers you have your load balancer over here right regular flow happens reads and writes happens intra cluster redundancy inter cluster quorums and all are maintained over here right but now when your right is acknowledged by multiple of them Juno proxy knows that the right is successful it asynchronously replicates it through over the network to another data Center's Juno proxy so now what happens is there is an earth synchronous replication setup between these two Juno proxies this way your rights are eventually sent to another Juno proxy instance in another data center so if one data center goes down due to any reason you have another data center to serve your data from this is how critical availability is for payments platform like PayPal synchronous replication or synchronous application we saw both of them synchronous replication with read columns write columns importance of it grid based layout for a very beautiful visualization of it now you can see high availability of your storage there how to do that you can see multiple data stores doing the exact same thing this Ctrl C control V concept Remains the Same read Quorum availability and all of that right but now you see how you build a very highly available distributed data store so yeah this is what I wanted to cover in this one this was the fourth beat on this series I hope you found it interesting hope you found it amusing that's it for this one I'll see in the next one thanks Adam [Music]

Here's the video ⤵

Courses I teach

Alongside my daily work, I also teach some highly practical courses, with a no-fluff no-nonsense approach, that are designed to spark engineering curiosity and help you ace your career.


System Design Masterclass

A no-fluff masterclass that helps SDE-2, SDE-3, and above form the right intuition to design and implement highly scalable, fault-tolerant, extensible, and available systems.


Details →

System Design for Beginners

An in-depth and self-paced course for absolute beginners to become great at designing and implementing scalable, available, and extensible systems.


Details →

Redis Internals

A self-paced and hands-on course covering Redis internals - data structures, algorithms, and some core features by re-implementing them in Go.


Details →


Writings and Learnings

Knowledge Base

Bookshelf

Papershelf


Arpit's Newsletter read by 100,000 engineers

Weekly essays on real-world system design, distributed systems, or a deep dive into some super-clever algorithm.