How JunoDB is designed to scale horizontally

So PayPal recently open sourced their key-value database named JunoDB, and I spent a few days going through it to understand its features and guarantees. In a series of videos, I will be going through the database and talking about the key details and design decisions they took while building it. In the process, we will understand how a production-grade key-value store is built. This is the third video of the series, and in this one I will be talking about how the database scales its connections, its compute and, more interestingly, its storage layer, making it truly horizontally scalable.

Becoming a better engineer is the need of the hour, and to help you all reach the next level I have something that you will find amusing. I conduct super practical courses with a no-nonsense approach. These courses are designed to help you become a better engineer and ace your career. The courses will definitely reignite your love for engineering and spark the much-needed engineering curiosity. Some of my most popular courses are on system design and database internals. Because of the no-fluff approach, my courses are enrolled in by folks across all levels, from SDE-1s to tech leads to staff engineers to EMs to VPs of engineering, at some of the most prominent companies out there. All the details about the courses, like curriculum, prerequisites, testimonials, and FAQs, can be found on the course pages, and I have linked them in the description down below, so do check them out. I cannot wait to see you all become better engineers and ace your engineering career. Now let's go back to the video.

So let's ask this question: why did JunoDB need to scale? Instead of just making things infinitely scalable, you need to understand if there is even a need to scale and, if yes, what exactly needs to be scaled. JunoDB is a very core component for PayPal. It powers the most critical core backend services that PayPal has. As the number of microservices increases, the number of connections coming to JunoDB increases. So the first
pain point they need to address is the number of inbound connections. The load is still there, but more importantly it is the number of inbound connections they need to handle, because as soon as you add one more microservice, the number of concurrent connections the database needs to handle goes up again.

This is what is a little interesting: to get maximum performance, a persistent connection is maintained between your client and your Juno proxy, and between your Juno proxy and your storage server. So as the number of microservices increases, the number of concurrent persistent connections also increases. This means the number of inbound connections that a JunoDB cluster needs to handle shoots up. That's the first reason it needs to scale horizontally.

Second is the amount of data. Because JunoDB powers some of the key services around risk management, financial transaction processing, and authentication, the amount of data stored in the JunoDB key-value store keeps increasing, so you need to increase your storage as well.

Third is the number of read and write requests coming in, which is your query load, your compute load. As the number of connections increases, the number of requests increases, the data increases, and the number of queries fired also shoots up. When this happens, you need to increase your compute capabilities as well.

So let's talk about them one by one. Let's start with scaling connections. This was the architecture diagram of JunoDB that we saw in the previous video. In case you haven't watched that video, I highly recommend you watch it, but we could see this being the high
level architecture of JunoDB. Now, what we want is for JunoDB to handle a large number of persistent connections, so let's go through whether it actually does that or not.

You use the Juno SDK to talk to your Juno backend. The Juno backend has a load balancer. A load balancer can implicitly scale horizontally and can handle the persistent connections coming from your clients, so this solves that part of the problem.

Now, we know that each machine in the world has some limit on the number of concurrent TCP connections it can handle. Because each machine has a limit, for your system to handle a large number of connections you need to make this layer horizontally scalable, and for it to be horizontally scalable it needs to be stateless. So is Juno proxy stateless? That is the key question. Because the proxy is abstracted behind the load balancer, you can add as many Juno proxy instances as you want, but if it were stateful you could not go beyond a certain number. This is why your proxy has to be entirely stateless. It still maintains persistent connections, but it is easy to move a persistent connection from one Juno proxy to another.

So does Juno proxy's responsibility at this stage allow us to do so? Yes, it does. You add one more Juno proxy, persistent connections are made to it, and it in turn makes connections to all of the storage servers, which every proxy does anyway. So all instances of Juno proxy are stateless and equal. Because of this, you keep adding more Juno proxies as and when you need them: your load balancer creates connections with all of them, and each instance of Juno proxy creates connections with all of the storage servers. There is no state that needs to be maintained, making it
truly horizontally scalable. So for you to handle more inbound connections, you just need to add more Juno proxy instances, given that your storage layer is still able to handle it and your load balancer is seamlessly scalable. That's the beauty of it: because the proxies are stateless and equal, you can add as many Juno proxy instances as you need to handle the large number of inbound connections.

The second one is about scaling data. The proxy layer was easy to scale because we designed it in a way that each instance creates connections with everyone; it's horizontally scalable, you just add more. But stateful things are hard to scale horizontally. Stateful implies databases: your data layer, your actual storage layer. Storage servers are where your actual key-value data is stored, so as data grows it is essential for you to add more storage servers.

Now, we know that the data is split into partitions called shards. What JunoDB does is that when you start a JunoDB cluster, you pre-configure the number of partitions you will have. Let's say you say, I'll have 1024 partitions. That's it, it's a fixed number. You will not have more shards than that, no matter what. Here in the diagram you can see six shards, but in reality, when you start your JunoDB cluster you have to specify the number of shards you will have, let's say 1024, and you cannot change it after that. The number of shards being fixed, all the data that you store is distributed across these 1024 shards, and these shards are then distributed among the storage servers, with ownership determined by consistent hashing.

Now let's say I have two storage servers, storage server 1 and storage server 2, and what has
happened is that the load has increased, and my storage servers are not able to handle the incoming load. I have taken the example of six shards, but in reality there are 1024. What will I do? I will want to add a new storage server. When I add a new storage server, I'll have to move some shards from one storage node to another. How would you know which shard needs to be moved? Consistent hashing, because it determines data ownership, that is, which shard goes on which storage server. So when you add a server, you move the shards and update the mapping. This is how simple it is to add a new storage server and scale your storage layer seamlessly.

When this changes, that is, when a shard becomes owned by another storage server, the mapping is updated in etcd, which the Juno proxies read from. As soon as etcd learns of the change, it propagates it to all the Juno proxies in a strongly consistent way, so they all know about it. This means that when your next request comes in, it is automatically forwarded to the server that now owns the corresponding shard. So we just made the storage layer, with respect to the capacity it holds, seamlessly scalable.

Now what does etcd store for us? It stores the configuration. What configuration? That this shard belongs to this storage server. And how is that decided? Consistent hashing. It's a very standard implementation of consistent hashing, nothing fancy. Put them all on a ring: your storage servers hold some spots on the hash ring, you take the shard, hash it with the same hash function, find the storage server immediately to the right of it, and that is the one that owns the corresponding shard. Simple. That is exactly what
your etcd, or rather your proxy, uses to identify where to forward a request and which storage server owns which shard. It's a classic application of consistent hashing. If you want to read more about consistent hashing, I have a detailed blog on it that also covers its implementation; I highly recommend you look into that.

So this is how your storage layer is made scalable while ensuring minimal data movement. When you add a server, you just need to move a few shards here and there. Let me give a concrete example. Let's say I add a new storage server here. The shards that hash into this region of the ring used to be owned by this storage server and will now be owned by the new storage server. It's the standard consistent hashing implementation. Everything else remains unchanged, and that's the beauty of consistent hashing.

Okay, so now how do you move the data? The number of shards is fixed, and each shard within JunoDB has a bunch of micro-shards. A micro-shard is the building block that forms a shard. It's just a logical entity. Think of it like this: when you are moving a shard, you are moving its micro-shards, one at a time. The number of micro-shards can vary from shard to shard; that's an internal detail.

Now understand the flow; I had to go through the source code to work this out. Given a key that you want to store or access, what do you do first? You take the key, pass it through MurmurHash, and compute the hash. Then you mod it by the number of shards that you have, so that you know which shard this key belongs to. This is not where consistent hashing comes in; consistent hashing comes in for the shard to storage server
mapping. Here, given a key, you mod the hash by the number of shards. Because the number of shards is fixed, you can do a plain mod. Understand this: that is why you are not using consistent hashing here. The number of shards is fixed, so you compute hash mod number of shards, and you know which shard the key belongs to.

Now you know which shard the key goes to, but which storage server is that shard present on? You get this using consistent hashing. This is what your proxy is doing. It is not a running process as such; it's just a mapping that is stored in the etcd configuration. The proxy uses etcd to know which storage server the shard is present on, talks to that storage server, and forwards the request there. Consistent hashing is only run occasionally, whenever a redistribution happens. Nothing fancy there.

But this is such a beautiful thing, and a key highlight: you do not have to use consistent hashing everywhere. Because your number of shards is fixed, you do not need consistent hashing to figure out which shard a key belongs to. If the shard count were elastic, you would need consistent hashing; otherwise you don't. You can just use a normal mod function, hash mod number of shards, to know which shard owns the key. But because storage servers can be added to and removed from your infrastructure, you need to know which shard is owned by which storage server, and this is where consistent hashing is used. So, key highlight once again: the total number of shards is fixed, let's say 1024, and these are distributed across storage servers using consistent hashing.

The current setup at PayPal, as mentioned by them, has 200 storage nodes
which distribute the 1024 shards among themselves, so each storage node holds roughly five shards, and they process 100 billion requests every day. That's a pretty huge scale. This is why I say you don't need an overly complex architecture to handle this. A simple mod by the shard count works; you don't need consistent hashing everywhere. So don't be under the misconception that consistent hashing is a magical solution. It is not. And this is how JunoDB seamlessly scales its storage layer.

Just to give you a recap of this video: we understood why JunoDB needs to scale, and how storage, network, and compute need to be scaled. We saw how the network and compute layers scale horizontally because they are stateless and equal. Then we talked about how we scale the storage layer, where we use consistent hashing and where we do not, and how JunoDB takes a key design decision by fixing the number of shards. This way, mapping a key to a shard is just a mod by the shard count, and to figure out which storage server the shard belongs to, they use consistent hashing. The mapping is stored in etcd and distributed across the Juno proxy instances, each proxy has persistent connections to the storage layer, and that's how the request goes through.

So this is how JunoDB seamlessly scales its storage, compute, and network. This was the third video in the series. I hope you found it interesting. That's it for this one, and I'll see you in the next one, where we talk about availability. Thanks.
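
The lookup path described in the video (key to shard by a plain mod, then shard to storage server by consistent hashing) can be sketched in Python roughly as follows. This is an illustration under assumptions, not JunoDB's actual code: SHA-256 stands in for MurmurHash because the Python standard library has no Murmur implementation, and the server names, the `shard-N` labels, the number of ring spots per server, and the plain dictionary standing in for the mapping kept in etcd are all hypothetical.

```python
import bisect
import hashlib

NUM_SHARDS = 1024  # fixed at cluster creation (example value from the talk)


def stable_hash(data: str) -> int:
    """Stand-in hash: SHA-256 truncated to 64 bits (the talk says JunoDB uses MurmurHash)."""
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest()[:8], "big")


def shard_for_key(key: str) -> int:
    """Key -> shard: a plain modulo is enough because the shard count never changes."""
    return stable_hash(key) % NUM_SHARDS


class HashRing:
    """Shard -> storage server: a basic consistent-hash ring."""

    def __init__(self, servers, spots_per_server=64):
        # Each server holds several spots on the ring (the spot count is an assumption).
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(spots_per_server)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, shard_id: int) -> str:
        # The owner is the first server spot to the right of the shard, wrapping around.
        pos = stable_hash(f"shard-{shard_id}")
        idx = bisect.bisect_right(self._positions, pos) % len(self._ring)
        return self._ring[idx][1]


def build_shard_map(servers):
    """Computed only when servers change; the result is what a proxy would look up."""
    ring = HashRing(servers)
    return {shard: ring.owner(shard) for shard in range(NUM_SHARDS)}


def route(key: str, shard_map):
    """Per-request path: one hash, one modulo, one dictionary lookup."""
    shard = shard_for_key(key)
    return shard, shard_map[shard]


servers = ["storage-1", "storage-2"]  # hypothetical server names
shard_map = build_shard_map(servers)
shard, server = route("user:42", shard_map)
print(f"key 'user:42' -> shard {shard} -> {server}")
```

Note that the ring is only consulted when the shard map is built. The per-request path never touches it, which matches the point above that consistent hashing runs only when a redistribution happens.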

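The minimal data movement claim can also be checked with a small, self-contained experiment. The sketch below rebuilds the same kind of ring before and after adding a third storage server and counts the shards whose owner changed. As before, SHA-256 is a stand-in hash, and the server names, shard labels, and spots per server are assumptions made for illustration. The last lines contrast this with a naive shard mod number of servers placement, which is not something the video describes and is included only for comparison.

```python
import bisect
import hashlib

NUM_SHARDS = 1024  # example shard count from the talk


def position(label: str) -> int:
    """Place a label on the ring (SHA-256 stand-in, not JunoDB's hash function)."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:8], "big")


def assign_shards(servers, spots_per_server=64):
    """Return {shard_id: server}; the owner is the first server spot clockwise of the shard."""
    ring = sorted(
        (position(f"{server}#{i}"), server)
        for server in servers
        for i in range(spots_per_server)
    )
    positions = [pos for pos, _ in ring]
    owners = {}
    for shard in range(NUM_SHARDS):
        idx = bisect.bisect_right(positions, position(f"shard-{shard}")) % len(ring)
        owners[shard] = ring[idx][1]
    return owners


before = assign_shards(["storage-1", "storage-2"])
after = assign_shards(["storage-1", "storage-2", "storage-3"])

# Shards whose owner changed when storage-3 joined.
moved = [s for s in range(NUM_SHARDS) if before[s] != after[s]]
print(f"shards moved with consistent hashing: {len(moved)} of {NUM_SHARDS}")

# Every moved shard lands on the new server; nothing shuffles between the old ones.
only_to_new = all(after[s] == "storage-3" for s in moved)
print(f"all moved shards went to the new server: {only_to_new}")

# Contrast (not from the talk): placing shards by shard % number_of_servers.
naive_moved = sum(1 for s in range(NUM_SHARDS) if s % 2 != s % 3)
print(f"shards moved with naive modulo placement: {naive_moved} of {NUM_SHARDS}")
```

On a ring, a new server can only take over shards that fall just before its own spots, so the old servers never exchange shards with each other. With modulo placement over a changing server count, most shards change owner.
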
