High-Level Architecture and System Design of JunoDB

This article explores the architecture of JunoDB, a production-grade key-value store that offers horizontal scalability and persistence. It covers the key components, the design decisions behind them, and the features that let the system scale.

Storage Server

The primary component in JunoDB’s architecture is the storage server. Storage servers are instances where the actual data is stored. They accept operations such as key retrieval, key insertion, and key deletion from other components.

Storage servers can keep data in memory or on disk, depending on the configuration. JunoDB uses RocksDB, a popular embedded database, to store the raw key-value pairs. Each storage server is responsible for a set of data partitions, or shards.

Sharding and Data Ownership

To handle large volumes of data, JunoDB splits the data into multiple partitions, or shards. Each shard is an instance of RocksDB running on a storage server. To decide which storage server owns a particular shard, JunoDB uses consistent hashing.
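The article does not say how a key is assigned to a shard, so the following is only a minimal sketch of one common approach: hash the key and take the result modulo a fixed shard count. The names `NUM_SHARDS` and `shard_for_key`, the shard count of 1024, and the choice of SHA-256 are all assumptions made for illustration, not JunoDB's actual scheme.

```python
import hashlib

# Hypothetical fixed shard count; the real value is a deployment choice.
NUM_SHARDS = 1024


def shard_for_key(key: bytes, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard ID by hashing it (illustrative only)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because the shard count stays fixed in this sketch, a key always lands in the same shard; only the assignment of shards to storage servers changes when the topology changes.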

The consistent hashing algorithm places both the shards and the storage servers on a ring. Hashing a shard gives its position on the ring, which in turn identifies the storage server responsible for it. This approach keeps data movement small: when a storage server fails or a new node is added, only the shards tied to that server need to be reassigned.
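To make the ring idea concrete, here is a small generic consistent-hashing sketch. It is not taken from JunoDB's code: the `HashRing` class, the use of virtual nodes (`vnodes`), the `shard-<id>` naming, and the SHA-256 hash are all assumptions chosen for illustration. A shard is owned by the first server point found clockwise from the shard's position.

```python
import bisect
import hashlib


def _position(label: str) -> int:
    """Hash a label to a position on the ring."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring (illustrative, not JunoDB's implementation)."""

    def __init__(self, servers, vnodes: int = 64):
        self._vnodes = vnodes
        self._points = []  # sorted list of (position, server) tuples
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server gets several points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_position(f"{server}#{i}"), server))

    def remove_server(self, server: str) -> None:
        self._points = [p for p in self._points if p[1] != server]

    def owner(self, shard_id: int) -> str:
        if not self._points:
            raise LookupError("ring has no servers")
        pos = _position(f"shard-{shard_id}")
        # First point at or after pos; wrap around past the end of the ring.
        idx = bisect.bisect_right(self._points, (pos, ""))
        return self._points[idx % len(self._points)][1]
```

With this structure, removing a server only changes the owner of shards that server held; every other shard keeps its owner, which is the property that limits data movement.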

Juno Proxy

To hide the complexity of the storage server topology from clients, JunoDB introduces a proxy layer called Juno Proxy. Instead of connecting directly to storage servers, clients communicate with the Juno Proxy, which routes their requests.

The Juno Proxy establishes persistent connections with all the storage servers in the system. When a request arrives, the Juno Proxy determines the appropriate storage server based on the consistent hashing mechanism and forwards the request to the corresponding server. This abstraction simplifies client configurations, reduces the need for direct connections to storage servers, and provides a centralized entry point for client requests.
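The routing flow can be sketched as follows. This is a toy model under stated assumptions: `StorageServer` is an in-memory stand-in (the real servers persist to RocksDB), the dictionary of servers stands in for persistent network connections, and `make_modulo_router` is a deliberately simple placeholder for the consistent-hashing lookup. None of these names come from JunoDB.

```python
import hashlib
from typing import Callable, Dict, Optional


class StorageServer:
    """In-memory stand-in for a storage server (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self._data: Dict[bytes, bytes] = {}

    def handle(self, op: str, key: bytes, value: Optional[bytes] = None):
        if op == "get":
            return self._data.get(key)
        if op == "set":
            self._data[key] = value
            return value
        if op == "delete":
            return self._data.pop(key, None)
        raise ValueError(f"unknown operation: {op}")


def make_modulo_router(names) -> Callable[[bytes], str]:
    """Placeholder routing function; a real proxy would consult the hash ring."""
    ordered = sorted(names)

    def owner_of(key: bytes) -> str:
        h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        return ordered[h % len(ordered)]

    return owner_of


class Proxy:
    """Single entry point that forwards each request to the owning server."""

    def __init__(self, servers: Dict[str, StorageServer],
                 owner_of: Callable[[bytes], str]):
        self._connections = servers  # stand-in for persistent connections
        self._owner_of = owner_of

    def request(self, op: str, key: bytes, value: Optional[bytes] = None):
        server = self._connections[self._owner_of(key)]
        return server.handle(op, key, value)
```

A client only ever calls `Proxy.request`; it never needs to know how many storage servers exist or which one holds a given key.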

Horizontal Scalability

JunoDB’s architecture is designed to be horizontally scalable. Scalability comes from running multiple instances of both Juno Proxy and the storage servers. A load balancer sits in front of the proxies and acts as the gateway for client requests, distributing traffic among the available Juno Proxy instances.

Each Juno Proxy instance uses consistent hashing to determine the target storage server for each request, so the system can absorb more incoming requests simply by adding more Juno Proxy instances. Every instance maintains its own copy of the consistent hashing map, and any change in the storage server topology is propagated to the instances through etcd, a strongly consistent distributed key-value store used here for configuration.
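The propagation pattern can be illustrated with a toy model. The sketch below does not use the etcd client API: `ConfigStore` is an in-memory stand-in that keeps one versioned shard map and notifies watchers, and `ProxyInstance` keeps a local copy that it replaces whenever a newer version arrives. All class and method names here are hypothetical.

```python
from typing import Callable, Dict, List

ShardMap = Dict[int, str]  # shard ID -> storage server name


class ConfigStore:
    """In-memory stand-in for the configuration store (not the etcd API)."""

    def __init__(self):
        self._version = 0
        self._shard_map: ShardMap = {}
        self._watchers: List[Callable[[int, ShardMap], None]] = []

    def watch(self, callback: Callable[[int, ShardMap], None]) -> None:
        self._watchers.append(callback)
        # Deliver the current state so a new watcher starts up to date.
        callback(self._version, dict(self._shard_map))

    def publish(self, shard_map: ShardMap) -> None:
        self._version += 1
        self._shard_map = dict(shard_map)
        for callback in self._watchers:
            callback(self._version, dict(self._shard_map))


class ProxyInstance:
    """Keeps a local copy of the shard map and refreshes it on updates."""

    def __init__(self, store: ConfigStore):
        self.version = -1
        self.shard_map: ShardMap = {}
        store.watch(self._on_update)

    def _on_update(self, version: int, shard_map: ShardMap) -> None:
        # Ignore stale or duplicate notifications.
        if version > self.version:
            self.version = version
            self.shard_map = shard_map

    def server_for(self, shard_id: int) -> str:
        return self.shard_map[shard_id]
```

Because every proxy answers lookups from its local copy, the configuration store is off the request path; it is only involved when the topology actually changes.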

Conclusion

JunoDB’s architecture provides a robust and horizontally scalable solution for key-value storage. By leveraging consistent hashing and distributed components like Juno Proxy and etcd, JunoDB ensures efficient data distribution, fault tolerance, and easy scalability. The combination of storage servers, Juno Proxy, and consistent hashing enables JunoDB to handle large volumes of data while maintaining performance and reliability.
