Database Sharding and Partitioning

3133 views Backend System Design

Sharding and partitioning come in very handy when we want to scale our systems. Let’s talk about these concepts in detail.

How is the database scaled?

A database server is just a database process (like MySQL, MongoDB) running on a virtual server like EC2. Now when we put our database in production it starts getting from real good traction, say 100 writes per second (WPS).

Steady user growth

Say, your product started getting some traction, and we find that the database is not able to handle the load, we scale it up by adding more CPU, RAM, and Disk to the server. This way we are now handling 200 WPS.

More read traffic

If we see nor reads then can also choose to add a Read Replica and divert some of the read traffic to this node, while the master node can take in 200 WPS.

Viral Growth

Say, your product went viral and you now got 5x more load which means now you have to handle 1000 WPS. To achieve this you again scale it up vertically and handle the desired load.

Insane growth

Say, you now cracked the PMF and are getting some really solid traction and need to handle 1500 WPS, and when you visit the database console you found out that it is not possible to vertically scale your database any further, so how do you handle 1500 WPS?

This is where the horizontal scaling comes into the picture.

Scaling the database horizontally

We know one database server can handle 1000 WPS, but we need to handle 1500 WPS, so we split the data into half and split it across two databases such that each database owns half of the data and all the writes for that data goes to that particular instance.

This way each server will get 750 WPS, which it can very easily handle, and owns 50% of the data. Thus by adding more database servers we handled 1500 WPS (more than what a single machine could handle)

Sharding and Partitioning

Each database server in the above architecture is called a Shard while the data is said to be partitioned. Overall, a database is sharded and the data is partitioned.

Partitioned data on shards

It is possible to have more partitions and fewer shards and in that case, each shard will own multiple partitions. Say, we have 100GB of data and it is split into 5 partitions and we have 2 shards. One shard will be responsible for 3 partitions while the other for 2.

Advantages and Disadvantages

Advantages of Sharding

  • handle more reads and writes
  • increases overall storage capacity
  • overall high availability

Disadvantages of Sharding

  • sharding is operationally complex
  • cross-shard queries are super-expensive

Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.

Other essays that you might like

An in-depth introduction to Rolling Deployments

944 views 46 likes 2022-05-27

One of the simplest deployment strategies that make deployment a breeze is Rolling Deployment. It is the most widely ado...

Implementing Vertical Sharding

1124 views 75 likes 2022-05-25

Sharding is super-important when you want to handle the traffic that cannot be handled through one server. Sharding come...

An in-depth introduction to Blue Green Deployments

1309 views 60 likes 2022-05-18

Deployments are a pain if we are unsure about our release changes. But sometimes even if we know our changes well, somet...

An in-depth introduction to Canary Deployments

1982 views 117 likes 2022-05-16

Deployments are stressful; what if something goes wrong? What if you forgot to handle an edge case that was also missed ...

Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.

System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.

Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.

  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.