How to handle database outages?


Why does a database go down?

An unexpected heavy load on your database can lead to a process crash or a massive slowdown.

Before jumping to short-term and long-term solutions, make sure the database is well monitored: CPU, memory, disk, and connections should all be tracked closely.
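Monitoring helps most when it alerts on sustained pressure rather than on a single spike. Below is a minimal sketch, assuming each metric arrives as a percentage sample. The `THRESHOLDS` values, `SustainedAlert`, and `DatabaseMonitor` are all hypothetical, and how the samples are collected (an agent, your cloud provider's metrics, or the database's own statistics views) is left out.

```python
from collections import deque

# Hypothetical alert rules: metric -> (threshold in percent, consecutive samples required).
THRESHOLDS = {
    "cpu": (80.0, 3),
    "memory": (85.0, 3),
    "disk": (90.0, 1),         # a filling disk rarely recovers on its own, so alert on the first sample
    "connections": (80.0, 3),  # active connections as a percent of the configured maximum
}


class SustainedAlert:
    """Fires only when the last `window` samples are all at or above `limit`."""

    def __init__(self, limit: float, window: int):
        self.limit = limit
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v >= self.limit for v in self.samples)


class DatabaseMonitor:
    def __init__(self, thresholds=THRESHOLDS):
        self.alerts = {name: SustainedAlert(limit, window)
                       for name, (limit, window) in thresholds.items()}

    def observe(self, sample: dict) -> list:
        """Feed one sample (metric name -> percent) and return the metrics currently alerting."""
        return sorted(name for name, alert in self.alerts.items()
                      if name in sample and alert.observe(sample[name]))


monitor = DatabaseMonitor()
for cpu in (92.0, 95.0, 97.0):
    firing = monitor.observe({"cpu": cpu, "memory": 40.0, "disk": 55.0, "connections": 30.0})
print(firing)  # ['cpu']
```

Requiring several consecutive samples filters out brief spikes, which also makes the alert a reasonable signal for the "consistent heavy usage" that justifies scaling up.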

Short-term solutions

  • Kill the queries that have been running for a long time
  • Quickly scale up your database if you have been seeing consistently heavy usage
  • Check whether a recent deployment is the culprit; if so, revert it as soon as possible
  • Rebooting the database can calm the storm and buy you some time
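The first step, killing long-running queries, can be sketched as follows, assuming PostgreSQL. There the `pg_stat_activity` view lists running statements, `pg_cancel_backend(pid)` cancels a backend's current query, and `pg_terminate_backend(pid)` closes the whole connection; on MySQL, `KILL QUERY <id>` plays a similar role. The `RunningQuery` type, the `queries_to_cancel` and `cancel_statements` helpers, and the 5-minute cutoff are hypothetical. The helper only decides which pids to cancel, so the logic can be checked without a live database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# PostgreSQL query that could produce the snapshot below (run it with any driver).
LONG_RUNNING_SQL = """
SELECT pid, query_start, query, state
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND now() - query_start > interval '5 minutes'
ORDER BY query_start
"""


@dataclass
class RunningQuery:
    """Hypothetical row type mirroring a few pg_stat_activity columns."""
    pid: int
    query_start: datetime
    query: str
    state: str = "active"


def queries_to_cancel(snapshot, now, max_runtime=timedelta(minutes=5)):
    """Return pids of active queries older than max_runtime, longest-running first."""
    victims = [q for q in snapshot
               if q.state == "active" and now - q.query_start > max_runtime]
    victims.sort(key=lambda q: q.query_start)
    return [q.pid for q in victims]


def cancel_statements(pids):
    """One pg_cancel_backend call per pid: it stops the query but keeps the connection open."""
    return [f"SELECT pg_cancel_backend({int(pid)});" for pid in pids]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
snapshot = [
    RunningQuery(101, now - timedelta(minutes=30), "SELECT * FROM orders"),
    RunningQuery(102, now - timedelta(seconds=2), "SELECT 1"),
    RunningQuery(103, now - timedelta(minutes=9), "UPDATE carts SET total = 0"),
    RunningQuery(104, now - timedelta(hours=2), "", state="idle"),
]
victims = queries_to_cancel(snapshot, now)
print(victims)                     # [101, 103]
print(cancel_statements(victims))  # ['SELECT pg_cancel_backend(101);', 'SELECT pg_cancel_backend(103);']
```

Cancelling is gentler than terminating, so starting with `pg_cancel_backend` and keeping `pg_terminate_backend` for sessions that ignore the cancel is a reasonable default.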

Long-term solutions

  • Ensure the right set of indexes is in place
  • Tune your database's default parameters to suit your workload
  • Check for the notorious N+1 queries
  • Upgrade to a newer database version to get the best the database can offer
  • Evaluate the need for horizontal scaling using replicas and sharding

