Outage Dissections

15 videos


So, the outage is mitigated, now what?

500 views 24 likes 2022-07-08

Outages happen, and in such a tense situation, the main priority is to get the system back up, but is that it? Is everyth...

Control an outage by localizing the failures

444 views 31 likes 2022-07-06

Outages are inevitable, but we should design our architecture so that if a component goes down, it does not lead to a ...

Dissecting GitHub Outage - Multiple Leaders in Zookeeper Cluster

1059 views 58 likes 2022-07-01

Distributed systems are prone to problems that seem very obscure. GitHub had an outage because a set of nodes in the Zoo...

GitHub Outage - How databases are managed in production

1165 views 81 likes 2022-06-29

So, how are databases managed in production? When the master goes down, how is a replica chosen and promoted to be the n...

Dissecting GitHub Outage - Downtime due to Rate Limiter

991 views 46 likes 2022-06-24

Rate limiters are supposed to prevent downtime, but have you ever heard of a rate limiter causing downtime? This happe...

Dissecting GitHub Outage - Master failover failed

829 views 36 likes 2022-06-22

Companies announce their planned maintenance, but what happens during it? Could something go wrong while running maintenan...

Dissecting GitHub Outage - Downtime they thought was avoided

458 views 26 likes 2022-06-10

Has it ever happened to you that you anticipated something would go wrong, you proactively fixed it, but it still ...

Dissecting GitHub Outage - Downtime due to creating an Index

894 views 58 likes 2022-06-06

GitHub wanted to optimize their SQL query performance, and they had to reverse a database index. Instead of getting a pe...

Dissecting GitHub Outage - Repository Creation Failed

522 views 25 likes 2022-06-03

Imagine trying to create a new GitHub repository and the call keeps failing, for 53 minutes. This happened with ...

Dissecting GitHub Outage: Downtime due to an Edge Case

1173 views 57 likes 2022-05-23

In August 2021, GitHub experienced an outage where their MySQL Master database went into a degraded state. Upon investig...

Dissecting GitHub Outage - Downtime due to ALTER TABLE

1963 views 102 likes 2022-05-09

Can an ALTER TABLE command take down your production? 🤯 GitHub had a major outage and it all started with a schema migr...

An engineering deep-dive into Atlassian's Mega Outage of April 2022

4782 views 247 likes 2022-04-15

In April 2022, Atlassian suffered a major outage where they "permanently" deleted the data for 400 of their paying cloud...

Dissecting Google Maps Outage: Bad Rollout and Cascading Failures

1306 views 82 likes 2022-04-01

Google Maps had a global outage on 18th March 2022, during which end users were not able to use Directions, Navigati...

Dissecting GitHub Outage: ID column reaching the max value 2147483647

1986 views 161 likes 2022-03-23

GitHub experienced an outage on 5th May 2020 on a few of their internal services, and it happened because a table had an a...
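The number in this title is the largest value a signed 32-bit integer can hold, 2^31 - 1. A minimal Python sketch of that arithmetic follows, assuming the id column is a signed 4-byte INT (as MySQL's INT is); `next_id` is a hypothetical helper that models an auto-increment counter refusing to go past its column's ceiling, not GitHub's actual code.

```python
# Sketch: why an auto-increment id in a signed 32-bit INT column tops out at 2147483647.
# Assumption: the column is a signed 4-byte INT; next_id is a hypothetical helper.

INT32_MAX = 2**31 - 1    # ceiling of a signed 32-bit integer (signed INT)
UINT32_MAX = 2**32 - 1   # ceiling of an unsigned 32-bit integer (INT UNSIGNED)
INT64_MAX = 2**63 - 1    # ceiling of a signed 64-bit integer (BIGINT)


def next_id(current: int, max_value: int = INT32_MAX) -> int:
    """Return the next id, or raise once the column type has no larger value left."""
    if current >= max_value:
        raise OverflowError(f"id {current} is already at the column maximum {max_value}")
    return current + 1


print(INT32_MAX)             # 2147483647
print(next_id(2147483646))   # 2147483647
try:
    next_id(INT32_MAX)       # the counter is exhausted, so new rows cannot get an id
except OverflowError as err:
    print("insert fails:", err)
```

Widening the column to a 64-bit type raises the ceiling to 9223372036854775807, which is why moving to BIGINT is the usual way out of this situation.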

Dissecting Spotify's Global Outage - March 8, 2022

3314 views 195 likes 2022-03-12

Incident Report: Spotify Outage on March 8: https://engineering.atspotify.com/2022/03/incident-report-spotify-outage-on-...

© Arpit Bhayani, 2022