1306 views • Outage Dissections
On the 18th of March, 2022, Google Maps faced a major outage affecting millions of people for a couple of hours. The outage happened due to a bad deployment.
Although a bad deployment does not sound too bad, the situation worsens when there are cascading failures if 3 services have a synchronous dependency, forming a chain-like A -> B -> C.
If A goes down, it will have some impact on B, and if the impact is big enough, we might see B going down as well; and extending it, we might see C going down.
This is exactly what happened in this Google Outage. Some services had a bad deployment, and they started crashing. Tile Rendering service depended on it, and the Tile rendering service went down because of retries.
The Direction SDK, Navigation SDK directly invoked the Tile rendering service for rendering the maps, which didn’t work, causing a big outage.
Rollback as soon as possible.
If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.
444 views • 31 likes • 2022-07-06
Outages are inevitable; but we should design our architecture such that if a component is down, it should not lead to a ...
1059 views • 58 likes • 2022-07-01
Distributed Systems are prone to problems that seem very obscure. GitHub had an outage because a set of nodes in the Zoo...
1165 views • 81 likes • 2022-06-29
So, how are databases managed in production? When the master goes down, how a replica is chosen and promoted to be the n...
A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.
Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.
Arpit's Newsletter read by 17000+ engineers
🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.
Powered by this tech stack.