An in-depth introduction to Canary Deployments

1982 views Backend System Design

Canary Deployments is a deployment pattern that rolls out the changes to a limited set of users before doing it for 100%.

We compare the vitals side-by-side from the old setup and the canary servers to ensure everything is as expected. If all is okay, then we incrementally roll out to a wider audience. If not, we immediately roll back our changes from the canaries.

Canary Deployment thus acts as an Early Warning Indicator to prevent a potential outage.

Why canary deployment is named canary deployment?

In 1920, coal miners used to carry caged canaries with them. If the gases in the mines were highly toxic the canaries would die and that alerted the miners to evacuate immediately, thus saving their lives.

In canary deployment, the canary servers are the caged canaries that alert us when anything goes wrong.

Implementing canary deployment

Canary deployments are implemented through a setup where a few servers serve the newer version while the reset serves the old version.

A router (load balancer / API gateway) is placed in front of the setup and it routes some traffic to the new fleet while the other requests continue to go to the old one.

Pros of Canary Deployment

  • we test our changes on real traffic
  • rollbacks are much faster
  • if something’s wrong only a fraction of users are affected
  • zero downtime deployments
  • we can gradually roll out the changes to users
  • we can power A/B Testing

Cons of Canary Deployment

  • engineers will get habituated to testing things in production
  • a little complex setup
  • a parallel monitoring setup is required to compare vitals side-by-side

Selecting users/servers for canary deployment?

The selection is use-case specific, but the common strategies are:

  • geographical selection to power regional roll-out
  • create specific user cohorts eg: beta users
  • random selection

When we absolutely need Canary Deployments

Say you own the Auth service that is written in Java and you chose to re-write it in - Golang. When taking it to production, you would NOT want to make a direct 100% roll-out given that the new codebase might have a lot of bugs.

This is where canary is super-helpful when we a fraction of servers serving requests from Golang server while others from the existing setup. We now forward 5% traffic to the new ones and observe how it reacts.

Once we have enough confidence in the newer setup, we increase the roll-out fraction to 15%, 50%, 75%, and eventually 100%. Canary setup thus gives us a seamless transition from our old server to a newer one.

Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.

Other essays that you might like

An in-depth introduction to Rolling Deployments

944 views 46 likes 2022-05-27

One of the simplest deployment strategies that make deployment a breeze is Rolling Deployment. It is the most widely ado...

Implementing Vertical Sharding

1124 views 75 likes 2022-05-25

Sharding is super-important when you want to handle the traffic that cannot be handled through one server. Sharding come...

An in-depth introduction to Blue Green Deployments

1309 views 60 likes 2022-05-18

Deployments are a pain if we are unsure about our release changes. But sometimes even if we know our changes well, somet...

An in-depth introduction to Canary Deployments

1982 views 117 likes 2022-05-16

Deployments are stressful; what if something goes wrong? What if you forgot to handle an edge case that was also missed ...

Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.

System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.

Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.

  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.