How @twitter keeps its Search systems up and stable at scale






Managing massive Search clusters (we are talking hundreds of terabytes here) is no joke, especially at @Twitter’s scale.

To manage them efficiently, Twitter built a set of tools. Here’s a quick gist of it 🧵👇

Twitter uses Elasticsearch (ES) to power the search of tweets, users, and DMs. ES gives them the necessary speed, performance, and horizontal scalability.

Given this massive adoption, they needed to ensure the efficiency and stability of these clusters and provide a standardized way to access them.

Elasticsearch Proxy

The Twitter team built a simple proxy for Elasticsearch that transparently sits in front of the Elasticsearch cluster.

The proxy is an extremely simple and lightweight TCP and HTTP-based relay that…

captures, in a standard way, all critical metrics such as cluster health, latency, and success and failure rates. With the proxy in place, we can also

  • throttle when some client abuses
  • apply security practices
  • route to a specific node
  • authenticate
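
To make the relay idea concrete, here is a minimal sketch of a proxy that throttles per client and counts successes, failures, and latencies. It is not Twitter's implementation: `SearchProxy`, the token-bucket throttle, the status codes, and the `backend` callable (standing in for the Elasticsearch cluster) are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class SearchProxy:
    """Hypothetical relay: throttles per client and records basic metrics."""

    def __init__(self, backend, rate=5.0, burst=5.0, clock=time.monotonic):
        self.backend = backend      # callable(request) -> response; stands in for ES
        self.rate = rate            # tokens refilled per second, per client
        self.burst = burst          # maximum tokens a client can accumulate
        self.clock = clock          # injectable so the behaviour is easy to test
        self.buckets = {}           # client_id -> (tokens, last_refill_time)
        self.metrics = defaultdict(int)
        self.latencies = []

    def _allow(self, client_id):
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.burst, now))
        # Refill in proportion to elapsed time, never above the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        self.buckets[client_id] = (tokens - 1 if allowed else tokens, now)
        return allowed

    def handle(self, client_id, request):
        if not self._allow(client_id):
            self.metrics["throttled"] += 1
            return 429, None
        start = self.clock()
        try:
            response = self.backend(request)
            self.metrics["success"] += 1
            return 200, response
        except Exception:
            self.metrics["failure"] += 1
            return 502, None
        finally:
            # Latency is recorded for both successful and failed calls.
            self.latencies.append(self.clock() - start)
```

Because every request passes through `handle`, the counters are collected the same way for every client, which is the "standard way" the proxy is meant to provide. Authentication and node routing would hook into the same method.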

Ingestion Service

ES performance degrades when there is a massive surge in traffic. We typically see

  • increased indexing latencies
  • increased query latencies

But ingesting massive amounts of data (tweets) every now and then is a common use case for Twitter, hence they tweaked the ingestion…

The write requests that come to the ES proxy are sent to Kafka. Consumers read from Kafka and relay them to the ES cluster.

Doing it asynchronously allows us to

  • do batch writes
  • retry if ES is down
  • consume at a comfortable pace
  • slow down if ES is overwhelmed
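
The consumer side of this can be sketched roughly as below. To keep the example self-contained, a `queue.Queue` stands in for the Kafka topic and `bulk_index` is a hypothetical callable standing in for a bulk write to ES; the batch size, retry count, and exponential backoff are assumptions, not details from Twitter.

```python
import queue
import time


def drain_batch(topic, max_batch):
    """Pull up to max_batch pending write requests without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(topic.get_nowait())
        except queue.Empty:
            break
    return batch


def consume(topic, bulk_index, max_batch=500, max_retries=5,
            base_delay=0.5, sleep=time.sleep):
    """Relay queued writes in batches, backing off when the cluster is down.

    Returns the number of documents indexed once the topic is empty.
    """
    indexed = 0
    while True:
        batch = drain_batch(topic, max_batch)
        if not batch:
            return indexed
        for attempt in range(max_retries + 1):
            try:
                bulk_index(batch)          # one batched write instead of many small ones
                indexed += len(batch)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise                  # give up and surface the error to the caller
                # Exponential backoff slows the consumer while ES recovers.
                sleep(base_delay * 2 ** attempt)
```

The writes stay in the queue while the consumer is backing off, so a struggling cluster sees less load rather than more.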

Backfill Service

Twitter has a constant need to ingest 100s of TBs of data into its Elasticsearch clusters.

Doing massive ingestion through MapReduce jobs directly on ES would take down the entire cluster, and doing it through Kafka makes it unnecessarily granular;

hence a backfill service …

The backfill indexing requests are dumped on HDFS.

Distributed jobs then partition and read these requests and index them into Elasticsearch.

A separate orchestrator computes the number of workers required to consume the indexing requests.
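
The source does not say how the orchestrator arrives at its worker count, so the following is only one plausible sketch. The sizing rule (data volume divided by per-worker throughput over a target window, capped so the cluster is not overwhelmed), the round-robin assignment, and all function and parameter names are assumptions.

```python
import math


def workers_needed(total_bytes, bytes_per_worker_per_sec, target_seconds, max_workers):
    """Hypothetical sizing rule: enough workers to finish within the target
    window, but never more than the cluster is assumed to tolerate."""
    required = math.ceil(total_bytes / (bytes_per_worker_per_sec * target_seconds))
    return max(1, min(required, max_workers))


def assign_partitions(partitions, workers):
    """Spread request partitions over the workers in round-robin order."""
    plan = [[] for _ in range(workers)]
    for i, partition in enumerate(partitions):
        plan[i % workers].append(partition)
    return plan
```

For example, backfilling 100 TB in a day with workers that each sustain 50 MB/s needs `workers_needed(100 * 10**12, 50 * 10**6, 86_400, 64)` workers, which evaluates to 24; with `max_workers=16` the cap wins and the backfill simply takes longer.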


Arpit Bhayani






