How Flipkart made their type ahead search hyper personalized






To help us search quicker and better, Flipkart suggests queries as we type. These suggestions are not just popular queries; they are also hyper-personalized. Let’s take a look at how they designed this system.

For a given prefix, say “sh”, we should rank the query suggestions - “shoes”, “shirts”, and “shorts” - in the context of the user.

Parameters of ranking

  1. Quality of the suggestion
       • popularity: how popular is the search term?
       • performance: does the term have enough results?
       • grammar quality: is the term grammatically correct?

  2. User actions
       • past few search terms of the user
       • past purchase history of the user
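
The ranking parameters above can be pictured as one feature vector per (user, suggestion) pair. The sketch below is only an illustration: the class name, field names, and value ranges are assumptions and do not come from Flipkart's system.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SuggestionFeatures:
    """Hypothetical feature record for one candidate suggestion."""
    # Quality of the suggestion
    popularity: float            # assumed: normalized search volume in [0, 1]
    performance: float           # assumed: 1.0 if the term returns enough results
    grammar_quality: float       # assumed: 1.0 if the term is well formed
    # User actions
    recent_search_match: float   # assumed: similarity to the user's last few searches
    purchase_match: float        # assumed: similarity to the user's past purchases


def to_vector(features: SuggestionFeatures) -> list:
    """Flatten the record into the list of numbers a ranking model would consume."""
    return list(astuple(features))


vector = to_vector(SuggestionFeatures(0.9, 1.0, 1.0, 0.5, 0.0))
```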

Personalizing the suggestions

A naive way of doing this would be to create cohorts of users and show every user in a cohort the same suggestions for a given prefix. But we wanted to show suggestions that are relevant to the queries the user has fired recently.

For example: if a user searched for “shoes”, “red shoes”, and “Nike shoes” and then typed “a”, we should show “Adidas shoes” and not “Apple iPhone”.

Understanding the intent

Flipkart has a taxonomy of the product categories it sells and lists on the platform. The taxonomy holds categories like Fashion and Electronics, and within Fashion, we have Clothing and Footwear, etc.

We first associate the given search query with this taxonomy. If two terms are mapped to close nodes in the taxonomy, they would be contextually relevant and similar.
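
One way to make "close nodes in the taxonomy" concrete is to count the edges between two categories through their lowest common ancestor. This is a minimal sketch under assumptions: the tree below is hypothetical (only Fashion, Electronics, Clothing, and Footwear appear in the text), and the post does not say which distance measure Flipkart uses.

```python
# Hypothetical taxonomy stored as child -> parent.
# "Root" and "Mobiles" are illustrative additions, not from the post.
PARENT = {
    "Fashion": "Root",
    "Electronics": "Root",
    "Clothing": "Fashion",
    "Footwear": "Fashion",
    "Mobiles": "Electronics",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def taxonomy_distance(a, b):
    """Number of edges between two nodes via their lowest common ancestor."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_from_a, n in enumerate(path_to_root(a)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes do not share a root")
```

With this tree, Footwear and Clothing are two edges apart while Footwear and Mobiles are four edges apart, so a smaller distance marks the more closely related pair.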

Evaluate Category Similarity

We need to determine the probability that the current search term/prefix would belong to the same category as the past search terms.

For example: “computer monitor”, “mou” -> “computer mouse”
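
A rough way to approximate this probability is to look at how the user's recent queries are distributed over categories and prefer candidates from the dominant category. The sketch below is an assumption-heavy illustration: the query-to-category mapping is made up, and the prefix is matched against any word of the candidate so that “mou” can reach “computer mouse”.

```python
from collections import Counter

# Hypothetical mapping from known queries to taxonomy categories.
QUERY_CATEGORY = {
    "computer monitor": "Computers",
    "computer mouse": "Computers",
    "mouth wash": "Personal Care",
    "mountain bike": "Sports",
}


def category_probabilities(recent_queries):
    """Share of the recent queries that fall into each category."""
    counts = Counter(
        QUERY_CATEGORY[q] for q in recent_queries if q in QUERY_CATEGORY
    )
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def rank_by_category(prefix, recent_queries):
    """Order candidates by how likely their category is, given the context."""
    probs = category_probabilities(recent_queries)
    candidates = [
        q for q in QUERY_CATEGORY
        if any(word.startswith(prefix) for word in q.split())
    ]
    # sorted() is stable, so ties keep their original order.
    return sorted(
        candidates,
        key=lambda q: probs.get(QUERY_CATEGORY[q], 0.0),
        reverse=True,
    )


ranked = rank_by_category("mou", ["computer monitor"])
```

Here “computer mouse” comes first because it shares a category with the earlier “computer monitor” search.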

Evaluate Reformulation

We need to determine the probability that the current search term/prefix is being written to reformulate the existing context.

For example: “shoes”, “red shoes”, “n…” -> “Nike Shoes”
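
A simple proxy for reformulation is word overlap: a candidate that reuses words from the recent queries is more likely to be a refinement of them. The function below is a hypothetical heuristic used only to illustrate the idea; the post does not describe how Flipkart computes this signal.

```python
def reformulation_score(candidate, recent_queries):
    """Fraction of the candidate's words that already appear in recent queries."""
    context_words = {w for q in recent_queries for w in q.lower().split()}
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    shared = sum(1 for w in candidate_words if w in context_words)
    return shared / len(candidate_words)


context = ["shoes", "red shoes"]
nike = reformulation_score("Nike Shoes", context)
nokia = reformulation_score("Nokia phone", context)
```

For the prefix “n”, “Nike Shoes” scores higher than “Nokia phone” because it repeats “shoes” from the context.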

Training the model

The machine learning model is trained on suggestion impressions: every suggestion that was viewed, along with whether it was clicked or left unclicked.

Our model should try to maximize the likelihood of the person clicking the suggestion.

The feature relationships were modeled with XGBoost (gradient-boosted decision trees), and the importance of each feature was quantified and evaluated.
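
To make the training setup concrete, the sketch below turns impression logs into labelled rows (clicked = 1, unclicked = 0) and computes the log-likelihood that the model is trying to maximize. The log format and function names are assumptions for illustration, and the XGBoost training step itself is left out.

```python
import math


def build_training_rows(impressions):
    """Expand each impression into (prefix, suggestion, label) rows.

    Assumed log format: {"prefix": str, "shown": [str, ...], "clicked": str or None}.
    """
    rows = []
    for impression in impressions:
        for suggestion in impression["shown"]:
            label = 1 if suggestion == impression["clicked"] else 0
            rows.append((impression["prefix"], suggestion, label))
    return rows


def log_likelihood(labels, click_probabilities):
    """Log-likelihood of the observed clicks under the predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, click_probabilities)
    )


rows = build_training_rows(
    [{"prefix": "sh", "shown": ["shoes", "shirts"], "clicked": "shoes"}]
)
labels = [label for _, _, label in rows]
```

A model that assigns a higher probability to the clicked suggestion and a lower one to the unclicked suggestion gets a higher log-likelihood, which is the direction training pushes in.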

Architecture

There will be an “autosuggestion” service whose job is to serve the suggestions to the user, given the search term.

This service will have a small cache that holds the data for the most popular search term prefixes, so that it can serve non-personalized suggestions.
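
The cache lookup could look roughly like the sketch below. The post does not say when the service picks the cache over the personalized path, so this assumes the cache is used only when no user context is available; the class and function names are hypothetical.

```python
class PopularPrefixCache:
    """Small in-memory map from popular prefixes to non-personalized suggestions."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def get(self, prefix):
        return self._entries.get(prefix)


def suggest(prefix, cache, personalized_ranker, user_context=None):
    """Serve from the cache when there is no user context, else rank per user."""
    if user_context is None:
        cached = cache.get(prefix)
        if cached is not None:
            return cached
    # Cache miss, or a known user: fall through to the ranked path.
    return personalized_ranker(prefix, user_context)


cache = PopularPrefixCache({"sh": ["shoes", "shirts", "shorts"]})
ranker = lambda prefix, context: ["ranked:" + prefix]  # stand-in for the Solr call
```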

The autosuggestion service will talk to Solr to serve the ML-ranked query suggestions for the given term. The relevance model configured in Solr will be “Learning to Rank”.
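
With Solr's Learning to Rank module, re-ranking is requested through the `rq` parameter using the `{!ltr ...}` query parser, and request-time signals can be passed as `efi.*` values. The sketch below only builds such request parameters. The model name, the `efi` key, the wildcard query, and the field list are all assumptions, and `efi` values are assumed to be single tokens that need no quoting.

```python
from urllib.parse import urlencode


def build_ltr_params(prefix, model_name, rerank_docs=100, efi=None):
    """Build Solr query parameters that re-rank prefix matches with an LTR model."""
    ltr = f"{{!ltr model={model_name} reRankDocs={rerank_docs}"
    for key, value in (efi or {}).items():
        # efi.* carries external feature information, such as user context.
        ltr += f" efi.{key}={value}"
    ltr += "}"
    return {"q": f"{prefix}*", "rq": ltr, "fl": "id,score"}


params = build_ltr_params(
    "sh", "suggest_model", rerank_docs=50, efi={"last_category": "Footwear"}
)
query_string = urlencode(params)  # ready to append to the collection's select URL
```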

A huge amount of data will be ingested into Solr through the data pipelines, and the ML model will be trained separately with XGBoost and ingested through a different component.


Arpit Bhayani












  • © Arpit Bhayani, 2022
