Linear Probing for Conflict Resolution in Hash Tables



378 views Hash Table Internals



Linear probing is the simplest and one of the most efficient ways to handle conflicts in Hash Tables, let’s understand it in-depth.

Conflicts

Conflicts are inevitable, and Open Addressing is a technique to handle them in a space-efficient way. Instead of using an auxiliary data structure to hold the collided keys, open addressing leverages the free slots of the Hash Table to accommodate the collided keys.

To find the next available slot, open addressing defines a probing function that uses the key and the attempt to deterministically iterate through the slots.

Linear Probing

We first hash the key and find a primary slot. If that slot is free, we place the key there. If the slot is occupied, we check the slot to its right. We continue to process until we find an empty slot.

This way, we continue to hunt for the key linearly from the primary slot. Once we reach the end of the array, we circle back to index 0. Formally, the probing function for Linear Probing would be

p(k, i) = (h(k) + i) mod m

Hash Table Operations

Adding a key

We first find the primary slot of the key using the hash function. If the slot is empty, we place the key there. If not, we move to the right and find the first empty slot and place the key there.

Key lookup

Lookup is an iterative process where we first find the primary slot of the key, if the key is present at that slot then well and good. If not, we move to the right hunting for the key. We continue to linearly iterate through the array until

  • the key is found, or
  • we encounter an empty slot
  • we cover all m slots of the hash table

Deleting a key

Deleting a key in Linear Probing is always a soft delete. We first look up the key in the hash table and once we find it, we mark that slot as deleted but never physically empty it. This allows us to go beyond the deleted slot and hunt for any other collided keys.

Why is Linear Probing Fast?

Linear probing is fast because it beautifully exploits the locality of reference. To access a certain slot in the hash table, we fetch the page in the CPU cache. The page will not just contain the requested slot, but it also contains the neighboring slots as well.

Hence, upon collision when we iterate from that slot, we would not need to fetch the slots from RAM, instead, some slots would already be present in the cache making iterations superfast.

In an average case, Linear Probing gives constant time performance for adding, lookup, and deleting a key.

Challenges with Linear Probing

  1. Bad hash function would lead to many collisions
  2. It suffers from non-uniform clustered collisions

Hence, it is important to pick a good hash function, like Murmur Hash, to ensure a near-uniform distribution of keys and fewer collisions.


Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.




Other essays that you might like


Implementing Hash Maps with Hash Tables

425 views 15 likes 2022-08-03

Maps and Dictionaries are amazing, but how are they implemented? In this 11th video of this Hash Table Internals series...

Implementing Hash Sets with Hash Tables

259 views 6 likes 2022-08-01

Sets are amazing, but how are they implemented? In this 10th video of this Hash Table Internals series, we take an in-d...

Implementing Resize of a Hash Table

257 views 9 likes 2022-07-29

So, the Hash Table needs to be resized in order to maintain consistent performance, but how exactly? In this 9th Video ...

Why are Hash Tables always doubled?

770 views 19 likes 2022-07-27

Why are the underlying arrays of the hash tables always a power of 2? When we trigger a resize why are Hash Tables alway...


Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.


System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.




Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.



  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.