Arpit's Newsletter read by 90000+ engineers
Weekly essays on real-world system design, distributed systems, or a deep dive into some super-clever algorithm.
Linear probing is the simplest and one of the most efficient ways to handle conflicts in Hash Tables, let’s understand it in-depth.
Conflicts are inevitable, and Open Addressing is a technique to handle them in a space-efficient way. Instead of using an auxiliary data structure to hold the collided keys, open addressing leverages the free slots of the Hash Table to accommodate the collided keys.
To find the next available slot, open addressing defines a probing function that uses the key and the attempt to deterministically iterate through the slots.
We first hash the key and find a primary slot. If that slot is free, we place the key there. If the slot is occupied, we check the slot to its right. We continue to process until we find an empty slot.
This way, we continue to hunt for the key linearly from the primary slot. Once we reach the end of the array, we circle back to index 0. Formally, the probing function for Linear Probing would be
p(k, i) = (h(k) + i) mod m
We first find the primary slot of the key using the hash function. If the slot is empty, we place the key there. If not, we move to the right and find the first empty slot and place the key there.
Lookup is an iterative process where we first find the primary slot of the key, if the key is present at that slot then well and good. If not, we move to the right hunting for the key. We continue to linearly iterate through the array until
m
slots of the hash tableDeleting a key in Linear Probing is always a soft delete. We first look up the key in the hash table and once we find it, we mark that slot as deleted but never physically empty it. This allows us to go beyond the deleted slot and hunt for any other collided keys.
Linear probing is fast because it beautifully exploits the locality of reference. To access a certain slot in the hash table, we fetch the page in the CPU cache. The page will not just contain the requested slot, but it also contains the neighboring slots as well.
Hence, upon collision when we iterate from that slot, we would not need to fetch the slots from RAM, instead, some slots would already be present in the cache making iterations superfast.
In an average case, Linear Probing gives constant time performance for adding, lookup, and deleting a key.
Hence, it is important to pick a good hash function, like Murmur Hash, to ensure a near-uniform distribution of keys and fewer collisions.
Here's the video ⤵
Alongside my daily work, I also teach some highly practical courses, with a no-fluff no-nonsense approach, that are designed to spark engineering curiosity and help you ace your career.
A no-fluff masterclass that helps experienced engineers form the right intuition to design and implement highly scalable, fault-tolerant, extensible, and available systems.
An in-depth and self-paced course for absolute beginners to become great at designing and implementing scalable, available, and extensible systems.
A self-paced and hands-on course covering Redis internals - data structures, algorithms, and some core features by re-implementing them in Go.
Arpit's Newsletter read by 90000+ engineers
Weekly essays on real-world system design, distributed systems, or a deep dive into some super-clever algorithm.