Arpit's Newsletter read by 90000+ engineers
Weekly essays on real-world system design, distributed systems, or a deep dive into some super-clever algorithm.
A map, when implemented with Hash Tables, is called a Hash Map; and just like a regular map it supports operations like put, get, and iterate.
Hash Maps are required to store the application keys of any type, but the Hash Table understands only integers. We first pass the application keys through a hash function to map it to a 32-bit integer, and then the usual Hash Table operation takes over.
Given that the application key to hash key is a frequent operation, we would keep it handy by storing it alongside the application key. This helps us save repeated computations.
While looking up a key in the hash map, we first reach the slot, and then compare if the key present there is really the key we are looking for, as we cannot just rely on the equality of the hash keys. The Hash Map, hence, accepts a key comparator function.
Each node of the linked list in the chained hash table has the following structure
struct node {
int32 hash_key;
void *key;
void *value;
struct node *next;
}
void * key
and void * value
allows us to hold a key and a value of any type, while int32 hash_key
enables us to hold a pre-computed hash.
At the Hash Table level, we hold
Instead of having a contains
function, we have a lookup
function that returns the value for the matching key, and NULL
if the key does not exist. Thus, the lookup
function doubles as a contains
function.
When duplicate keys are inserted, we can do one of the following
Each slot of the hash table has the following structure
struct node {
int32 hash_key;
void *key;
void *value;
bool is_empty;
bool is_deleted;
}
void *key
and void *value
allows us to hold a key and a value of any type, while int32 hash_key
enables us to hold a pre-computed hash, saving a lot of runtime computations. is_empty
tells us if the slot is empty, while is_deleted
represents a soft deleted slot.
At the Hash Table level, we hold
Here the load factor will be computed as number of used slots/size of the array because the soft deleted keys also affect the performance of the hash table.
During insert, lookup, and delete when we find the matching hash, and explicitly compare the application keys, as multiple keys can hash to the same location, and we cannot rely on just hash key equivalence.
Here's the video ⤵
Alongside my daily work, I also teach some highly practical courses, with a no-fluff no-nonsense approach, that are designed to spark engineering curiosity and help you ace your career.
A no-fluff masterclass that helps experienced engineers form the right intuition to design and implement highly scalable, fault-tolerant, extensible, and available systems.
An in-depth and self-paced course for absolute beginners to become great at designing and implementing scalable, available, and extensible systems.
A self-paced and hands-on course covering Redis internals - data structures, algorithms, and some core features by re-implementing them in Go.
Arpit's Newsletter read by 90000+ engineers
Weekly essays on real-world system design, distributed systems, or a deep dive into some super-clever algorithm.