Implementing Hash Maps with Hash Tables



425 views Hash Table Internals



A map, when implemented with Hash Tables, is called a Hash Map; and just like a regular map it supports operations like put, get, and iterate.

Key Implementation Details

Hash Maps are required to store the application keys of any type, but the Hash Table understands only integers. We first pass the application keys through a hash function to map it to a 32-bit integer, and then the usual Hash Table operation takes over.

Given that the application key to hash key is a frequent operation, we would keep it handy by storing it alongside the application key. This helps us save repeated computations.

While looking up a key in the hash map, we first reach the slot, and then compare if the key present there is really the key we are looking for, as we cannot just rely on the equality of the hash keys. The Hash Map, hence, accepts a key comparator function.

With Chained Hash Tables

Each node of the linked list in the chained hash table has the following structure

struct node {
    int32 hash_key;
    void *key;
    void *value;
    struct node *next;
}

void * key and void * value allows us to hold a key and a value of any type, while int32 hash_key enables us to hold a pre-computed hash.

At the Hash Table level, we hold

  • the array of linked list
  • size of the array
  • number of keys
  • key comparator function

Instead of having a contains function, we have a lookup function that returns the value for the matching key, and NULL if the key does not exist. Thus, the lookup function doubles as a contains function.

When duplicate keys are inserted, we can do one of the following

  • do not insert at all
  • delete the old key and re-insert it again, bringing it to the head of the list, making it quicker to reach

With Hash Tables having open addressing

Each slot of the hash table has the following structure

struct node {
    int32 hash_key;
    void *key;
    void *value;
    bool is_empty;
    bool is_deleted;
}

void *key and void *value allows us to hold a key and a value of any type, while int32 hash_key enables us to hold a pre-computed hash, saving a lot of runtime computations. is_empty tells us if the slot is empty, while is_deleted represents a soft deleted slot.

At the Hash Table level, we hold

  • the array of linked list
  • size of the array
  • number of active keys
  • number of used slots
  • key comparator function

Here the load factor will be computed as number of used slots/size of the array because the soft deleted keys also affect the performance of the hash table.

During insert, lookup, and delete when we find the matching hash, and explicitly compare the application keys, as multiple keys can hash to the same location, and we cannot rely on just hash key equivalence.


Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.




Other essays that you might like


Implementing Hash Maps with Hash Tables

425 views 15 likes 2022-08-03

Maps and Dictionaries are amazing, but how are they implemented? In this 11th video of this Hash Table Internals series...

Implementing Hash Sets with Hash Tables

259 views 6 likes 2022-08-01

Sets are amazing, but how are they implemented? In this 10th video of this Hash Table Internals series, we take an in-d...

Implementing Resize of a Hash Table

257 views 9 likes 2022-07-29

So, the Hash Table needs to be resized in order to maintain consistent performance, but how exactly? In this 9th Video ...

Why are Hash Tables always doubled?

770 views 19 likes 2022-07-27

Why are the underlying arrays of the hash tables always a power of 2? When we trigger a resize why are Hash Tables alway...


Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.


System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.




Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.



  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.