How do indexes make databases read faster?

4312 views Database Engineering

The database is just a collection of records. These records are serialized and stored on the disk. The way the records are serialized and stored on the disk depends on the database engine.

How does the database read from the disk?

A disk is always read in Blocks, and a block is typically 8kb big. When any disk IO happens, even requesting one byte, the entire block is read in memory, and the byte is returned from it.

Say we have a “users” table with 100 rows, with each row being 200B long. If the block size is 600B, we can read 600/200 = 3 rows in one disk read.

To find all users with age = 23, we will have to read the entire table row by row and filter out the relevant documents; we will have to read the entire table. In one shot, we read 3 rows so that it would take 100/3 = 34 disk/block reads to answer the query.

Let’s see how indexes make this faster.

An index is a small referential table with two columns: the indexed value and the row ID. So an index on the age column will have two columns age (4 bytes) and row ID (4 bytes).

Each entry in the index is 8 bytes long, and the total size of the index will be 100 * 8 = 800 bytes. When stored on the disk, the index will require 2 disk blocks.

When we want to get users with age == 23, we will first read the entire index, taking 2 disk IOs and filtering out relevant row IDs. Then make disk reads for the relevant rows from the main table. Thus we read 2 blocks to load the index and only relevant blocks have the actual data.

In our example, it comes out to be 2 + 2 = 4 block reads. So, we get an 8x boost in performance using indexes.

Note: This is a very crude example of how fetch happens with indexes; there are a lot of optimizations that I have not talked about.

Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.

Other essays that you might like

What are Embedded Databases?

2349 views 101 likes 2022-03-25

Embedded databases are coupled with the application they are part of and operate in a confined space. They are designed ...

How does the database guarantee reliability using write-ahead logging?

2461 views 135 likes 2022-03-21

Any persistent database needs to guarantee reliability. No matter how big or small the changes are, they should survive ...

How do indexes make databases read faster?

4312 views 303 likes 2022-03-16

In this video, we discuss how indexes make a database operate faster. While discussing that, we dive deep into how the d...

How to handle database outages?

2427 views 160 likes 2022-03-14

In this video, we talk about why a database goes down, what happens when the database is down, a few short-term solution...

Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.

System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.

Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.

  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.