"D" in ACID - Durability


After discussing the "A", the "C", and the "I", it is time to take a look at the "D" of ACID - Durability.

Durability seems to be a taken-for-granted requirement, but to be honest, it is the most important one. Let's dive deep and find out why it is so important, how databases achieve durability in the midst of thousands of concurrent transactions, and how to achieve durability in a distributed setting.

What is Durability?

In the context of databases, Durability ensures that once a transaction commits, its changes survive any outages, crashes, and failures. In other words, any write that has gone through as part of a successful transaction should never abruptly vanish.

This is exactly why Durability is one of the essential qualities of any database: it ensures that committed transactional data is not lost when the system crashes or restarts.

A typical example of this is your purchase order placed on Amazon, which should continue to exist and remain unaffected even after their database faces an outage. So, to ensure something outlives a crash, it has to be stored in non-volatile storage like a disk; this forms the core idea of durability.

How do databases achieve durability?

The most fundamental way to achieve durability is by using a fast transaction log. The changes to be made to the actual data are first flushed to a separate transaction log, and only then is the actual update made.

This flushed transaction log enables us to replay the logged transactions during database reboot and reconstruct the system's state to the one that it was in right before the failure occurred - typically the last consistent state of the database. The write to the transaction log is made fast by keeping the file append-only, thus minimizing disk seeks.
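To make the log-then-update idea concrete, here is a minimal sketch in Python. It is not how any particular database implements its log; the `TinyKV` class, the JSON-lines record format, and the file layout are all assumptions made for illustration. It appends each change to an append-only file, forces it to disk with `os.fsync`, and only then updates the in-memory data. On startup it replays the log and drops a half-written record left at the end of the file by a crash.

```python
import json
import os


class TinyKV:
    """Hypothetical key-value store that logs every write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Append-only mode: every write lands at the end of the file.
        self.log = open(log_path, "ab")

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # record was cut off by a crash mid-write
                try:
                    record = json.loads(raw)
                except ValueError:
                    break  # corrupt record; stop replaying here
                self.data[record["key"]] = record["value"]
                good_bytes += len(raw)
        # Drop the torn tail so new records start on a clean line.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_bytes)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}) + "\n"
        # 1. Append the intended change to the log.
        self.log.write(record.encode("utf-8"))
        # 2. Force it to disk before acknowledging the write.
        self.log.flush()
        os.fsync(self.log.fileno())
        # 3. Only now apply the change to the actual data.
        self.data[key] = value

    def close(self):
        self.log.close()
```

Creating `TinyKV("orders.log")`, calling `put`, and then constructing a second `TinyKV` on the same file brings back the same data, which is the replay described above. Calling `os.fsync` on every write is the simplest choice but also the slowest; batching several transactions into one flush is a common trade-off that this sketch leaves out.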

[Figure: Durability in ACID]

Durability in a distributed setting

If the database is distributed and supports Distributed Transactions, ensuring durability becomes even more important and trickier to handle. In such a setting, the participating database servers coordinate before the commit using a Two-Phase Commit Protocol.

The distributed computation is converged into a step-by-step process where the coordinator asks all the participants whether they can commit, waits for all acknowledgments, and then communicates the final decision, commit or rollback, to everyone. This entire process is split into two phases - Prepare and Commit.
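The two phases can be sketched in a few lines of Python. This is an in-memory illustration only: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and the sketch skips what makes the real protocol hard, namely network calls, timeouts, coordinator failure, and each participant durably logging its vote before replying.

```python
class Participant:
    """Hypothetical participant; a real one would flush its vote to a durable log."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: promise to commit if asked, or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Act as the coordinator and return the final decision."""
    # Phase 1 (Prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (Commit): commit only if every single vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"

    # At least one participant voted no, so everyone rolls back.
    for p in participants:
        p.rollback()
    return "aborted"
```

If every participant is created with `can_commit=True`, the function returns `"committed"` and all of them end in the `committed` state; a single participant with `can_commit=False` makes the whole transaction roll back.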



Arpit Bhayani

  • © Arpit Bhayani, 2022
