Why Do Databases Deadlock and How Do They Resolve It

Arpit Bhayani

curious, tinkerer, and explorer


Database deadlocks are one of the most challenging concurrency issues encountered in production systems. Understanding why they occur and how databases handle them is important in building robust applications.

Let’s dig deeper and explore the mechanics of deadlocks, their underlying causes, detection mechanisms, and potential resolution strategies.

Database Deadlocks

A deadlock occurs when two or more transactions are waiting indefinitely for each other to release locks, creating a circular dependency that prevents any of the transactions from proceeding.

In database terms, a deadlock happens when:

  • Txn A holds a lock on resource X and waits for a lock on resource Y
  • Txn B holds a lock on resource Y and waits for a lock on resource X

Neither transaction can proceed; without outside intervention, both would wait forever.

The Classic Example

Consider this scenario with two transactions executing simultaneously:

-- Transaction 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Transaction 2 (executing concurrently)
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
COMMIT;

Txn 1 locks account 1 and waits for account 2, while Txn 2 locks account 2 and waits for account 1. This creates a circular wait condition - a classic example of a deadlock.

Why Deadlocks Occur

Lock Ordering Issues

The most common cause of deadlocks is inconsistent lock ordering across transactions. When different transactions acquire locks on the same resources in different orders, circular dependencies become inevitable with sufficient concurrency.

Real-world example: In an e-commerce system, one transaction might update inventory first, then the user account, while another updates the user account first, then inventory. Under high load, these transactions will eventually deadlock.

Transaction Design and Scope

Long-running transactions increase deadlock probability by holding locks for extended periods. The longer a transaction runs, the more likely it is to conflict with other transactions.

Problematic pattern

BEGIN;
-- Heavy computation or external API call

SELECT * FROM products WHERE category = 'electronics';

-- ... application logic taking 30 seconds ...

UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
COMMIT;

Index and Query Plan Variations

Database optimizers can choose different execution plans for similar queries, leading to different lock acquisition orders. This is particularly problematic with range queries and complex joins.

-- These two queries are logically identical, but the optimizer may
-- pick different join orders (and therefore different lock acquisition
-- orders) depending on available indexes and statistics.

SELECT * FROM orders o
	JOIN customers c ON o.customer_id = c.id 
WHERE c.region = 'US' AND o.status = 'pending';

SELECT * FROM customers c
	JOIN orders o ON c.id = o.customer_id 
WHERE o.status = 'pending' AND c.region = 'US';

Lock Escalation

Lock escalation, where a database converts many fine-grained locks into fewer coarse-grained locks to save memory, can create unexpected deadlock scenarios. A set of row-level locks may escalate to a single table-level lock, suddenly conflicting with entirely different transactions.
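The escalation bookkeeping can be sketched as follows; the class, threshold, and return values here are all hypothetical (real engines use memory-based thresholds and must also check for conflicting lock holders before escalating):

```python
class LockManager:
    """Toy sketch: escalate many row locks on a table to one
    table lock once a per-table threshold is crossed."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.row_locks = {}       # table -> set of locked row ids
        self.table_locks = set()  # tables locked wholesale

    def lock_row(self, table, row_id):
        if table in self.table_locks:
            return "table"                  # already escalated
        rows = self.row_locks.setdefault(table, set())
        rows.add(row_id)
        if len(rows) >= self.threshold:     # threshold hit: escalate
            self.table_locks.add(table)
            del self.row_locks[table]
            return "table"
        return "row"

lm = LockManager(threshold=3)
print(lm.lock_row("orders", 1))  # row
print(lm.lock_row("orders", 2))  # row
print(lm.lock_row("orders", 3))  # table (escalated)
```

Once the third row lock escalates to a table lock, any other transaction touching *any* row of `orders` now conflicts, which is how escalation widens the pool of transactions that can deadlock.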

Foreign Key Constraints

Foreign key relationships can create hidden lock dependencies. When you update a parent table, the database may need to check or lock related child table records, creating unexpected lock chains.

Deadlock Detection

Databases employ sophisticated algorithms to detect deadlocks automatically. The most common approach is the wait-for-graph algorithm.

Wait-for Graph Algorithm

The database maintains a directed graph where:

  • Nodes represent active transactions
  • Edges represent wait relationships (Transaction A waits for Transaction B)
  • A cycle in the graph indicates a deadlock

Detection process

  1. Periodically (typically every few seconds), the database examines all waiting transactions
  2. Constructs a wait-for graph based on current lock requests and holdings
  3. Searches for cycles using graph traversal algorithms
  4. When a cycle is found, identify it as a deadlock
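The detection steps above can be sketched as a depth-first search for a cycle. This minimal version assumes the wait-for graph is given as a dict mapping each transaction to the set of transactions it is waiting on:

```python
def find_deadlock(wait_for):
    """DFS over the wait-for graph; returns one cycle or None.
    wait_for: dict of transaction -> set of transactions it waits on."""
    def dfs(node, path, on_path):
        for nxt in wait_for.get(node, ()):
            if nxt in on_path:                       # back edge: a cycle
                return path[path.index(nxt):] + [nxt]
            cycle = dfs(nxt, path + [nxt], on_path | {nxt})
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [start], {start})
        if cycle:
            return cycle
    return None

# Txn A waits for B and B waits for A: the classic two-transaction deadlock
print(find_deadlock({"A": {"B"}, "B": {"A"}}))   # ['A', 'B', 'A']
print(find_deadlock({"A": {"B"}, "B": set()}))   # None: no cycle
```

Real engines add bookkeeping (which lock each edge represents, depth limits for very large graphs), but the core check is exactly this cycle search.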

Some databases also perform deadlock prevention: they run the same checks, but right before granting a lock to a transaction. If granting the lock would close a cycle, the lock manager typically aborts the requesting transaction.

Detection Frequency and Performance

Deadlock detection isn’t free; it requires CPU cycles and can impact performance. Databases balance detection frequency with system overhead:

  • PostgreSQL: Checks for deadlocks every 1 second by default (deadlock_timeout)
  • MySQL: Uses immediate deadlock detection for some scenarios, periodic checking for others
  • SQL Server: Runs deadlock detection every 5 seconds, with more frequent checks under high contention

Deadlock Resolution Strategies

When a deadlock is detected via a periodic check, databases must break the cycle by terminating one or more transactions. This process involves victim selection and recovery mechanisms.

Victim Selection Algorithms

Databases use various criteria to choose which transaction to abort:

Requester-based Selection

  • The transaction whose lock request triggered the detection (usually the newer one) is chosen as the victim

Work-based Selection

  • Abort the transaction that has done the least work
  • Minimizes the amount of rollback required
  • Measured by number of log records, CPU time, or rows modified

Time-based Selection

  • Abort the newest transaction (assuming older transactions are more important)
  • Or abort the oldest transaction (to prevent starvation)

Priority-based Selection

  • Some databases allow setting transaction priorities
  • Lower-priority transactions are chosen as victims
  • Useful for distinguishing between critical and background operations

Rollback Cost Estimation

  • Calculate the cost of rolling back each transaction
  • Consider factors like transaction size, complexity, and resource usage
  • Choose the transaction with the lowest rollback cost
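Work-based selection can be sketched in a couple of lines; the cycle, the work metric (rows modified), and the numbers here are illustrative:

```python
def pick_victim(cycle, work_done):
    # Abort the deadlocked transaction that has done the least work,
    # since it is the cheapest to roll back.
    return min(cycle, key=lambda txn: work_done[txn])

# Hypothetical cycle of three transactions, work measured in rows modified
print(pick_victim(["A", "B", "C"], {"A": 120, "B": 3, "C": 45}))  # B
```

Swapping the key function is all it takes to express the other policies: transaction age for time-based selection, a priority field for priority-based selection, or an estimated undo size for rollback-cost selection.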

Database-Specific Deadlock Handling

PostgreSQL

PostgreSQL prefers lazy detection: it waits for a timeout period, deadlock_timeout (default 1s), and only then builds a full wait-for graph.

-- PostgreSQL deadlock detection configuration
SET deadlock_timeout = '1s';  -- Wait before starting detection
SET log_lock_waits = on;      -- Log long lock waits

PostgreSQL thus assumes that most lock waits resolve quickly, so it doesn’t waste CPU on immediate checks.

Advantages:

  • Lower CPU overhead during normal operation (lazy detection)
  • Better performance when lock waits are typically short
  • More predictable behavior under high concurrency

Disadvantages:

  • Minimum 1-second delay before deadlock detection
  • Can lead to longer overall wait times in deadlock scenarios

Example

-- Session 1
SET deadlock_timeout = '5s';

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Holds exclusive lock on account 1, waits here:
-- Now run the next command but don't hit enter until Session 2 is waiting
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- Session 2 (runs concurrently)
SET deadlock_timeout = '5s';

BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
-- Holds exclusive lock on account 2, waits here:
-- This will block, then Session 1 will also block, then PostgreSQL will detect deadlock
UPDATE accounts SET balance = balance + 50 WHERE id = 1;

-- PostgreSQL behavior:
-- 1. The blocked session waits for 5 seconds (the deadlock_timeout set above)
-- 2. Builds the wait-for graph: Session1 → Session2 → Session1 (cycle!)
-- 3. Kills Session 2 with error: "deadlock detected"

PostgreSQL Output

ERROR:  deadlock detected
DETAIL:  Process 93 waits for ShareLock on transaction 745; blocked by process 86.
         Process 86 waits for ShareLock on transaction 746; blocked by process 93.

MySQL (InnoDB) Approach

MySQL uses a more aggressive approach: for a simple two-transaction deadlock, detection is effectively immediate because it runs when a lock wait begins. For very deep or complex wait graphs, it gives up on the graph search and falls back to timeout-based resolution (innodb_lock_wait_timeout).

SET innodb_lock_wait_timeout = 50;        -- Max wait time (seconds)
SET innodb_deadlock_detect = ON;          -- Enable deadlock detection
SET innodb_print_all_deadlocks = ON;      -- Log all deadlocks

MySQL thus assumes that deadlocks should be resolved as quickly as possible.

Advantages:

  • Immediate detection for simple deadlocks
  • Faster resolution when deadlocks occur
  • Better for applications that can’t tolerate long waits

Disadvantages:

  • Higher CPU overhead from constant monitoring
  • A more complex detection algorithm can have edge cases

Example

-- Same scenario in MySQL
-- Session 1
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- Session 2
START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;

-- MySQL behavior:
-- 1. Immediately detects a simple deadlock pattern
-- 2. Chooses the victim based on row modification count
-- 3. Rolls back the transaction with fewer changes

MySQL Output

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Prevention Strategies

Consistent Lock Ordering

Establish and enforce consistent ordering of resource access across all transactions:

-- Bad: the two transactions touch the same tables in opposite orders
-- Transaction 1:
UPDATE table_a SET val = 1 WHERE id = 1;
UPDATE table_b SET val = 2 WHERE id = 2;
-- Transaction 2:
UPDATE table_b SET val = 2 WHERE id = 2;
UPDATE table_a SET val = 1 WHERE id = 1;

-- Good: every transaction accesses table_a before table_b
UPDATE table_a SET val = 1 WHERE id = 1;
UPDATE table_b SET val = 2 WHERE id = 2;
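In application code, one way to enforce this discipline is to always touch rows in ascending primary-key order. A minimal sketch, using Python's sqlite3 purely for illustration (SQLite locks the whole database rather than rows, so this demonstrates only the ordering convention, not the locking itself):

```python
import sqlite3

def transfer(conn, debit_id, credit_id, amount):
    # Touch rows in ascending primary-key order, so every transaction
    # acquires its row locks in the same global order regardless of
    # which account is debited and which is credited.
    for row_id in sorted((debit_id, credit_id)):
        delta = -amount if row_id == debit_id else amount
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (delta, row_id),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 500)])

transfer(conn, 1, 2, 100)   # id 1 -> id 2
transfer(conn, 2, 1, 50)    # id 2 -> id 1: same lock order, no cycle possible
print(list(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
# [(1, 450), (2, 550)]
```

Because both transfers lock id 1 before id 2, the opposite-direction transfers of the classic example can no longer form a circular wait.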

Minimize Transaction Scope

Keep transactions as short as possible:

-- Bad: Long-running transaction
BEGIN;
SELECT * FROM large_table WHERE complex_condition;
-- ... extensive application processing ...
UPDATE summary_table SET total = new_value;
COMMIT;

-- Good: Separate read and write phases
-- 1. Read data outside the transaction
SELECT * FROM large_table WHERE complex_condition;
-- 2. Process in application
-- 3. Quick transaction for update
BEGIN;
UPDATE summary_table SET total = new_value;
COMMIT;

Retry Logic

Build retry mechanisms for deadlock errors:

import random
import time

# DeadlockError stands in for your driver's deadlock exception
# (e.g. psycopg2.errors.DeadlockDetected, or MySQL error 1213).
def execute_with_deadlock_retry(transaction_func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return transaction_func()
        except DeadlockError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter between retries
            time.sleep(random.uniform(0.1, 0.5) * (2 ** attempt))

Optimize Query Performance

Faster queries hold locks for shorter periods:

-- Create indexes to speed up lookups
CREATE INDEX idx_orders_status_date ON orders(status, created_date);

-- Use covering indexes to avoid key lookups
CREATE INDEX idx_customers_covering ON customers(region) INCLUDE (name, email);

Monitoring Deadlocks

Logging

Enable comprehensive deadlock logging:

-- PostgreSQL
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';

-- MySQL
SET GLOBAL innodb_print_all_deadlocks = ON;

Review Slow Query Logs

Regularly analyze slow and problematic queries: the queries most prone to deadlocks are typically the ones that spend time waiting for other transactions to complete.

Monitoring slow queries and periodically analyzing them is therefore one of the best ways to proactively spot potential deadlocks.

-- Find queries that might cause deadlocks
-- (on PostgreSQL 13+ the columns are mean_exec_time / stddev_exec_time)
SELECT query, calls, mean_time, stddev_time
FROM pg_stat_statements
WHERE calls > 100 AND mean_time > 1000
ORDER BY mean_time DESC;

Performance Impact and Trade-offs

Detection Overhead

  • More frequent detection = faster resolution but higher CPU usage
  • Less frequent detection = lower overhead but longer wait times

Lock Granularity

  • Fine-grained locks = better concurrency but more deadlock potential and higher overhead
  • Coarse-grained locks = fewer deadlocks but reduced parallelism

Footnotes

Database deadlocks are an inevitable consequence of concurrent data access in multi-user systems. While they cannot be eliminated, understanding their causes and applying proper prevention, detection, and resolution strategies can minimize their impact on application performance and user experience.

Remember: deadlocks are not bugs to be fixed but a fundamental challenge of concurrent systems. It is therefore our responsibility, and genuinely fun, to understand how transactions interact and why they may lead to deadlocks.

