Database deadlocks are one of the most challenging concurrency issues encountered in production systems. Understanding why they occur and how databases handle them is essential for building robust applications.
Let’s dig deeper and explore the mechanics of deadlocks, their underlying causes, detection mechanisms, and potential resolution strategies.
Database Deadlocks
A deadlock occurs when two or more transactions are waiting indefinitely for each other to release locks, creating a circular dependency that prevents any of the transactions from proceeding.
In database terms, a deadlock happens when:
- Txn A holds a lock on resource X and waits for a lock on resource Y
- Txn B holds a lock on resource Y and waits for a lock on resource X
Neither transaction can proceed, so both wait indefinitely.
The Classic Example
Consider this scenario with two transactions executing simultaneously:
-- Transaction 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- Transaction 2 (executing concurrently)
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
COMMIT;
Txn 1 locks account 1 and waits for account 2, while Txn 2 locks account 2 and waits for account 1. This creates a circular wait condition - a classic example of a deadlock.
Why Deadlocks Occur
Lock Ordering Issues
The most common cause of deadlocks is inconsistent lock ordering across transactions. When different transactions acquire locks on the same resources in different orders, circular dependencies become inevitable with sufficient concurrency.
Real-world example: In an e-commerce system, one transaction might update inventory first, then the user account, while another updates the user account first, then inventory. Under high load, these transactions will eventually deadlock.
Transaction Design and Scope
Long-running transactions increase deadlock probability by holding locks for extended periods. The longer a transaction runs, the more likely it is to conflict with other transactions.
Problematic pattern
BEGIN;
-- Heavy computation or external API call
SELECT * FROM products WHERE category = 'electronics';
-- ... application logic taking 30 seconds ...
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
COMMIT;
Index and Query Plan Variations
Database optimizers can choose different execution plans for similar queries, leading to different lock acquisition orders. This is particularly problematic with range queries and complex joins.
-- These queries might acquire locks in different orders
-- depending on available indexes and statistics
-- even though both queries return exactly the same result set.
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'US' AND o.status = 'pending';
SELECT * FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'pending' AND c.region = 'US';
Lock Escalation
Lock escalation—where databases convert many fine-grained locks into fewer coarse-grained locks for memory efficiency—can create unexpected deadlock scenarios. A row-level deadlock might escalate to a table-level deadlock, affecting entirely different transactions.
Foreign Key Constraints
Foreign key relationships can create hidden lock dependencies. When you update a parent table, the database may need to check or lock related child table records, creating unexpected lock chains.
Deadlock Detection
Databases employ sophisticated algorithms to detect deadlocks automatically. The most common approach is the wait-for-graph algorithm.
Wait-for Graph Algorithm
The database maintains a directed graph in which:
- Nodes represent active transactions
- Edges represent wait relationships (Transaction A waits for Transaction B)
- A cycle in the graph indicates a deadlock
Detection process:
- Periodically (typically every few seconds), the database examines all waiting transactions
- It constructs a wait-for graph from current lock requests and holdings
- It searches the graph for cycles using standard traversal algorithms
- Any cycle found is flagged as a deadlock
Some databases also perform deadlock prevention: they run the same check, but immediately before granting a lock. If granting the lock would complete a cycle, the lock manager typically aborts the requesting transaction.
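As an illustrative sketch (hypothetical data structures, not any engine's actual implementation), the cycle search over a wait-for graph can be expressed as a depth-first traversal:

```python
# Sketch of wait-for-graph deadlock detection. Real engines build the
# graph from their internal lock tables; here it is just a dict.

def find_deadlock_cycle(wait_for):
    """Return a transaction that lies on a wait cycle, or None.

    wait_for maps each transaction id to the set of transaction
    ids it is currently blocked by (a hypothetical structure).
    """
    WHITE, GRAY, BLACK = 0, 1, 2            # unvisited / on stack / done
    color = {txn: WHITE for txn in wait_for}

    def visit(txn):
        color[txn] = GRAY
        for blocker in wait_for.get(txn, ()):
            state = color.get(blocker, BLACK)  # unknown txns aren't waiting
            if state == GRAY:
                return blocker                 # back edge: cycle found
            if state == WHITE:
                found = visit(blocker)
                if found is not None:
                    return found
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color[txn] == WHITE:
            found = visit(txn)
            if found is not None:
                return found
    return None

# The classic example: T1 waits for T2 and T2 waits for T1 -> deadlock
assert find_deadlock_cycle({"T1": {"T2"}, "T2": {"T1"}}) is not None
```

A prevention-style check is the same search run just before granting a lock, with the would-be new wait edge tentatively added to the graph.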
Detection Frequency and Performance
Deadlock detection isn’t free; it requires CPU cycles and can impact performance. Databases balance detection frequency against system overhead:
- PostgreSQL: checks for deadlocks after deadlock_timeout (1 second by default)
- MySQL: uses immediate deadlock detection for some scenarios, periodic checking for others
- SQL Server: runs deadlock detection every 5 seconds, with more frequent checks under high contention
Deadlock Resolution Strategies
When a deadlock is detected via a periodic check, databases must break the cycle by terminating one or more transactions. This process involves victim selection and recovery mechanisms.
Victim Selection Algorithms
Databases use various criteria to choose which transaction to abort:
Transaction That Started Detection
- The transaction whose lock wait triggered detection (usually the newest) is chosen as the victim.
Work-based Selection
- Abort the transaction that has done the least work
- Minimizes the amount of rollback required
- Measured by number of log records, CPU time, or rows modified
Time-based Selection
- Abort the newest transaction (assuming older transactions are more important)
- Or abort the oldest transaction (to prevent starvation)
Priority-based Selection
- Some databases allow setting transaction priorities
- Lower-priority transactions are chosen as victims
- Useful for distinguishing between critical and background operations
Rollback Cost Estimation
- Calculate the cost of rolling back each transaction
- Consider factors like transaction size, complexity, and resource usage
- Choose the transaction with the lowest rollback cost
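The criteria above can be combined into a single scoring function. This is an illustrative sketch: the fields, weights, and tie-breaking order are hypothetical, not any specific engine's actual policy:

```python
from dataclasses import dataclass

@dataclass
class Txn:
    txn_id: int
    priority: int       # lower = less important, aborted first (hypothetical)
    rows_modified: int  # proxy for rollback cost
    runtime_ms: int     # age: newer transactions are cheaper to retry

def pick_victim(deadlocked_txns):
    """Abort the lowest-priority transaction; break ties by least
    work done, then by the youngest (shortest runtime)."""
    return min(
        deadlocked_txns,
        key=lambda t: (t.priority, t.rows_modified, t.runtime_ms),
    )

old_big = Txn(txn_id=1, priority=5, rows_modified=500, runtime_ms=30_000)
young_small = Txn(txn_id=2, priority=5, rows_modified=3, runtime_ms=200)
print(pick_victim([old_big, young_small]).txn_id)  # prints 2: cheapest rollback
```

With equal priorities, the transaction that has modified the fewest rows loses, which mirrors the work-based selection described above.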
Database-Specific Deadlock Handling
PostgreSQL
PostgreSQL prefers lazy detection: it only checks for deadlocks after a transaction has waited for deadlock_timeout (default 1s), and at that point it builds a full wait-for graph.
-- PostgreSQL deadlock detection configuration
SET deadlock_timeout = '1s'; -- Wait before starting detection
SET log_lock_waits = on; -- Log long lock waits
PostgreSQL thus operates on the assumption that most lock waits resolve quickly, so it doesn’t waste CPU on immediate checks.
Advantages:
- Lower CPU overhead during normal operation (lazy detection)
- Better performance when lock waits are typically short
- More predictable behavior under high concurrency
Disadvantages:
- Minimum 1-second delay before deadlock detection
- Can lead to longer overall wait times in deadlock scenarios
Example
-- Session 1
SET deadlock_timeout = '5s';
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Holds exclusive lock on account 1, waits here:
-- Now run the next command but don't hit enter until Session 2 is waiting
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Session 2 (runs concurrently)
SET deadlock_timeout = '5s';
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
-- Holds exclusive lock on account 2, waits here:
-- This will block, then Session 1 will also block, then PostgreSQL will detect deadlock
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
-- PostgreSQL behavior:
-- 1. Session 2 waits for 5 seconds (the deadlock_timeout set above)
-- 2. Builds wait-for graph: Session1 → Session2 → Session1 (cycle!)
-- 3. Kills Session 2 with error: "deadlock detected"
PostgreSQL Output
ERROR: deadlock detected
DETAIL: Process 93 waits for ShareLock on transaction 745; blocked by process 86.
Process 86 waits for ShareLock on transaction 746; blocked by process 93.
MySQL (InnoDB) Approach
MySQL (InnoDB) takes a more aggressive approach: deadlock detection runs immediately whenever a transaction has to wait for a lock. Only when the wait graph becomes too expensive to search (or when detection is disabled) does InnoDB fall back to timeout-based resolution (innodb_lock_wait_timeout).
SET innodb_lock_wait_timeout = 50; -- Max wait time (seconds)
SET innodb_deadlock_detect = ON; -- Enable deadlock detection
SET innodb_print_all_deadlocks = ON; -- Log all deadlocks
MySQL thus operates on the assumption that deadlocks should be resolved as quickly as possible.
Advantages:
- Immediate detection for simple deadlocks
- Faster resolution when deadlocks occur
- Better for applications that can’t tolerate long waits
Disadvantages:
- Higher CPU overhead from constant monitoring
- A more complex detection algorithm can have edge cases
Example
-- Same scenario in MySQL
-- Session 1
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Session 2
START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
-- MySQL behavior:
-- 1. Immediately detects a simple deadlock pattern
-- 2. Chooses the victim based on row modification count
-- 3. Rolls back the transaction with fewer changes
MySQL Output
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Prevention Strategies
Consistent Lock Ordering
Establish and enforce consistent ordering of resource access across all transactions:
-- Bad: transactions acquire locks in opposite orders
-- Transaction 1:
UPDATE table_a SET value = 1 WHERE id = 1;
UPDATE table_b SET value = 2 WHERE id = 2;
-- Transaction 2 (reverse order, deadlock-prone):
UPDATE table_b SET value = 3 WHERE id = 2;
UPDATE table_a SET value = 4 WHERE id = 1;
-- Good: every transaction accesses table_a before table_b, in that order
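In application code, a simple way to enforce this is to sort resource identifiers before touching them, so every transaction locks rows in the same order. A sketch under stated assumptions: the table and column names are placeholders, and a real implementation would bind parameters through the database driver rather than formatting SQL strings:

```python
# Enforce consistent lock order by always updating account rows
# in ascending id order, regardless of transfer direction.

def transfer_statements(debit_id, credit_id, amount):
    """Return the UPDATE statements for a transfer, ordered so that
    the lower account id is always locked first."""
    def stmt(account_id):
        delta = -amount if account_id == debit_id else amount
        return (f"UPDATE accounts SET balance = balance + ({delta}) "
                f"WHERE id = {account_id};")
    return [stmt(i) for i in sorted((debit_id, credit_id))]

# Both transfer directions lock account 1 before account 2,
# so the two transactions can no longer form a circular wait.
for s in transfer_statements(1, 2, 100):
    print(s)
for s in transfer_statements(2, 1, 50):
    print(s)
```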
Minimize Transaction Scope
Keep transactions as short as possible:
-- Bad: Long-running transaction
BEGIN;
SELECT * FROM large_table WHERE complex_condition;
-- ... extensive application processing ...
UPDATE summary_table SET total = new_value;
COMMIT;
-- Good: Separate read and write phases
-- 1. Read data outside the transaction
SELECT * FROM large_table WHERE complex_condition;
-- 2. Process in application
-- 3. Quick transaction for update
BEGIN;
UPDATE summary_table SET total = new_value;
COMMIT;
Retry Logic
Build retry mechanisms for deadlock errors:
import random
import time

def execute_with_deadlock_retry(transaction_func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return transaction_func()
        except DeadlockError:  # your driver's deadlock exception, e.g. psycopg2.errors.DeadlockDetected
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with random jitter to avoid lock-step retries
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
Optimize Query Performance
Faster queries hold locks for shorter periods:
-- Create indexes to speed up lookups
CREATE INDEX idx_orders_status_date ON orders(status, created_date);
-- Use covering indexes to avoid key lookups
CREATE INDEX idx_customers_covering ON customers(region) INCLUDE (name, email);
Monitoring Deadlocks
Logging
Enable comprehensive deadlock logging:
-- PostgreSQL
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';
-- MySQL
SET GLOBAL innodb_print_all_deadlocks = ON;
Review Slow Query Logs
Regularly analyze slow and problematic queries: the queries most prone to deadlocks are usually the ones that spend time waiting on other transactions.
Monitoring the slow query log and analyzing it periodically is therefore one of the best ways to spot potential deadlocks proactively.
-- Find queries that might cause deadlocks
SELECT query, calls, mean_time, stddev_time
FROM pg_stat_statements
WHERE calls > 100 AND mean_time > 1000
ORDER BY mean_time DESC;
Performance Impact and Trade-offs
Detection Overhead
- More frequent detection = faster resolution but higher CPU usage
- Less frequent detection = lower overhead but longer wait times
Lock Granularity
- Fine-grained locks = better concurrency, but more deadlock potential and high overhead
- Coarse-grained locks = fewer deadlocks but reduced parallelism
Conclusion
Database deadlocks are an inevitable consequence of concurrent data access in multi-user systems. While they cannot be eliminated, understanding their causes and applying proper prevention, detection, and resolution strategies can minimize their impact on application performance and user experience.
Remember: deadlocks are not bugs to be fixed but a fundamental challenge of concurrent systems. It is therefore worth the effort, and genuinely interesting, to understand how transactions interact and why they sometimes deadlock.