MySQL replication enables data synchronization across databases, powering read scaling and even some complex distributed architectures. At the core lies the binary log (binlog), which is the authoritative record of all data modifications on a source server.
Getting a hang of how binlog replication works is not only fun, but it also builds an intuition as to how other databases would be doing it. More importantly, if you face any replication related issues in production, you know what could have gone wrong.
This essay covers MySQL binlog replication internals along with looking at how events are structured, how threads handle replication, the different modes, and tips for keeping it running smoothly at scale. So, here we go…
Why Binlog Replication Matters
MySQL’s binary log keeps a record of every change to the database, essentially serving as the source of truth for what happened and when. Because of this, it supports several key features:
First, it allows read scaling by replicating data to replica servers that can handle read queries. Second, it provides high availability by maintaining standby servers ready to take over if the primary fails. Third, it enables point-in-time recovery by replaying binary logs to restore a database to any moment in time. Fourth, it supports change data capture (CDC) by allowing external systems to consume database changes in real time.
Binary Log
The binary log is a sequence of files that record changes to the database. Each file has a base name (configurable via the log_bin option) followed by a numeric suffix that increments with each new file. An index file tracks all current binary log files.
mysql-bin.000001
mysql-bin.000002
mysql-bin.000003
mysql-bin.index
Binary log files roll over either when they reach a certain size (controlled by max_binlog_size, usually 1GB) or whenever the server restarts. The index file is just a plain text list pointing to all the current log files.
Each binary log file begins with a magic number (0xfe626963, which spells “binlog” in ASCII) followed by a format description event that defines how subsequent events should be interpreted. This format description contains versioning information that allows different MySQL versions to understand each other’s binary logs during replication.
Binary Log Event Structure
Every change recorded in the binary log is stored as an event. Events have a 19-byte header followed by event-specific data. Something like this…
- Timestamp (4B): when the event occurred
- Type code (1B): the event type
- Server ID (4B): id of the server that generated the event
- Event length (4B): size of the event, including the header
- Next position (4B): Position of the next event in the binlog
- Flags (2B): Various flags controlling event behavior
+=====================================+
| event | timestamp 0 : 4 |
| header +----------------------------+
| | type_code 4 : 1 |
| +----------------------------+
| | server_id 5 : 4 |
| +----------------------------+
| | event_length 9 : 4 |
| +----------------------------+
| | next_position 13 : 4 |
| +----------------------------+
| | flags 17 : 2 |
+=====================================+
| event | fixed part 19 : y |
| data +----------------------------+
| | variable part |
+=====================================+
Common Event Types
MySQL uses several event types, but some important ones are …
FORMAT_DESCRIPTION_EVENTis the first event in every binary log file. It describes the server version and provides metadata needed to interpret all subsequent events.QUERY_EVENTcontains SQL statements that were executed on the source. The event includes the database context, the SQL statement text, and various execution metadata.WRITE_ROWS_EVENT,UPDATE_ROWS_EVENT, andDELETE_ROWS_EVENTcontain the actual row data for inserts, updates, and deletes in row-based replication. Update events include both before and after images of modified rows.GTID_LOG_EVENTcontains the global transaction identifier for GTID-based replication. This event appears at the start of each transaction and contains the unique identifier along with logical timestamps used for parallel replication.XID_EVENTmarks the end of a transaction that was committed. TheXID(transaction ID) links the binary log entry to the storage engine’s internal transaction tracking.ROTATE_EVENTindicates that the server is switching to a new binary log file. This event is appended as the last event before rotating to a new file.
Binary Log Formats
MySQL supports three binary logging formats, each with different tradeoffs for space efficiency, safety, and compatibility.
Statement-based Logging
Statement-based logging (SBR) records the actual SQL statements that were executed. This format is compact because a single statement like UPDATE users SET status = 'active' WHERE created_at < '2024-01-01' is stored as one event regardless of how many rows it affects.
-- This single statement might update millions of rows
-- but is stored as one compact event in SBR
UPDATE users SET status = 'active' WHERE created_at < '2024-01-01';
However, SBR has limitations. Non-deterministic functions like NOW(), UUID(), or RAND() can produce different results when replayed on a replica. Statements using LIMIT without ORDER BY are also problematic because row ordering may differ between servers.
Row-based Logging
Row-based logging (RBR) records the actual row changes rather than the statements that produced them. Each modified row is stored with its before and after images for updates, or just the relevant image for inserts and deletes.
### INSERT INTO test.users
### SET
### @1=1
### @2='john_doe'
### @3='2024-01-15'
### UPDATE test.users
### WHERE
### @1=1
### @2='john_doe'
### @3='2024-01-15'
### SET
### @1=1
### @2='john_doe_updated'
### @3='2024-01-15'
RBR guarantees that replicas receive exactly the same row changes that occurred on the source. The tradeoff is larger binary logs when statements affect many rows. A single UPDATE touching a million rows generates a million row events.
Mixed Logging
Mixed logging (MBR) combines both approaches. MySQL uses statement-based logging by default but automatically switches to row-based logging for statements that would be unsafe to replicate as statements. This balances compactness and safety.
By the way, row-based logging is the default in MySQL 8.0 and is recommended for most production workloads due to its reliability and deterministic behavior.
Replication Architecture
MySQL replication uses an asynchronous, pull-based architecture. The replica initiates connections to the source and requests binary log events, which are then applied locally. This makes replicas to operate independently and recover gracefully from network interruptions.
On the source server:
- Binary log: Stores all changes as events
- Binlog dump thread: Sends binary log events to connected replicas
On the replica server:
- I/O thread (receiver thread): Connects to the source and receives events
- Relay log: Local storage for received events
- SQL thread (applier thread): Reads relay log and applies events
- Worker threads: Apply events in parallel when multi-threaded replication is enabled
Binlog Dump Thread
When a replica connects to a source for replication, the source creates a dedicated binlog dump thread to serve that replica. This thread is responsible for reading events from the binary log and sending them over the network.
The dump thread maintains the binary log position requested by the replica and sends events as they become available. When the source is idle with no new changes, the dump thread waits (not busy-wait) for new events to be written to the binary log.
By the way, you can observe binlog dump threads on the source server using SHOW PROCESSLIST
mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
Id: 42
User: repl_user
Host: 192.168.1.100:54321
db: NULL
Command: Binlog Dump
Time: 3600
State: Master has sent all binlog to slave; waiting for more updates
Info: NULL
Each connected replica gets its own binlog dump thread. If you have five replicas, the source maintains five dump threads. This is generally lightweight because the threads spend most of their time waiting for new events.
The I/O thread (receiver thread)
The replica’s I/O thread is responsible for connecting to the source server and receiving binary log events. When you start replication with START REPLICA, the I/O thread sets up a connection to the source specified in the CHANGE REPLICATION SOURCE TO command.
Once connected, the I/O thread requests binary log events starting from the position recorded in the replica’s connection metadata. As events arrive, the thread writes them to the local relay log without processing them. This separation of receiving and applying events is an important design decision.
First, it allows the replica to buffer events locally even when the applier is slow or stopped. Second, it enables the replica to catch up quickly after being offline because the I/O thread can pull events as fast as the network allows. Third, it provides a checkpoint mechanism where the relay log position can be used to resume replication after interruptions.
The Relay Log
The relay log (which is on a replica) is structurally identical to the binary log. It consists of numbered files containing the same event format as binary logs. The key difference is that relay logs contain events received from a source server rather than events generated locally.
relay-bin.000001
relay-bin.000002
relay-bin.index
Relay log files are created as the I/O thread receives events from the source. The SQL thread reads from relay logs to apply changes. Once events have been applied, relay log files are automatically deleted to free disk space.
The relay log serves as a buffer between network reception and the local application. This buffering is crucial for handling situations where the applier falls behind the receiver. Without relay logs, a slow application would create backpressure all the way to the source server.
The SQL Thread (applier)
The SQL thread reads events from the relay log and applies them to the local database. In single-threaded mode, this thread processes events sequentially in the exact order they appear in the relay log.
For each event, the SQL thread performs appropriate actions:
QUERY_EVENT: Executes the SQL statementWRITE_ROWS_EVENT: Inserts the specified rowsUPDATE_ROWS_EVENT: Updates rows matching the before-image to the after-imageDELETE_ROWS_EVENT: Deletes rows matching the specified image
After successfully applying an event, the SQL thread updates the applier metadata to record its progress. This metadata includes the relay log position and the corresponding source binary log position.
The SQL thread is often the bottleneck in replication because it must apply changes sequentially in single-threaded mode. Even if the source executed many transactions concurrently, the replica applies them one at a time. This is why multi-threaded replication was introduced.
Multi-threaded Replication (MTS)
With multi-threaded replication, a replica can handle multiple transactions at once, because of several worker threads. This really helps when the source is busy with lots of concurrent activity.
Enable multi-threaded replication by setting replica_parallel_workers (or slave_parallel_workers in older versions):
SET GLOBAL replica_parallel_workers = 4;
When MTS is enabled, the SQL thread becomes a coordinator that reads transactions from the relay log and assigns them to worker threads for execution. The coordinator ensures proper ordering while maximizing parallelism.
MySQL 5.7 introduced logical clock parallelism. The logical clock approach recognizes that transactions committed together in the same group commit on the source are independent and can be applied in parallel. This is based on a key insight: if transactions could commit together on the source without conflicts, they can be applied in parallel on the replica.
The source server adds logical timestamps to each transaction in the binary log. These timestamps indicate the “commit parent” of each transaction, essentially marking which transactions were in the same commit group. The replica coordinator uses these timestamps to schedule parallel execution.
-- Configure logical clock parallelism (default in MySQL 8.0)
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
MySQL 8 introduced WRITESET-based parallelism, which analyzes the actual rows modified by each transaction. Transactions modifying non-overlapping rows can execute in parallel regardless of commit timing. This provides the highest level of parallelism but requires additional memory for tracking.
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
SET GLOBAL transaction_write_set_extraction = 'XXHASH64';
Semi-synchronous replication
By default, MySQL replication is fully asynchronous. The source commits transactions without waiting for any replica to receive them. This provides the best performance but creates a window where committed transactions could be lost if the source crashes before replicas receive them.
Semi-synchronous replication makes the source pause until at least one replica confirms it got the transaction. So, it gives you a better safety for your data, though it can slow things down a bit.
The semi-synchronous flow works as follows:
- Client sends
COMMITto source - Source writes transaction to binary log
- Source waits for replica acknowledgment
- Replica receives transaction and writes to relay log
- Replica sends
ACKback to source - Source commits in storage engine and returns to client
Semi-synchronous replication includes a timeout mechanism. If no replica acknowledges within the timeout (default 10 seconds), the source falls back to asynchronous replication to prevent blocking indefinitely:
SET GLOBAL rpl_semi_sync_source_timeout = 10000; -- 10 seconds
You can also require acknowledgment from multiple replicas:
-- Wait for 2 replicas to acknowledge
SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 2;
Replication filters
MySQL supports filtering which databases, tables, or events are replicated. Filters can be applied on either the source or replica side.
Source-side filtering controls what is written to the binary log:
-- Only log changes to specific databases
binlog_do_db = production
binlog_do_db = analytics
-- Ignore specific databases
binlog_ignore_db = test
binlog_ignore_db = staging
Replica-side filtering controls what is applied from the relay log:
-- Only apply changes to specific databases
replicate_do_db = production
-- Ignore specific tables
replicate_ignore_table = production.audit_log
-- Use wildcards for flexible matching
replicate_wild_do_table = production.%
replicate_wild_ignore_table = %.temp_%
Note, if a statement references multiple databases and filtering is in effect, results may be unexpected. Row-based replication with replica-side filtering is generally safer than statement-based replication.
Monitoring Replication Lag
Replication lag is the delay between when a transaction commits on the source and when it is applied on the replica. Some lag is inherent in asynchronous replication, but excessive lag can cause consistency issues for applications reading from replicas.
The traditional method checks Seconds_Behind_Master in SHOW REPLICA STATUS:
mysql> SHOW REPLICA STATUS\G
*************************** 1. row ***************************
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Seconds_Behind_Master: 15
This metric compares timestamps between the relay log event currently being executed and the most recently received event.
Best Practices for Replication in Production
Deploy replicas with equal or greater resources than the source. A replica with fewer CPU cores, less memory, or slower disks will inevitably lag.
Use row-based replication for safety and predictability. While statement-based logging is more compact, the risk of non-deterministic behavior outweighs the space savings.
Configure crash-safe replication with table-based metadata repositories. This prevents inconsistencies after replica crashes.
Use semi-synchronous replication when durability is critical. Accept the latency overhead to guarantee that committed transactions exist on at least one replica.
Monitor replication lag continuously and alert when it exceeds acceptable thresholds. Integrate lag monitoring into your application health checks.
Keep primary keys on all replicated tables. This is the single most impactful optimization for row-based replication performance.
Test failover procedures regularly. Replication is only valuable for high availability if you can actually promote a replica when needed.
Replication Topology Patterns
MySQL replication supports various topology patterns beyond simple source-to-replica setups.
Chain replication connects replicas in sequence: A replicates to B, B replicates to C. This reduces load on the primary but increases lag for downstream replicas. Enable log_replica_updates on intermediate servers so they write received transactions to their own binary logs.
# Enable on replica B so it can act as source for replica C
[mysqld]
log_replica_updates = ON
log_bin = mysql-bin
Multi-source replication allows a single replica to receive changes from multiple sources. Each source connection is managed through a replication channel:
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'source1.example.com',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'password'
FOR CHANNEL 'source1';
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'source2.example.com',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'password'
FOR CHANNEL 'source2';
This pattern is useful for aggregating data from multiple shards into a central reporting database. Each channel has independent I/O and SQL threads.
Binary Log Disk Management
Binary logs can eat up a lot of disk space on busy servers. It’s important to make sure you don’t run out of room, while still keeping enough history for recovery or setting up new replicas.
By the way, we can purge old binary logs based on time or space with the following options:
-- Remove logs older than 7 days
SET GLOBAL binlog_expire_logs_seconds = 604800;
-- Or use the older days-based option
SET GLOBAL expire_logs_days = 7;
Manual purging can target specific files:
-- Remove all logs before this file
PURGE BINARY LOGS TO 'mysql-bin.000050';
-- Remove logs older than a specific date
PURGE BINARY LOGS BEFORE '2024-01-15 00:00:00';
We can also monitor binary log disk usage:
SHOW BINARY LOGS; -- Lists all binary logs with sizes
SHOW MASTER STATUS; -- Current binary log file and position
Footnote
MySQL binlog replication has a pull-based architecture where replicas request events from sources, storing them in relay logs before applying.
The replication supports statement and row-based logging, with GTIDs providing transaction tracking and multi-threaded appliers enabling parallel execution. Semi-synchronous replication offers stronger durability guarantees when needed.
The binary log isn’t only about replication, rather, it is also the ‘backbone’ for things like disaster recovery, real-time data capture, and broader database architecture.
Replication workflow is different in different databases, and nuances are what make them special.