MySQL Replication Internals

MySQL replication enables data synchronization across databases, powering read scaling and even some complex distributed architectures. At the core lies the binary log (binlog), which is the authoritative record of all data modifications on a source server.

Getting a hang of how binlog replication works is not only fun, but it also builds an intuition as to how other databases would be doing it. More importantly, if you face any replication related issues in production, you know what could have gone wrong.

This essay covers MySQL binlog replication internals along with looking at how events are structured, how threads handle replication, the different modes, and tips for keeping it running smoothly at scale. So, here we go…

Why Binlog Replication Matters

MySQL’s binary log keeps a record of every change to the database, essentially serving as the source of truth for what happened and when. Because of this, it supports several key features:

First, it allows read scaling by replicating data to replica servers that can handle read queries. Second, it provides high availability by maintaining standby servers ready to take over if the primary fails. Third, it enables point-in-time recovery by replaying binary logs to restore a database to any moment in time. Fourth, it supports change data capture (CDC) by allowing external systems to consume database changes in real time.

Binary Log

The binary log is a sequence of files that record changes to the database. Each file has a base name (configurable via the log_bin option) followed by a numeric suffix that increments with each new file. An index file tracks all current binary log files.

mysql-bin.000001
mysql-bin.000002
mysql-bin.000003
mysql-bin.index

Binary log files roll over either when they reach a certain size (controlled by max_binlog_size, usually 1GB) or whenever the server restarts. The index file is just a plain text list pointing to all the current log files.

Each binary log file begins with a magic number (0xfe626963, which spells “binlog” in ASCII) followed by a format description event that defines how subsequent events should be interpreted. This format description contains versioning information that allows different MySQL versions to understand each other’s binary logs during replication.

Binary Log Event Structure

Every change recorded in the binary log is stored as an event. Events have a 19-byte header followed by event-specific data. Something like this…

Timestamp (4B): when the event occurred
Type code (1B): the event type
Server ID (4B): id of the server that generated the event
Event length (4B): size of the event, including the header
Next position (4B): Position of the next event in the binlog
Flags (2B): Various flags controlling event behavior

+=====================================+
| event  | timestamp         0 : 4   |
| header +----------------------------+
|        | type_code         4 : 1   |
|        +----------------------------+
|        | server_id         5 : 4   |
|        +----------------------------+
|        | event_length      9 : 4   |
|        +----------------------------+
|        | next_position    13 : 4   |
|        +----------------------------+
|        | flags            17 : 2   |
+=====================================+
| event  | fixed part       19 : y   |
| data   +----------------------------+
|        | variable part             |
+=====================================+

Common Event Types

MySQL uses several event types, but some important ones are …

FORMAT_DESCRIPTION_EVENT is the first event in every binary log file. It describes the server version and provides metadata needed to interpret all subsequent events.
QUERY_EVENT contains SQL statements that were executed on the source. The event includes the database context, the SQL statement text, and various execution metadata.
WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, and DELETE_ROWS_EVENT contain the actual row data for inserts, updates, and deletes in row-based replication. Update events include both before and after images of modified rows.
GTID_LOG_EVENT contains the global transaction identifier for GTID-based replication. This event appears at the start of each transaction and contains the unique identifier along with logical timestamps used for parallel replication.
XID_EVENT marks the end of a transaction that was committed. The XID (transaction ID) links the binary log entry to the storage engine’s internal transaction tracking.
ROTATE_EVENT indicates that the server is switching to a new binary log file. This event is appended as the last event before rotating to a new file.

Binary Log Formats

MySQL supports three binary logging formats, each with different tradeoffs for space efficiency, safety, and compatibility.

Statement-based Logging

Statement-based logging (SBR) records the actual SQL statements that were executed. This format is compact because a single statement like UPDATE users SET status = 'active' WHERE created_at < '2024-01-01' is stored as one event regardless of how many rows it affects.

-- This single statement might update millions of rows
-- but is stored as one compact event in SBR
UPDATE users SET status = 'active' WHERE created_at < '2024-01-01';

However, SBR has limitations. Non-deterministic functions like NOW(), UUID(), or RAND() can produce different results when replayed on a replica. Statements using LIMIT without ORDER BY are also problematic because row ordering may differ between servers.

Row-based Logging

Row-based logging (RBR) records the actual row changes rather than the statements that produced them. Each modified row is stored with its before and after images for updates, or just the relevant image for inserts and deletes.

### INSERT INTO test.users
### SET
###   @1=1
###   @2='john_doe'
###   @3='2024-01-15'

### UPDATE test.users
### WHERE
###   @1=1
###   @2='john_doe'
###   @3='2024-01-15'
### SET
###   @1=1
###   @2='john_doe_updated'
###   @3='2024-01-15'

RBR guarantees that replicas receive exactly the same row changes that occurred on the source. The tradeoff is larger binary logs when statements affect many rows. A single UPDATE touching a million rows generates a million row events.

Mixed Logging

Mixed logging (MBR) combines both approaches. MySQL uses statement-based logging by default but automatically switches to row-based logging for statements that would be unsafe to replicate as statements. This balances compactness and safety.

By the way, row-based logging is the default in MySQL 8.0 and is recommended for most production workloads due to its reliability and deterministic behavior.

Replication Architecture

MySQL replication uses an asynchronous, pull-based architecture. The replica initiates connections to the source and requests binary log events, which are then applied locally. This makes replicas to operate independently and recover gracefully from network interruptions.

On the source server:

Binary log: Stores all changes as events
Binlog dump thread: Sends binary log events to connected replicas

On the replica server:

I/O thread (receiver thread): Connects to the source and receives events
Relay log: Local storage for received events
SQL thread (applier thread): Reads relay log and applies events
Worker threads: Apply events in parallel when multi-threaded replication is enabled

Binlog Dump Thread

When a replica connects to a source for replication, the source creates a dedicated binlog dump thread to serve that replica. This thread is responsible for reading events from the binary log and sending them over the network.

The dump thread maintains the binary log position requested by the replica and sends events as they become available. When the source is idle with no new changes, the dump thread waits (not busy-wait) for new events to be written to the binary log.

By the way, you can observe binlog dump threads on the source server using SHOW PROCESSLIST

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 42
   User: repl_user
   Host: 192.168.1.100:54321
     db: NULL
Command: Binlog Dump
   Time: 3600
  State: Master has sent all binlog to slave; waiting for more updates
   Info: NULL

Each connected replica gets its own binlog dump thread. If you have five replicas, the source maintains five dump threads. This is generally lightweight because the threads spend most of their time waiting for new events.

The I/O thread (receiver thread)

The replica’s I/O thread is responsible for connecting to the source server and receiving binary log events. When you start replication with START REPLICA, the I/O thread sets up a connection to the source specified in the CHANGE REPLICATION SOURCE TO command.

Once connected, the I/O thread requests binary log events starting from the position recorded in the replica’s connection metadata. As events arrive, the thread writes them to the local relay log without processing them. This separation of receiving and applying events is an important design decision.

First, it allows the replica to buffer events locally even when the applier is slow or stopped. Second, it enables the replica to catch up quickly after being offline because the I/O thread can pull events as fast as the network allows. Third, it provides a checkpoint mechanism where the relay log position can be used to resume replication after interruptions.

The Relay Log

The relay log (which is on a replica) is structurally identical to the binary log. It consists of numbered files containing the same event format as binary logs. The key difference is that relay logs contain events received from a source server rather than events generated locally.

relay-bin.000001
relay-bin.000002
relay-bin.index

Relay log files are created as the I/O thread receives events from the source. The SQL thread reads from relay logs to apply changes. Once events have been applied, relay log files are automatically deleted to free disk space.

The relay log serves as a buffer between network reception and the local application. This buffering is crucial for handling situations where the applier falls behind the receiver. Without relay logs, a slow application would create backpressure all the way to the source server.

The SQL Thread (applier)

The SQL thread reads events from the relay log and applies them to the local database. In single-threaded mode, this thread processes events sequentially in the exact order they appear in the relay log.

For each event, the SQL thread performs appropriate actions:

QUERY_EVENT: Executes the SQL statement
WRITE_ROWS_EVENT: Inserts the specified rows
UPDATE_ROWS_EVENT: Updates rows matching the before-image to the after-image
DELETE_ROWS_EVENT: Deletes rows matching the specified image

After successfully applying an event, the SQL thread updates the applier metadata to record its progress. This metadata includes the relay log position and the corresponding source binary log position.

The SQL thread is often the bottleneck in replication because it must apply changes sequentially in single-threaded mode. Even if the source executed many transactions concurrently, the replica applies them one at a time. This is why multi-threaded replication was introduced.

Multi-threaded Replication (MTS)

With multi-threaded replication, a replica can handle multiple transactions at once, because of several worker threads. This really helps when the source is busy with lots of concurrent activity.

Enable multi-threaded replication by setting replica_parallel_workers (or slave_parallel_workers in older versions):

SET GLOBAL replica_parallel_workers = 4;

When MTS is enabled, the SQL thread becomes a coordinator that reads transactions from the relay log and assigns them to worker threads for execution. The coordinator ensures proper ordering while maximizing parallelism.

MySQL 5.7 introduced logical clock parallelism. The logical clock approach recognizes that transactions committed together in the same group commit on the source are independent and can be applied in parallel. This is based on a key insight: if transactions could commit together on the source without conflicts, they can be applied in parallel on the replica.

The source server adds logical timestamps to each transaction in the binary log. These timestamps indicate the “commit parent” of each transaction, essentially marking which transactions were in the same commit group. The replica coordinator uses these timestamps to schedule parallel execution.

-- Configure logical clock parallelism (default in MySQL 8.0)
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';

MySQL 8 introduced WRITESET-based parallelism, which analyzes the actual rows modified by each transaction. Transactions modifying non-overlapping rows can execute in parallel regardless of commit timing. This provides the highest level of parallelism but requires additional memory for tracking.

SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
SET GLOBAL transaction_write_set_extraction = 'XXHASH64';

Semi-synchronous replication

By default, MySQL replication is fully asynchronous. The source commits transactions without waiting for any replica to receive them. This provides the best performance but creates a window where committed transactions could be lost if the source crashes before replicas receive them.

Semi-synchronous replication makes the source pause until at least one replica confirms it got the transaction. So, it gives you a better safety for your data, though it can slow things down a bit.

The semi-synchronous flow works as follows:

Client sends COMMIT to source
Source writes transaction to binary log
Source waits for replica acknowledgment
Replica receives transaction and writes to relay log
Replica sends ACK back to source
Source commits in storage engine and returns to client

Semi-synchronous replication includes a timeout mechanism. If no replica acknowledges within the timeout (default 10 seconds), the source falls back to asynchronous replication to prevent blocking indefinitely:

SET GLOBAL rpl_semi_sync_source_timeout = 10000;  -- 10 seconds

You can also require acknowledgment from multiple replicas:

-- Wait for 2 replicas to acknowledge
SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 2;

Replication filters

MySQL supports filtering which databases, tables, or events are replicated. Filters can be applied on either the source or replica side.

Source-side filtering controls what is written to the binary log:

-- Only log changes to specific databases
binlog_do_db = production
binlog_do_db = analytics

-- Ignore specific databases
binlog_ignore_db = test
binlog_ignore_db = staging

Replica-side filtering controls what is applied from the relay log:

-- Only apply changes to specific databases
replicate_do_db = production

-- Ignore specific tables
replicate_ignore_table = production.audit_log

-- Use wildcards for flexible matching
replicate_wild_do_table = production.%
replicate_wild_ignore_table = %.temp_%

Note, if a statement references multiple databases and filtering is in effect, results may be unexpected. Row-based replication with replica-side filtering is generally safer than statement-based replication.

Monitoring Replication Lag

Replication lag is the delay between when a transaction commits on the source and when it is applied on the replica. Some lag is inherent in asynchronous replication, but excessive lag can cause consistency issues for applications reading from replicas.

The traditional method checks Seconds_Behind_Master in SHOW REPLICA STATUS:

mysql> SHOW REPLICA STATUS\G
*************************** 1. row ***************************
        Replica_IO_Running: Yes
       Replica_SQL_Running: Yes
         Seconds_Behind_Master: 15

This metric compares timestamps between the relay log event currently being executed and the most recently received event.

Best Practices for Replication in Production

Deploy replicas with equal or greater resources than the source. A replica with fewer CPU cores, less memory, or slower disks will inevitably lag.

Use row-based replication for safety and predictability. While statement-based logging is more compact, the risk of non-deterministic behavior outweighs the space savings.

Configure crash-safe replication with table-based metadata repositories. This prevents inconsistencies after replica crashes.

Use semi-synchronous replication when durability is critical. Accept the latency overhead to guarantee that committed transactions exist on at least one replica.

Monitor replication lag continuously and alert when it exceeds acceptable thresholds. Integrate lag monitoring into your application health checks.

Keep primary keys on all replicated tables. This is the single most impactful optimization for row-based replication performance.

Test failover procedures regularly. Replication is only valuable for high availability if you can actually promote a replica when needed.

Replication Topology Patterns

MySQL replication supports various topology patterns beyond simple source-to-replica setups.

Chain replication connects replicas in sequence: A replicates to B, B replicates to C. This reduces load on the primary but increases lag for downstream replicas. Enable log_replica_updates on intermediate servers so they write received transactions to their own binary logs.

# Enable on replica B so it can act as source for replica C
[mysqld]
log_replica_updates = ON
log_bin = mysql-bin

Multi-source replication allows a single replica to receive changes from multiple sources. Each source connection is managed through a replication channel:

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source1.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'password'
  FOR CHANNEL 'source1';

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source2.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'password'
  FOR CHANNEL 'source2';

This pattern is useful for aggregating data from multiple shards into a central reporting database. Each channel has independent I/O and SQL threads.

Binary Log Disk Management

Binary logs can eat up a lot of disk space on busy servers. It’s important to make sure you don’t run out of room, while still keeping enough history for recovery or setting up new replicas.

By the way, we can purge old binary logs based on time or space with the following options:

-- Remove logs older than 7 days
SET GLOBAL binlog_expire_logs_seconds = 604800;

-- Or use the older days-based option
SET GLOBAL expire_logs_days = 7;

Manual purging can target specific files:

-- Remove all logs before this file
PURGE BINARY LOGS TO 'mysql-bin.000050';

-- Remove logs older than a specific date
PURGE BINARY LOGS BEFORE '2024-01-15 00:00:00';

We can also monitor binary log disk usage:

SHOW BINARY LOGS;  -- Lists all binary logs with sizes
SHOW MASTER STATUS;  -- Current binary log file and position

Footnote

MySQL binlog replication has a pull-based architecture where replicas request events from sources, storing them in relay logs before applying.

The replication supports statement and row-based logging, with GTIDs providing transaction tracking and multi-threaded appliers enabling parallel execution. Semi-synchronous replication offers stronger durability guarantees when needed.

The binary log isn’t only about replication, rather, it is also the ‘backbone’ for things like disaster recovery, real-time data capture, and broader database architecture.

Replication workflow is different in different databases, and nuances are what make them special.