Redis Replication Internals

One of the most fundamental design decisions in Redis replication is that it’s push-based rather than pull-based. This means the master (or primary) actively sends data to replicas, rather than replicas polling the master for updates.

But why did Redis make this choice? What are the trade-offs? And how does this affect your production systems? Let’s dive deep into the engineering reasoning behind this decision.

Push vs. Pull Replication

Before we explore Redis specifically, let’s establish what we mean by push and pull replication models.

Pull-based replication: Replicas periodically ask the master, “What’s new?” The replica is responsible for fetching updates. Think of it like checking your mailbox; you decide when to check, and you pull out whatever’s there.

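Push-based replication: The master sends each update to replicas as it happens. The replica keeps a connection open and applies whatever arrives; it never has to ask.

Redis uses the push model. When a write command is executed on the master, Redis propagates that command to all connected replicas immediately (or as immediately as the network allows).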

Redis Replication

To understand why push makes sense, we need to understand how Redis replication actually works.

The Replication Protocol

When a replica connects to a master, here’s what happens:

The replica sends a PSYNC command to the master
If this is the first sync or the replica is too far behind, the master performs a full resync:
- Master creates an RDB snapshot in the background
- Master buffers all new writes during snapshot creation
- Master sends the RDB file to the replica
- Master sends the buffered writes
After the initial sync, the master enters push mode:
- Every write command executed on the master is immediately sent to all replicas
- Each command is appended to the replication backlog and to every connected replica’s output buffer
- Replicas execute these commands in the same order

This is where the push nature becomes evident. The master doesn’t wait for replicas to ask for updates; it actively streams commands as they happen.

Replication Backlog

The replication backlog is a circular buffer that the master maintains. It stores a recent history of write commands (default 1MB, but tunable). This buffer serves two critical purposes:

If a replica disconnects briefly, it can resume from where it left off
Provides a cushion when replicas temporarily fall behind

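The backlog is indexed by the replication offset. The master continuously appends to it as writes occur. A reconnecting replica includes the last offset it processed in its PSYNC request, and the master resumes the stream from that point if the data is still in the backlog. Only this resume handshake is initiated by the replica; steady-state streaming remains push-driven.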
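To make the offset bookkeeping concrete, here is a minimal sketch of a fixed-size backlog. It is a toy model, not Redis’s implementation: the class name, the method names, and the convention that a replica’s offset means "bytes processed so far" are all assumptions made for illustration.

```python
from collections import deque


class ReplicationBacklog:
    """Toy fixed-size backlog addressed by global byte offsets (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.buf = deque(maxlen=size)  # oldest bytes fall off automatically
        self.master_offset = 0         # total bytes ever appended

    @property
    def first_offset(self) -> int:
        # Global offset of the oldest byte still held in the buffer.
        return self.master_offset - len(self.buf)

    def append(self, data: bytes) -> None:
        self.buf.extend(data)
        self.master_offset += len(data)

    def can_partial_resync(self, replica_offset: int) -> bool:
        # The replica can resume only if everything it is missing is still here.
        return self.first_offset <= replica_offset <= self.master_offset

    def read_from(self, replica_offset: int) -> bytes:
        if not self.can_partial_resync(replica_offset):
            raise LookupError("offset outside backlog; full resync needed")
        skip = replica_offset - self.first_offset
        return bytes(list(self.buf)[skip:])


backlog = ReplicationBacklog(size=16)
backlog.append(b"SET a 1;")  # 8 bytes
backlog.append(b"SET b 2;")  # 8 bytes, buffer now full
backlog.append(b"SET c 3;")  # evicts the first 8 bytes

print(backlog.can_partial_resync(16))  # replica missed only the last write
print(backlog.can_partial_resync(0))   # replica is behind the backlog window
```

A replica that reconnects with an offset older than first_offset has fallen out of the window, which is the case where the master falls back to a full resync.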

Why Push?

Now let’s get to the heart of the matter: why did Redis choose push-based replication?

Minimizing Replication Lag

The primary driver is latency. Redis is designed for microsecond-level operations. In a pull-based model, you’d have unavoidable replication lag due to:

Polling interval: Replicas would need to wait for the next poll cycle
Batching overhead: Multiple writes between polls would bunch up
Request-response latency: Each pull requires a round-trip

With push-based replication, commands propagate to replicas immediately after execution on the master. The only delay is network transmission time. For most use cases, this means replication lag measured in single-digit milliseconds rather than seconds.

Imagine you’re using Redis for session storage in a web application with read replicas. A user logs in (writes to the master), then immediately makes another request that hits a replica. With pull-based replication on a 1-second polling interval, the replica picks up the write about half a second later on average, so a follow-up request that arrives within a few tens of milliseconds will almost always miss the session. With push-based replication, the session is likely already there.
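The polling delay can be put into numbers with a small sketch. It assumes the write lands at a uniformly random point within the polling cycle and ignores network and processing time; the function names are hypothetical.

```python
def miss_probability(read_delay_s: float, poll_interval_s: float) -> float:
    """Chance that a replica has not yet polled when the read arrives.

    Assumes the time until the next poll is uniform on [0, poll_interval_s].
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return max(0.0, 1.0 - read_delay_s / poll_interval_s)


def mean_wait_s(poll_interval_s: float) -> float:
    """Average time a write waits before the next poll picks it up."""
    return poll_interval_s / 2


for delay in (0.01, 0.05, 0.5, 1.0):
    p = miss_probability(delay, poll_interval_s=1.0)
    print(f"read {delay:.2f}s after write: miss probability {p:.2f}")
```

Under these assumptions, a read that follows the write by 50 ms misses roughly 95% of the time with 1-second polling; the miss rate only reaches 50% when the read comes half a second later.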

Simplified Mental Model and Consistency

Push-based replication creates a clearer consistency model for developers. When you write to Redis, you know:

The write is applied on the master (durability on disk depends on your RDB/AOF settings)
The write is propagated to all connected replicas right away, without waiting for them to acknowledge it
Replicas apply writes in the exact order they occurred on the master

This is easier to reason about than: “The write is on the master, and replicas will eventually discover it whenever they next poll.”

The push model naturally implements sequential consistency at the replica level. Each replica sees writes in the same order they were executed on the master, which is crucial for maintaining data integrity.

Network Efficiency

Counterintuitively, push can be more network-efficient than pull in many scenarios.

In pull-based systems:

Replicas must regularly poll even when there are no updates (wasted bandwidth)
Each poll is a request-response cycle (protocol overhead)
Batching updates requires additional complexity to handle variable batch sizes

In push-based systems:

Network traffic only occurs when there are actual writes
The master controls the flow, eliminating redundant polls
The protocol is simpler: just stream commands as they arrive

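Consider a Redis instance with 1000 writes per second and 10 replicas. In a push model, the master sends each of the 1000 commands to every replica, for 10,000 command transmissions per second. In a pull model with 1-second polling, the same command data still has to cross the network, plus at least 10 poll requests per second, and replicas lag by up to a full second. Matching push latency would mean polling every few milliseconds, which adds thousands of poll requests per second on top of the data.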
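The arithmetic behind that comparison is short enough to write down. This is a back-of-the-envelope sketch with hypothetical function names; it counts requests only and assumes a poll adds, on average, half its interval in delay.

```python
def push_transmissions_per_second(writes_per_second: int, replicas: int) -> int:
    """Each write is sent once to every replica."""
    return writes_per_second * replicas


def poll_requests_per_second(replicas: int, poll_interval_s: float) -> float:
    """Poll requests the master must answer, regardless of write volume."""
    return replicas / poll_interval_s


def poll_interval_for_mean_delay(target_mean_delay_s: float) -> float:
    """Interval needed so the average polling delay equals the target."""
    return 2 * target_mean_delay_s


print(push_transmissions_per_second(1000, 10))
print(poll_requests_per_second(10, poll_interval_s=1.0))

# Polling often enough to add only about 1 ms of average delay:
interval = poll_interval_for_mean_delay(0.001)
print(round(poll_requests_per_second(10, interval)))
```

The poll count grows as the target delay shrinks, and it is paid even when no writes happen at all.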

The Single-Threaded Nature of Redis

Redis’s core design is single-threaded (for command execution). This actually makes push-based replication more natural to implement.

Here’s why: When Redis executes a write command on the master, it’s already holding the execution context. At that exact moment, it can:

Execute the command
Immediately propagate it to the replication backlog
Push it to all connected replicas’ output buffers

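This bookkeeping happens within the same event loop iteration as the command itself; the buffered bytes are flushed to the replica sockets when the event loop next services them. There’s no need for background threads, complex synchronization, or state management.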
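The three steps can be sketched as a toy master. This is an illustration under simplifying assumptions, not Redis source code: ToyMaster and ReplicaLink are hypothetical names, the backlog is unbounded, and only SET is supported. The wire format produced by encode_resp is the standard RESP array of bulk strings.

```python
from dataclasses import dataclass, field


def encode_resp(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(out)


@dataclass
class ReplicaLink:
    """Stand-in for a replica connection: just its pending output."""
    output_buffer: bytearray = field(default_factory=bytearray)


@dataclass
class ToyMaster:
    data: dict = field(default_factory=dict)
    backlog: bytearray = field(default_factory=bytearray)
    repl_offset: int = 0
    replicas: list = field(default_factory=list)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value                    # 1. execute the command
        payload = encode_resp("SET", key, value)
        self.backlog += payload                   # 2. append to the backlog
        self.repl_offset += len(payload)
        for link in self.replicas:                # 3. queue for every replica
            link.output_buffer += payload


master = ToyMaster(replicas=[ReplicaLink(), ReplicaLink()])
master.set("a", "1")
print(master.repl_offset, bytes(master.replicas[0].output_buffer))
```

Because everything runs on one thread, no lock is needed between applying the write and queuing it; the order in the buffers is the order of execution.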

In a pull model, Redis would need to:

Queue incoming replica requests
Maintain per-replica state about what they’ve seen
Respond to each pull request individually
Handle concurrent pulls from multiple replicas

This would introduce significantly more complexity in a single-threaded architecture.

Backpressure and Flow Control

Push-based replication gives the master better control over flow management. The master can:

Monitor each replica’s output buffer
Detect when a replica is falling too far behind
Disconnect slow replicas to protect itself (configurable with repl-timeout and client-output-buffer-limit)

This protects the master from resource exhaustion. If a replica is slow or has network issues, the master can make intelligent decisions about whether to wait or disconnect.
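As a rough model of how an output buffer limit with a hard threshold, a soft threshold, and a grace period behaves, consider the sketch below. The names are hypothetical and the boundary conditions are an approximation; the 256 MB / 64 MB / 60 s values are used as an example of a replica-class limit and should be checked against your own redis.conf.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class BufferLimit:
    hard_bytes: int      # disconnect as soon as the buffer reaches this size
    soft_bytes: int      # disconnect if the buffer stays above this size...
    soft_seconds: float  # ...for longer than this many seconds


@dataclass
class ReplicaBufferState:
    soft_exceeded_since: Optional[float] = None


def should_disconnect(state: ReplicaBufferState, used_bytes: int,
                      now: float, limit: BufferLimit) -> bool:
    if used_bytes >= limit.hard_bytes:
        return True
    if used_bytes >= limit.soft_bytes:
        if state.soft_exceeded_since is None:
            state.soft_exceeded_since = now  # start the grace period
        return now - state.soft_exceeded_since > limit.soft_seconds
    state.soft_exceeded_since = None  # back under the soft limit: reset
    return False


limit = BufferLimit(hard_bytes=256 * MB, soft_bytes=64 * MB, soft_seconds=60)
state = ReplicaBufferState()
print(should_disconnect(state, 10 * MB, now=0.0, limit=limit))
print(should_disconnect(state, 100 * MB, now=10.0, limit=limit))
print(should_disconnect(state, 100 * MB, now=80.0, limit=limit))
```

The soft limit tolerates short bursts, while the hard limit caps the memory a single stalled replica can cost the master.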

In a pull model, the burden would be on replicas to manage their own rate of consumption, but they wouldn’t have visibility into the master’s state or other replicas’ performance.

Simpler Failure Handling

When things go wrong (and they will), push-based replication offers cleaner failure modes:

Replica disconnection: The master immediately knows via the TCP connection. It can stop trying to push to that replica, freeing resources.

Master failure: Replicas are already in sync up to the last received command. Promotion is straightforward: elect the most up-to-date replica.

Network partition: The master can use min-replicas-to-write and min-replicas-max-lag to refuse writes if too many replicas are unreachable, which limits how many writes can be lost in a split-brain scenario.

These mechanisms are natural in a push model because the master has real-time awareness of the replica state.

Trade-offs

No design is perfect. Push-based replication has limitations:

Master Resource Usage

The master must maintain:

TCP connections to all replicas
Output buffers for each replica
The replication backlog

For a master with many replicas (say, 100+), this can consume significant memory. The client-output-buffer-limit helps here, but it’s a resource consideration nonetheless.

Replica Autonomy

Replicas can’t control their replication rate. They receive data as fast as the master sends it. This is usually fine, but in extreme cases:

A slow replica might build up a backlog in its socket buffer
The replica has no way to tell the master “slow down.”

Redis handles this by disconnecting lagging replicas, which is pragmatic but can be disruptive.

Coordination Overhead for the Master

Every write on the master triggers replication logic:

Append to replication backlog
Push to each replica’s output buffer
Update replication offsets

In a pull model, this cost would be amortized across poll intervals.

Key Tunables

Configuration Tuning

repl-backlog-size: The default 1MB might be too small for high-throughput systems. If replicas frequently disconnect and require full resyncs, increase this.

A good heuristic:

repl-backlog-size = writes_per_second * avg_command_size * expected_disconnect_time * safety_factor
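As a worked example of the heuristic, the sketch below plugs in illustrative numbers. The workload figures (1000 writes per second, 100-byte commands, 60-second outages, a safety factor of 2) are assumptions chosen for the example, not recommendations.

```python
def backlog_size_bytes(writes_per_second: float, avg_command_bytes: float,
                       disconnect_seconds: float,
                       safety_factor: float = 2.0) -> int:
    """Backlog size needed to cover a disconnect of the given length."""
    return int(writes_per_second * avg_command_bytes
               * disconnect_seconds * safety_factor)


size = backlog_size_bytes(writes_per_second=1000, avg_command_bytes=100,
                          disconnect_seconds=60, safety_factor=2.0)
print(f"repl-backlog-size {size}")           # value in bytes
print(f"about {size / (1024 * 1024):.1f} MB")
```

With these inputs the result is an order of magnitude above the 1MB default, which is why high-throughput deployments so often hit full resyncs after brief disconnects.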

min-replicas-to-write and min-replicas-max-lag limit how far replicas can fall behind before the master stops accepting writes. For example:

min-replicas-to-write 1  
min-replicas-max-lag 10

This means the master refuses writes unless at least 1 replica has acknowledged the replication stream within the last 10 seconds. It does not make replication synchronous, but it bounds how much data can be lost if the master fails right after a write.
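The decision the master makes with these two settings can be modeled in a few lines. This is a simplified sketch with a hypothetical function name; it takes each replica’s lag as the number of seconds since its last acknowledgment and treats a zero value for either setting as "check disabled".

```python
from typing import Iterable


def master_accepts_writes(replica_lags_s: Iterable[float],
                          min_replicas: int, max_lag_s: float) -> bool:
    """Return True if enough replicas have acknowledged recently."""
    if min_replicas == 0 or max_lag_s == 0:
        return True  # assumption: a zero setting turns the check off
    good = sum(1 for lag in replica_lags_s if lag <= max_lag_s)
    return good >= min_replicas


# min-replicas-to-write 1, min-replicas-max-lag 10
print(master_accepts_writes([2, 15], min_replicas=1, max_lag_s=10))
print(master_accepts_writes([12, 15], min_replicas=1, max_lag_s=10))
print(master_accepts_writes([], min_replicas=1, max_lag_s=10))
```

An isolated master with no reachable replicas fails the check, so it stops taking writes that would be discarded once it rejoins.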

Monitoring Replication Lag

We use INFO replication on both master and replicas. Key metrics:

master_repl_offset (on master): The master’s current position in the replication stream, in bytes
slave_repl_offset (on replica): How many bytes of that stream the replica has processed
Difference = replication lag in bytes

Combine with repl_backlog_first_byte_offset to ensure replicas are within the backlog window.
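A small parser shows how these fields fit together. The sample text below is made up for illustration (addresses and offsets are hypothetical), and the script reads the per-replica offsets that the master reports on its slaveN lines rather than querying each replica. The backlog check is an approximation of what the master would decide during a resync.

```python
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=104500,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=98000,lag=1
master_repl_offset:104857
repl_backlog_first_byte_offset:100000
"""


def parse_info(text: str) -> dict:
    """Turn 'key:value' lines into a dict, skipping section headers."""
    info = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_report(info: dict) -> dict:
    """Per-replica lag in bytes and whether its offset is still in the backlog."""
    master_offset = int(info["master_repl_offset"])
    first_byte = int(info["repl_backlog_first_byte_offset"])
    report = {}
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(pair.split("=", 1) for pair in value.split(","))
            offset = int(fields["offset"])
            report[key] = {
                "lag_bytes": master_offset - offset,
                "in_backlog": offset >= first_byte,
            }
    return report


for name, stats in replica_report(parse_info(SAMPLE_INFO)).items():
    print(name, stats)
```

A replica whose offset is older than the first byte in the backlog would need a full resync if it disconnected at that moment, so that flag is worth alerting on.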

Read Scaling Patterns

Replicas are usually within milliseconds of the master
Clients can read from nearby replicas for latency-sensitive operations
Use the WAIT command when you need stronger consistency: WAIT 1 1000 blocks until at least 1 replica acknowledges the preceding writes or the 1-second timeout expires

Remember, even with push-based replication, Redis replication is asynchronous. There’s no distributed consensus. Reads from replicas may return stale data. For critical reads, go to the master or use WAIT.

Diskless Replication

Redis 2.8.18 introduced diskless replication, an optimization that fits naturally with the push-based architecture.

Normally, during a full resync:

Master forks and writes RDB to disk
Master reads RDB from disk and sends it to the replica

With diskless replication (repl-diskless-sync yes):

Master forks and writes RDB directly to the replica socket
No disk I/O on the master

Conclusion

Redis replication is push-based because that model aligns with Redis’s design philosophy and constraints. It:

Minimizes replication lag for the low-latency use cases
Fits naturally into Redis’s single-threaded architecture
Reduces network overhead and unnecessary polling
Gives the master visibility and control over replica state

The push model places more burden on the master and reduces replica autonomy, but for Redis’s primary use cases (caching, session stores, real-time analytics), these trade-offs are absolutely worth it.