One of the most fundamental design decisions in Redis replication is that it’s push-based rather than pull-based. This means the master (or primary) actively sends data to replicas, rather than replicas polling the master for updates.
But why did Redis make this choice? What are the trade-offs? And how does this affect your production systems? Let’s dive deep into the engineering reasoning behind this decision.
Push vs. Pull Replication
Before we explore Redis specifically, let’s establish what we mean by push and pull replication models.
Pull-based replication: Replicas periodically ask the master, “What’s new?” The replica is responsible for fetching updates. Think of it like checking your mailbox; you decide when to check, and you pull out whatever’s there.
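Push-based replication: The master sends updates to replicas as they happen. The replica doesn’t ask; it receives whatever arrives and applies it in order. Think of it like a newspaper subscription; deliveries show up without you having to go and check.
Redis uses the push model. When a write command is executed on the master, Redis propagates that command to all connected replicas immediately (or as immediately as the network allows).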
Redis Replication
To understand why push makes sense, we need to understand how Redis replication actually works.
The Replication Protocol
When a replica connects to a master, here’s what happens:
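- The replica sends a PSYNC command to the master, naming the replication ID and offset it last saw (a brand-new replica sends PSYNC ? -1)
- If this is the first sync or the replica is too far behind, the master performs a full resync:
  - Master creates an RDB snapshot in the background
  - Master buffers all new writes during snapshot creation
  - Master sends the RDB file to the replica
  - Master sends the buffered writes
- If the replica is reconnecting to the same replication history and everything it missed is still in the replication backlog, the master performs a partial resync instead, sending only the missing commands
- After the initial sync, the master enters push mode:
  - Every write command executed on the master is immediately sent to all replicas
  - Each command is also appended to the replication backlog, so a replica that disconnects briefly can catch up later
  - Replicas execute these commands in the same order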
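The full-versus-partial decision can be sketched in Python. This is a simplified model, not Redis’s source: it ignores the secondary replication ID Redis keeps after a failover, and the MasterState and handle_psync names are hypothetical. As we understand the protocol, the replica sends the replication ID it last followed plus the offset of the next byte it needs, and the master answers +CONTINUE or +FULLRESYNC with its replication ID and offset.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    replid: str              # current replication ID of the master
    master_repl_offset: int  # offset of the last byte in the replication stream
    backlog_first_byte: int  # offset of the oldest byte still in the backlog
    backlog_histlen: int     # how many bytes the backlog currently holds


def handle_psync(state: MasterState, req_replid: str, req_offset: int) -> str:
    """Return the master's reply to 'PSYNC <req_replid> <req_offset>'.

    req_offset is the offset of the next byte the replica needs.
    """
    full = f"+FULLRESYNC {state.replid} {state.master_repl_offset}"
    # First-time replicas ("PSYNC ? -1") or replicas that followed a
    # different history cannot continue: they need a fresh RDB snapshot.
    if req_replid != state.replid:
        return full
    # A partial resync only works if every missing byte is still buffered.
    backlog_end = state.backlog_first_byte + state.backlog_histlen
    if state.backlog_first_byte <= req_offset <= backlog_end:
        return "+CONTINUE"
    return full
```

For example, with a backlog holding offsets 501 to 1000, a replica asking for offset 800 gets +CONTINUE, while one asking for offset 400 falls outside the window and gets a full resync.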
This is where the push nature becomes evident. The master doesn’t wait for replicas to ask for updates; it actively streams commands as they happen.
Replication Backlog
The replication backlog is a circular buffer that the master maintains. It stores a recent history of write commands (default 1MB, but tunable). This buffer serves two critical purposes:
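- If a replica disconnects briefly, it can resume from where it left off (a partial resync) instead of requiring a full resync
- It sets how far behind a disconnected replica can fall and still catch up cheaply; once the missing bytes have been overwritten, only a full resync will do
The backlog still fits the push model. The master appends to it continuously, whether or not any replica is reading. On reconnection, the replica only states the offset it has reached (via PSYNC); from then on the master pushes everything after that offset and keeps streaming new writes. Replicas report their processed offset back with REPLCONF ACK about once per second, but they never poll for data.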
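A minimal sketch of the circular buffer, assuming offsets count bytes from 1 the way master_repl_offset does. The ReplicationBacklog class and its methods are hypothetical names, and real Redis manages its buffer differently. New bytes overwrite the oldest ones, and read_from returns None once a requested offset has been overwritten, which is the point where a reconnecting replica would need a full resync.

```python
from typing import Optional


class ReplicationBacklog:
    """Fixed-size ring buffer addressed by global replication offset."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray(size)
        self.master_repl_offset = 0  # offset of the last byte appended
        self.histlen = 0             # number of valid bytes currently held

    def append(self, data: bytes) -> None:
        for byte in data:
            # The byte at offset k lives at index (k - 1) % size, overwriting
            # whatever byte was stored there one full lap earlier.
            self.buf[self.master_repl_offset % self.size] = byte
            self.master_repl_offset += 1
        self.histlen = min(self.histlen + len(data), self.size)

    @property
    def first_byte_offset(self) -> int:
        # Offset of the oldest byte that has not been overwritten yet.
        return self.master_repl_offset - self.histlen + 1

    def read_from(self, offset: int) -> Optional[bytes]:
        """Bytes from `offset` to the end of the stream, or None if they are gone."""
        if offset < self.first_byte_offset or offset > self.master_repl_offset + 1:
            return None
        return bytes(
            self.buf[(k - 1) % self.size]
            for k in range(offset, self.master_repl_offset + 1)
        )
```

With an 8-byte buffer, appending b"SET a 1;" and then b"DEL a;" leaves offsets 7 through 14 available, so read_from(9) returns b"DEL a;" while read_from(3) returns None.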
Why Push?
Now let’s get to the heart of the matter: why did Redis choose push-based replication?
Minimizing Replication Lag
The primary driver is latency. Redis is designed for microsecond-level operations. In a pull-based model, you’d have unavoidable replication lag due to:
- Polling interval: Replicas would need to wait for the next poll cycle
- Batching overhead: Multiple writes between polls would bunch up
- Request-response latency: Each pull requires a round-trip
With push-based replication, commands propagate to replicas immediately after execution on the master. The only delay is network transmission time. For most use cases, this means replication lag measured in single-digit milliseconds rather than seconds.
Imagine you’re using Redis for session storage in a web application with read replicas. A user logs in (writes to the master), then immediately makes another request that hits a replica. With pull-based replication on a 1-second polling interval, replicas lag by half a second on average and by up to a full second; if the second request arrives 100 ms after the login, there’s roughly a 90% chance the replica doesn’t have the session yet (assuming the write lands at a random point in the poll cycle). With push-based replication, the session is likely already there.
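The arithmetic behind that estimate, as a small sketch: assume the write lands at a uniformly random point in the replica’s poll cycle and ignore network time. The chance the replica has not polled yet after a delay d is then 1 - d/interval, and the average lag is half the interval. The function names are illustrative.

```python
def p_stale(read_delay_s: float, poll_interval_s: float) -> float:
    """Probability a pulling replica has not fetched a write yet.

    Assumes the write arrives at a uniformly random point in the poll
    cycle and that fetching is instantaneous once the poll happens.
    """
    return max(0.0, 1.0 - read_delay_s / poll_interval_s)


def mean_pull_lag(poll_interval_s: float) -> float:
    # The wait until the next poll is uniform on [0, interval), so it averages interval / 2.
    return poll_interval_s / 2


for delay in (0.01, 0.1, 0.5, 1.0):
    print(f"read {delay}s after write: P(stale) = {p_stale(delay, 1.0):.2f}")
print(f"mean lag with 1 s polling: {mean_pull_lag(1.0)} s")
```

A read 100 ms after the login is stale about 90% of the time under these assumptions; a 50% figure only applies to a read arriving half a second after the write.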
Simplified Mental Model and Consistency
Push-based replication creates a clearer consistency model for developers. When you write to Redis, you know:
- The write is applied on the master (in memory; whether it also reaches disk depends on your RDB/AOF settings)
- The write is immediately queued for all connected replicas
- Replicas apply writes in the exact order they occurred on the master
This is easier to reason about than: “The write is on the master, and replicas will eventually discover it whenever they next poll.”
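Because every write flows through one ordered stream, each replica applies the master’s writes in exactly the order they were executed, so at any moment its dataset reflects a prefix of the master’s history. This per-replica ordering is weaker than sequential consistency (a client can write to the master and then read an older value from a replica), but it is crucial for maintaining data integrity.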
Network Efficiency
Counterintuitively, push can be more network-efficient than pull in many scenarios.
In pull-based systems:
- Replicas must regularly poll even when there are no updates (wasted bandwidth)
- Each poll is a request-response cycle (protocol overhead)
- Batching updates requires additional complexity to handle variable batch sizes
In push-based systems:
- Network traffic only occurs when there are actual writes
- The master controls the flow, eliminating redundant polls
- The protocol is simpler: just stream commands as they arrive
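Consider a Redis instance with 1,000 writes per second and 10 replicas. In a push model, the master sends 1,000 commands to each replica: 10,000 messages per second. In a pull model with 1-second polling, you’d add only 10 poll requests per second on top of the same 10,000 data messages, but replicas would lag by up to a second. To approach push latency you’d need something like 1 ms polling, which means 10,000 poll requests per second, roughly doubling the message count, and those polls keep flowing even when there are no writes at all.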
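The message counts can be checked with a quick calculation. Following the paragraph’s own convention, each replicated command counts as one message and each poll request as one more; poll responses and TCP acknowledgements are ignored, and the function names are hypothetical.

```python
def push_messages_per_sec(writes_per_sec: float, replicas: int) -> float:
    # Each write is streamed once to each replica; idle periods cost nothing.
    return writes_per_sec * replicas


def pull_messages_per_sec(writes_per_sec: float, replicas: int,
                          poll_interval_s: float) -> float:
    # Every replica polls once per interval whether or not anything changed,
    # and the same commands still have to be delivered.
    polls = replicas / poll_interval_s
    return polls + writes_per_sec * replicas


print(push_messages_per_sec(1000, 10))          # push
print(pull_messages_per_sec(1000, 10, 1.0))     # pull, 1 s polling
print(pull_messages_per_sec(1000, 10, 0.001))   # pull, 1 ms polling
print(pull_messages_per_sec(0, 10, 0.001))      # pull, no writes at all
```

The four lines come to about 10,000, 10,010, 20,000, and 10,000 messages per second; the last case is pure polling overhead on an idle master.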
The Single-Threaded Nature of Redis
Redis’s core design is single-threaded (for command execution). This actually makes push-based replication more natural to implement.
Here’s why: When Redis executes a write command on the master, it’s already holding the execution context. At that exact moment, it can:
- Execute the command
- Immediately propagate it to the replication backlog
- Push it to all connected replicas’ output buffers
This happens atomically within the same event loop iteration. There’s no need for background threads, complex synchronization, or state management.
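A toy single-threaded model makes the point concrete. It is a hypothetical sketch, not Redis internals: commands are Python tuples instead of RESP bytes, only SET and DEL are supported, and the backlog is an unbounded list. Execution, the backlog append, and the copy into every replica’s output buffer all happen inside one execute call, so no lock or propagation thread is needed, and a replica that drains only part of its buffer still holds an in-order prefix of the master’s history.

```python
from collections import deque


def apply_write(store: dict, cmd: tuple) -> None:
    """Apply one write command to a key-value dict."""
    op, key, *args = cmd
    if op == "SET":
        store[key] = args[0]
    elif op == "DEL":
        store.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")


class ToyMaster:
    def __init__(self):
        self.data = {}
        self.backlog = []          # stand-in for the replication backlog
        self.output_buffers = {}   # replica name -> commands not yet sent

    def attach_replica(self, name: str) -> deque:
        self.output_buffers[name] = deque()
        return self.output_buffers[name]

    def execute(self, cmd: tuple) -> None:
        # Execute and propagate in one step on the single thread.
        apply_write(self.data, cmd)
        self.backlog.append(cmd)
        for buf in self.output_buffers.values():
            buf.append(cmd)


class ToyReplica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of commands applied so far

    def drain(self, buf: deque, limit: int = -1) -> None:
        # Apply queued commands strictly in the order the master executed them;
        # a negative limit means "drain everything".
        while buf and limit != 0:
            apply_write(self.data, buf.popleft())
            self.applied += 1
            limit -= 1


master = ToyMaster()
fast_buf = master.attach_replica("fast")
slow_buf = master.attach_replica("slow")
for cmd in [("SET", "a", "1"), ("SET", "a", "2"), ("DEL", "b")]:
    master.execute(cmd)

fast, slow = ToyReplica(), ToyReplica()
fast.drain(fast_buf)            # fully caught up
slow.drain(slow_buf, limit=1)   # behind, but holding a consistent prefix
print(fast.data, slow.data)
```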
In a pull model, Redis would need to:
- Queue incoming replica requests
- Maintain per-replica state about what they’ve seen
- Respond to each pull request individually
- Handle concurrent pulls from multiple replicas
This would introduce significantly more complexity in a single-threaded architecture.
Backpressure and Flow Control
Push-based replication gives the master better control over flow management. The master can:
- Monitor each replica’s output buffer
- Detect when a replica is falling too far behind
- Disconnect slow replicas to protect itself (configurable with repl-timeout and client-output-buffer-limit)
This protects the master from resource exhaustion. If a replica is slow or has network issues, the master can make intelligent decisions about whether to wait or disconnect.
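The documented rule behind client-output-buffer-limit is: disconnect immediately once a client’s buffer reaches the hard limit, or once it has stayed above the soft limit for longer than the soft period. The sketch below models that rule with the redis.conf defaults for the replica class (256mb hard, 64mb soft, 60 seconds). The class and method names are hypothetical, and repl-timeout, which catches a silently dead link, is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class OutputBufferLimit:
    hard_bytes: int = 256 * MB   # disconnect as soon as this is reached
    soft_bytes: int = 64 * MB    # tolerated only for soft_seconds
    soft_seconds: float = 60.0


class ReplicaLink:
    def __init__(self, limit: OutputBufferLimit):
        self.limit = limit
        self.soft_since: Optional[float] = None  # when the soft limit was first exceeded

    def should_disconnect(self, pending_bytes: int, now: float) -> bool:
        if pending_bytes >= self.limit.hard_bytes:
            return True
        if pending_bytes >= self.limit.soft_bytes:
            if self.soft_since is None:
                self.soft_since = now
            return now - self.soft_since > self.limit.soft_seconds
        self.soft_since = None  # back under the soft limit: reset the timer
        return False
```

The soft limit absorbs short bursts (for example during a large batch of writes), while the hard limit protects the master’s memory no matter how briefly the spike lasts.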
In a pull model, the burden would be on replicas to manage their own rate of consumption, but they wouldn’t have visibility into the master’s state or other replicas’ performance.
Simpler Failure Handling
When things go wrong (and they will), push-based replication offers cleaner failure modes:
Replica disconnection: The master learns of it when the TCP connection closes, or via repl-timeout if the link dies silently. It can then stop pushing to that replica and free its resources.
Master failure: Each replica is in sync up to the last command it received; writes the master accepted but had not yet pushed are lost. Promotion to master is straightforward: elect the most up-to-date replica.
Network partition: The master can use min-replicas-to-write and min-replicas-max-lag to refuse writes if too many replicas are unreachable. This limits how many writes a partitioned master can accept during a split-brain, narrowing the window of data loss rather than preventing split-brain outright.
These mechanisms are natural in a push model because the master has real-time awareness of the replica state.
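As a sketch of what “most up-to-date” means in practice, the function below ranks candidates the way Redis Sentinel’s documentation describes: replicas with replica-priority 0 are never promoted, a lower priority wins, then the larger processed replication offset, then the lexicographically smaller run ID. It leaves out Sentinel’s other filters, such as skipping replicas that have been disconnected too long, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaInfo:
    run_id: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # bytes of the replication stream this replica processed


def pick_promotion_candidate(replicas: List[ReplicaInfo]) -> Optional[ReplicaInfo]:
    eligible = [r for r in replicas if r.priority != 0]
    if not eligible:
        return None
    # Lower priority first, then the most data, then the smaller run ID as a tie-breaker.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


candidates = [
    ReplicaInfo("b7f1", 100, 9_000),
    ReplicaInfo("a3c2", 100, 9_500),
    ReplicaInfo("c9d0", 0, 10_000),   # excluded despite having the most data
]
print(pick_promotion_candidate(candidates).run_id)
```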
Trade-offs
No design is perfect. Push-based replication has limitations:
Master Resource Usage
The master must maintain:
- TCP connections to all replicas
- Output buffers for each replica
- The replication backlog
For a master with many replicas (say, 100+), this can consume significant memory. The client-output-buffer-limit setting helps here, but it’s a resource consideration nonetheless.
Replica Autonomy
Replicas can’t control their replication rate. They receive data as fast as the master sends it. This is usually fine, but in extreme cases:
- Unsent data piles up in the master’s output buffer for that replica once the replica stops draining its socket
- Apart from implicit TCP flow control, the replica has no way to tell the master “slow down.”
Redis handles this by disconnecting lagging replicas, which is pragmatic but can be disruptive.
Coordination Overhead for the Master
Every write on the master triggers replication logic:
- Append to replication backlog
- Push to each replica’s output buffer
- Update replication offsets
In a pull model, this cost would be amortized across poll intervals.
Key Tunables
Configuration Tuning
repl-backlog-size: The default 1MB might be too small for high-throughput systems. If replicas frequently disconnect and require full resyncs, increase it. A good heuristic:
repl-backlog-size ≈ writes_per_second * avg_command_size (bytes) * expected_disconnect_time (seconds) * safety_factor
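The heuristic translates directly into a small helper. The 2x safety factor and the example workload (10,000 writes per second, 200-byte commands, 60-second disconnects) are assumptions for illustration; redis.conf reads the mb suffix as 1024 * 1024 bytes.

```python
import math

MB = 1024 * 1024  # redis.conf "mb" unit


def suggested_backlog_bytes(writes_per_sec: float, avg_command_bytes: float,
                            disconnect_secs: float, safety_factor: float = 2.0) -> int:
    # Bytes of replication stream produced while a replica is away, padded.
    return math.ceil(writes_per_sec * avg_command_bytes * disconnect_secs * safety_factor)


def as_config_line(num_bytes: int) -> str:
    # Round up to whole megabytes so the backlog is never smaller than estimated.
    return f"repl-backlog-size {math.ceil(num_bytes / MB)}mb"


size = suggested_backlog_bytes(10_000, 200, 60)
print(size, as_config_line(size))
```

For that workload the estimate comes to 240,000,000 bytes, which rounds up to repl-backlog-size 229mb.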
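min-replicas-to-write and min-replicas-max-lag limit how much data an isolated master can accept. For example:
min-replicas-to-write 1
min-replicas-max-lag 10
This means the master refuses writes unless at least 1 replica has acknowledged the replication stream within the last 10 seconds. It does not guarantee that any given write reached a replica, since replication remains asynchronous, but it bounds the window of writes that can be lost if the master fails or is partitioned away.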
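The check itself is simple. A minimal model, assuming each replica’s lag is the number of seconds since its last REPLCONF ACK (replicas send one about once per second) and that a replica counts as good when that lag is at most min-replicas-max-lag. The function name is hypothetical; a min-replicas-to-write of 0, the default, disables the check.

```python
def can_accept_writes(last_ack_times: list, now: float,
                      min_replicas_to_write: int, min_replicas_max_lag: float) -> bool:
    """Return True if the master would accept a write under the min-replicas settings."""
    if min_replicas_to_write == 0:
        return True  # feature disabled
    good = sum(1 for t in last_ack_times if now - t <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# Settings from the example above: min-replicas-to-write 1, min-replicas-max-lag 10
print(can_accept_writes([99.0, 70.0], now=100.0, min_replicas_to_write=1, min_replicas_max_lag=10))
print(can_accept_writes([85.0], now=100.0, min_replicas_to_write=1, min_replicas_max_lag=10))
```

The first call passes because one replica acknowledged a second ago; the second is refused because the only replica has been silent for 15 seconds.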
Monitoring Replication Lag
We use INFO replication on both master and replicas. Key metrics:
- master_repl_offset (on master): how many bytes of replication stream the master has sent
- slave_repl_offset (on replica): how many bytes the replica has processed
- The difference is the replication lag in bytes
Combine this with repl_backlog_first_byte_offset to ensure replicas are within the backlog window.
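A small parser for the raw text of INFO replication taken on the master. Besides master_repl_offset, the master lists each connected replica on a line such as slave0:ip=10.0.0.2,port=6379,state=online,offset=5000,lag=0, so every replica’s byte lag can be computed from the master alone. The sample output below is made up, and the function names are hypothetical; the field names are the ones INFO reports.

```python
def parse_info(text: str) -> dict:
    """Turn 'key:value' lines of INFO output into a dict, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key] = value
    return fields


def replica_lag_report(info_text: str) -> dict:
    fields = parse_info(info_text)
    master_offset = int(fields["master_repl_offset"])
    first_byte = int(fields.get("repl_backlog_first_byte_offset", "0"))
    report = {}
    for key, value in fields.items():
        if key.startswith("slave") and key[5:].isdigit():  # slave0, slave1, and so on
            attrs = dict(pair.split("=", 1) for pair in value.split(","))
            offset = int(attrs["offset"])
            report[f"{attrs['ip']}:{attrs['port']}"] = {
                "lag_bytes": master_offset - offset,
                # The replica needs offset + 1 next; is that byte still buffered?
                "within_backlog": offset + 1 >= first_byte,
            }
    return report


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=5000,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=3900,lag=1
master_repl_offset:5000
repl_backlog_first_byte_offset:4001
repl_backlog_histlen:1000
"""

print(replica_lag_report(SAMPLE))
```

In this sample the second replica is 1,100 bytes behind and its next needed byte has already left the backlog, so if it disconnected now it would need a full resync.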
Read Scaling Patterns
- Replicas are usually within milliseconds of the master
- Clients can read from nearby replicas for latency-sensitive operations
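- Use the WAIT command when you need stronger guarantees: WAIT 1 1000 blocks until at least 1 replica has acknowledged all of this client’s previous writes, or until the 1,000 ms timeout expires, and returns the number of replicas that acknowledged
Remember, even with push-based replication, Redis replication is asynchronous. There’s no distributed consensus, and WAIT reduces the risk of losing acknowledged writes without making Redis strongly consistent. Reads from replicas may return stale data. For critical reads, go to the master or use WAIT.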
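To make WAIT’s semantics concrete, here is a simulation rather than client code. As documented, WAIT numreplicas timeout blocks until at least numreplicas replicas have acknowledged all of the calling client’s earlier writes, or until timeout milliseconds pass (0 means wait forever), and replies with the number of replicas that acknowledged. The ack timeline, offsets, and function name are made up for illustration.

```python
def simulate_wait(client_offset: int, ack_events: list,
                  numreplicas: int, timeout_ms: int) -> int:
    """Model the reply to WAIT for a client whose last write ends at client_offset.

    ack_events holds (ms_since_wait_started, replica_id, acked_offset) tuples.
    """
    acked = set()
    for t_ms, replica, acked_offset in sorted(ack_events):
        if timeout_ms and t_ms > timeout_ms:
            break  # timed out: reply with whatever has been acknowledged so far
        if acked_offset >= client_offset:
            acked.add(replica)
            if len(acked) >= numreplicas:
                break  # enough replicas: unblock immediately
    return len(acked)


events = [(5, "r1", 90), (12, "r1", 120), (40, "r2", 130)]
print(simulate_wait(100, events, numreplicas=1, timeout_ms=1000))  # r1 catches up at 12 ms
print(simulate_wait(100, events, numreplicas=2, timeout_ms=20))    # r2 acks too late
```

A reply smaller than numreplicas is not an error: the master has still accepted the write, and it is up to the caller to decide whether to retry, alert, or carry on.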
Diskless Replication
Redis 2.8.18 introduced diskless replication, an optimization made possible by the push-based architecture.
Normally, during a full resync:
- The master forks, and the child process writes the RDB file to disk
- The master then reads the RDB file from disk and sends it to the replica
With diskless replication (repl-diskless-sync yes):
- The forked child writes the RDB directly to the replica socket
- No disk I/O on the master
Conclusion
Redis replication is push-based because it aligns perfectly with Redis’s design philosophy and constraints. Push-based replication:
- Minimizes replication lag for low-latency use cases
- Fits naturally into Redis’s single-threaded architecture
- Reduces network overhead and unnecessary polling
- Gives the master visibility and control over replica state
The push model places more burden on the master and reduces replica autonomy, but for Redis’s primary use cases (caching, session stores, real-time analytics), these trade-offs are absolutely worth it.