Redis Distributed Lock: SET NX PX Done Right + Retry Strategies
Distributed locks are deceptively simple. SET key value NX PX 30000 — done, right? Not quite. There are at least five ways to get it wrong. This guide shows you the correct implementation and the pitfalls that trip up even experienced engineers.
The Problem
In a distributed system, multiple processes need to coordinate access to a shared resource. Without a lock:
- Two workers process the same job.
- Two servers update the same database row simultaneously.
- A payment gets charged twice.
The Correct Pattern: SET NX PX
The modern way to acquire a lock in Redis:
SET lock:order:5001 "worker-a-uuid" NX PX 30000
This single command does three things atomically:
- NX: Only set if the key does not exist (acquire lock).
- PX 30000: Auto-expire after 30 seconds (prevent deadlock).
- The value is a unique identifier (for safe unlock).
Try It
SET lock:order:5001 "worker-a-uuid" NX PX 30000
SET lock:order:5001 "worker-b-uuid" NX PX 30000
GET lock:order:5001
TTL lock:order:5001
The second SET returns nil — worker B failed to acquire the lock. Worker A holds it.
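With redis-py, the whole acquire is a single call: `r.set(key, token, nx=True, px=30000)` returns True on success and None on contention. The sketch below mirrors the same SET NX PX semantics against an in-memory dict so it runs without a server; the store layout and function name are illustrative, not a real library API.

```python
import time
import uuid

def acquire_lock(store, key, token, ttl_ms, now_ms=None):
    """Simulate SET key token NX PX ttl_ms.

    store maps key -> (token, expires_at_ms). Returns True if the lock
    was acquired, False if another live holder exists.
    """
    now = now_ms if now_ms is not None else int(time.time() * 1000)
    entry = store.get(key)
    if entry is not None and entry[1] > now:
        return False                        # key exists and is unexpired: NX fails
    store[key] = (token, now + ttl_ms)      # set value with expiry, like PX
    return True

store = {}
token_a = str(uuid.uuid4())
token_b = str(uuid.uuid4())
print(acquire_lock(store, "lock:order:5001", token_a, 30_000))  # True
print(acquire_lock(store, "lock:order:5001", token_b, 30_000))  # False
```

Note that the unique token per worker (a UUID here) is what makes the safe unlock in the next section possible.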
Why the Value Must Be Unique
The value identifies who holds the lock. Without it, you can accidentally release someone else's lock.
The Dangerous Pattern
- Worker A acquires the lock (TTL 10s).
- Worker A takes 15 seconds to process (the lock expires!).
- Worker B acquires the now-free lock.
- Worker A finishes and DELs the lock.
- Worker B thinks it still holds the lock, but it was just deleted.
- Worker C acquires the lock: now B and C both think they hold it.
The Safe Unlock
Only delete the lock if you still own it. This requires a Lua script for atomicity:
The Lua script, with the key passed as KEYS[1] and your UUID as ARGV[1]:
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
Run it with EVAL "<script>" 1 lock:resource:1 uuid-abc-123.
In Redis commands, you can verify ownership before deleting:
SET lock:resource:1 "uuid-abc-123" NX PX 30000
GET lock:resource:1
DEL lock:resource:1
But note: the GET + DEL is NOT atomic without Lua. In production, always use a Lua script.
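In redis-py you would register the Lua script once (`r.register_script(...)`) and invoke it per unlock, as shown in the comment below. The runnable part simulates the same check-and-delete against an in-memory dict standing in for Redis; the names are illustrative.

```python
# With redis-py, the atomic version would look roughly like:
#   UNLOCK = r.register_script(
#       'if redis.call("GET", KEYS[1]) == ARGV[1] then '
#       'return redis.call("DEL", KEYS[1]) else return 0 end')
#   UNLOCK(keys=[key], args=[token])
# Dict-backed sketch of the same check-and-delete:

def safe_unlock(store, key, token):
    """Delete the lock only if we still own it (token matches)."""
    entry = store.get(key)
    if entry is not None and entry[0] == token:
        del store[key]
        return True        # we owned it: released
    return False           # expired or owned by someone else: leave it alone

store = {"lock:resource:1": ("uuid-abc-123", 10_000)}
print(safe_unlock(store, "lock:resource:1", "uuid-wrong"))    # False
print(safe_unlock(store, "lock:resource:1", "uuid-abc-123"))  # True
```

The False branch is exactly the dangerous scenario above: worker A's stale DEL becomes a no-op instead of destroying worker B's lock.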
Retry with Exponential Backoff
When a lock acquisition fails, don't spin in a tight loop. Use exponential backoff with jitter:
attempt 1: wait 50ms + random(0, 50)ms
attempt 2: wait 100ms + random(0, 100)ms
attempt 3: wait 200ms + random(0, 200)ms
attempt 4: wait 400ms + random(0, 400)ms
max attempts: 5
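The schedule above is a few lines of code. In this sketch the function and constant names are mine, not from any library; `try_acquire` is any callable that attempts the SET NX PX and reports success.

```python
import random
import time

BASE_MS = 50
MAX_ATTEMPTS = 5

def backoff_delay_ms(attempt):
    """Delay before retry `attempt` (1-based): BASE_MS * 2^(attempt-1),
    plus up to 100% jitter so concurrent workers don't retry in lockstep."""
    base = BASE_MS * (2 ** (attempt - 1))
    return base + random.uniform(0, base)

def acquire_with_retry(try_acquire):
    """Call try_acquire() up to MAX_ATTEMPTS times with jittered backoff."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if try_acquire():
            return True
        if attempt < MAX_ATTEMPTS:
            time.sleep(backoff_delay_ms(attempt) / 1000)
    return False  # give up gracefully instead of spinning forever
```

The jitter matters: without it, every worker that lost the race retries at the same instant and collides again (the "thundering herd" effect).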
Simulating Lock Contention
SET lock:job:process "worker-1" NX PX 10000
SET lock:job:process "worker-2" NX PX 10000
SET lock:job:process "worker-3" NX PX 10000
GET lock:job:process
Workers 2 and 3 get nil. In your application, they should back off and retry.
Lock Extension (Watchdog)
What if your task takes longer than the lock TTL? You need a watchdog that extends the lock while the task is still running.
Pattern
- Acquire lock with TTL 30s.
- Start a background timer that runs every 10s.
- Timer checks if the task is still running. If yes, extend the lock.
- When the task completes, cancel the timer and release the lock.
SET lock:long-task "uuid-abc" NX PX 30000
PEXPIRE lock:long-task 30000
GET lock:long-task
DEL lock:long-task
Libraries like Redisson (Java) implement this watchdog pattern automatically.
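A minimal watchdog can be built on threading.Timer: re-arm a timer that extends the lock while work is in flight, and cancel it when the task completes. One caveat about the commands above: a bare PEXPIRE extends unconditionally, so a production watchdog should extend only after verifying the token still matches (a GET-then-PEXPIRE Lua script, analogous to the safe unlock). This sketch does the ownership check against a dict-backed store; the class and its layout are illustrative.

```python
import threading
import time

class LockWatchdog:
    """Extend the lock every `interval_s` seconds while it is still ours.

    store maps key -> (token, expires_at_ms), standing in for Redis.
    """

    def __init__(self, store, key, token, ttl_ms, interval_s):
        self.store, self.key, self.token = store, key, token
        self.ttl_ms, self.interval_s = ttl_ms, interval_s
        self._timer = None

    def _extend(self):
        entry = self.store.get(self.key)
        if entry is not None and entry[0] == self.token:  # still ours?
            now_ms = int(time.time() * 1000)
            self.store[self.key] = (self.token, now_ms + self.ttl_ms)
            self._arm()          # re-arm only while we still own the lock

    def _arm(self):
        self._timer = threading.Timer(self.interval_s, self._extend)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        self._arm()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()
```

Usage follows the pattern above: start() right after acquiring, stop() in a finally block before the safe unlock, so a crash simply lets the TTL expire.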
Common Mistakes
Mistake 1: Using SETNX + EXPIRE (Two Commands)
SETNX lock:old-pattern "value"
EXPIRE lock:old-pattern 30
If the process crashes between SETNX and EXPIRE, the lock lives forever. Always use the single SET ... NX PX command.
Mistake 2: Fixed Lock Value
SET lock:resource "locked" NX PX 30000
Using a fixed value like "locked" means any process can release the lock. Always use a unique identifier (UUID).
Mistake 3: Too Short TTL
If your lock TTL is 5 seconds but the operation takes 10 seconds, the lock expires mid-operation. Another process acquires it, and you have a race condition.
Rule of thumb: Lock TTL should be at least 3x the expected operation time, plus a watchdog for safety.
Mistake 4: No Retry Limit
Retrying forever can cause cascading failures. Set a maximum retry count and fail gracefully.
Mistake 5: Ignoring Clock Drift
In Redlock (multi-node), clock drift between Redis nodes can cause locks to expire at different times. Account for drift in your TTL calculations.
Redlock: Multi-Node Distributed Lock
For higher reliability, the Redlock algorithm uses multiple independent Redis instances:
- Acquire the lock on N/2+1 out of N instances.
- The lock is valid only if acquired on the majority within a time limit.
- If acquisition fails, release the lock on all instances.
This is more complex and has been debated (see Martin Kleppmann's analysis). For most use cases, a single Redis instance with proper TTL and watchdog is sufficient.
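For a feel of the majority step, here is a toy model of the acquire phase, with each "instance" a dict standing in for one independent Redis node. It deliberately omits per-node timeouts and the drift allowance a real implementation must subtract from the validity window; all names are mine.

```python
import time

def redlock_acquire(instances, key, token, ttl_ms):
    """Try to take the lock on every instance; valid only with a majority.

    Each instance maps key -> (token, expires_at_ms). Toy model only:
    real Redlock also bounds per-node latency and subtracts clock drift.
    """
    start_ms = int(time.time() * 1000)
    acquired = 0
    for store in instances:
        now = int(time.time() * 1000)
        entry = store.get(key)
        if entry is None or entry[1] <= now:      # SET ... NX PX semantics
            store[key] = (token, now + ttl_ms)
            acquired += 1
    elapsed = int(time.time() * 1000) - start_ms
    if acquired >= len(instances) // 2 + 1 and elapsed < ttl_ms:
        return True
    for store in instances:                        # failed: release everywhere
        if store.get(key, (None,))[0] == token:
            del store[key]
    return False
```

Note the release-on-failure loop: a minority acquire must be undone on every node, or those stragglers block the next caller until their TTLs expire.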
Pitfalls Summary
| Mistake | Consequence | Fix |
|---|---|---|
| SETNX + EXPIRE (two commands) | Deadlock on crash | Use SET NX PX |
| Fixed lock value | Accidental unlock | Use UUID |
| No TTL / too long TTL | Deadlock | Set reasonable TTL |
| Too short TTL | Race condition | TTL ≥ 3x operation time + watchdog |
| No retry limit | Cascading failure | Max retries + backoff |
| DEL without ownership check | Release others' lock | Lua script for atomic check-and-delete |
Try It in the Editor
Head to the Redis Online Editor and practice:
SET lock:order:5001 "worker-a-uuid" NX PX 30000
SET lock:order:5001 "worker-b-uuid" NX PX 30000
GET lock:order:5001
SETNX lock:old "value"
EXPIRE lock:old 30
TTL lock:old
SET lock:safe "uuid-123" NX PX 10000
GET lock:safe
DEL lock:safe
SET lock:safe "uuid-456" NX PX 10000
GET lock:safe
Watch how the second SET NX fails, and how after DEL, a new worker can acquire the lock. This is the core of distributed locking.