Redis Distributed Lock: SET NX PX Done Right + Retry Strategies

Distributed locks are deceptively simple. SET key value NX PX 30000 — done, right? Not quite. There are at least five ways to get it wrong. This guide shows you the correct implementation and the pitfalls that trip up even experienced engineers.

The Problem

In a distributed system, multiple processes need to coordinate access to a shared resource. Without a lock:

  • Two workers process the same job.
  • Two servers update the same database row simultaneously.
  • A payment gets charged twice.

The Correct Pattern: SET NX PX

The modern way to acquire a lock in Redis:

SET lock:order:5001 "worker-a-uuid" NX PX 30000

This single command does three things atomically:

  • NX: Only set if the key does not exist (acquire lock).
  • PX 30000: Auto-expire after 30 seconds (prevent deadlock).
  • The value is a unique identifier (for safe unlock).
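
From application code, the same command might look like the sketch below. It assumes a redis-py style client whose `set` method accepts `nx` and `px` keyword arguments; the `try_acquire` helper and its parameter names are illustrative and not part of any library.

```python
import uuid


def try_acquire(client, key, ttl_ms=30_000):
    """Try to take the lock once; return the owner token, or None if busy.

    `client` is assumed to behave like redis-py's `redis.Redis`:
    set(..., nx=True, px=...) returns a truthy value on success and
    None when the key already exists.
    """
    token = str(uuid.uuid4())  # unique value identifying this holder
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None
```

Keep the returned token: it is what later proves ownership when the lock is released.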

Try It

SET lock:order:5001 "worker-a-uuid" NX PX 30000
SET lock:order:5001 "worker-b-uuid" NX PX 30000
GET lock:order:5001
TTL lock:order:5001

The second SET returns nil — worker B failed to acquire the lock. Worker A holds it.

Why the Value Must Be Unique

The value identifies who holds the lock. Without it, you can accidentally release someone else's lock.

The Dangerous Pattern

Worker A acquires the lock (TTL 10s)
Worker A takes 15 seconds to process, so the lock expires mid-task
Worker B acquires the now-free lock
Worker A finishes and DELs the lock, which now belongs to B
Worker B still thinks it holds the lock, but its key was just deleted
Worker C acquires the lock, so now B and C both think they have it

The Safe Unlock

Only delete the lock if you still own it. This requires a Lua script for atomicity:

-- Lua script: KEYS[1] is the lock key, ARGV[1] is this worker's unique value
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end

In Redis commands, you can verify ownership before deleting:

SET lock:resource:1 "uuid-abc-123" NX PX 30000
GET lock:resource:1
DEL lock:resource:1

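But note: GET followed by DEL is NOT atomic without Lua. The lock can expire and be acquired by another worker between the two commands, and the DEL would then remove that worker's lock. In production, always use a Lua script.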
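
One way to run the check-and-delete atomically from application code is sketched below. It assumes a redis-py style client whose `eval(script, numkeys, *keys_and_args)` method sends the script to the server; `release` and `RELEASE_SCRIPT` are hypothetical names chosen for this example.

```python
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


def release(client, key, token):
    """Delete the lock only if it still holds our token; True if released."""
    # One key (numkeys=1), followed by the token as the only argument.
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1
```

A return value of False means the lock had already expired or was taken over by another worker, which is usually worth logging.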

Retry with Exponential Backoff

When a lock acquisition fails, don't spin in a tight loop. Use exponential backoff with jitter:

attempt 1: wait 50ms + random(0, 50)ms
attempt 2: wait 100ms + random(0, 100)ms
attempt 3: wait 200ms + random(0, 200)ms
attempt 4: wait 400ms + random(0, 400)ms
max attempts: 5
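
The schedule above can be computed with a small helper. This is a minimal sketch: `backoff_delay_ms` and its parameters are names made up for this example, and the jitter is assumed to be uniform between 0 and the fixed part of the delay.

```python
import random


def backoff_delay_ms(attempt, base_ms=50, rng=random):
    """Delay before retrying after failed attempt number `attempt` (1-based).

    The fixed part doubles on each attempt (50, 100, 200, 400 ms); the
    jitter is a uniform random amount between 0 and that fixed part.
    """
    fixed = base_ms * 2 ** (attempt - 1)
    return fixed + rng.uniform(0, fixed)
```

The jitter keeps workers that failed at the same moment from all retrying at the same moment again.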

Simulating Lock Contention

SET lock:job:process "worker-1" NX PX 10000
SET lock:job:process "worker-2" NX PX 10000
SET lock:job:process "worker-3" NX PX 10000
GET lock:job:process

Workers 2 and 3 get nil. In your application, they should back off and retry.

Lock Extension (Watchdog)

What if your task takes longer than the lock TTL? You need a watchdog that extends the lock while the task is still running.

Pattern

  1. Acquire lock with TTL 30s.
  2. Start a background timer that runs every 10s.
  3. Timer checks if the task is still running. If yes, and this worker still owns the lock, extend the lock.
  4. When the task completes, cancel the timer and release the lock.

SET lock:long-task "uuid-abc" NX PX 30000

PEXPIRE lock:long-task 30000

GET lock:long-task
DEL lock:long-task

Libraries like Redisson (Java) implement this watchdog pattern automatically.
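
If you need to write the watchdog yourself, it could look roughly like the sketch below. It is not how Redisson does it; it assumes a redis-py style client with `eval(script, numkeys, *keys_and_args)`, and the names `extend`, `Watchdog`, and `EXTEND_SCRIPT` are invented for this example. Unlike a plain PEXPIRE, the script checks ownership before extending.

```python
import threading

EXTEND_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("PEXPIRE", KEYS[1], ARGV[2])
else
    return 0
end
"""


def extend(client, key, token, ttl_ms):
    """Reset the TTL only if we still own the lock; True if extended."""
    return client.eval(EXTEND_SCRIPT, 1, key, token, ttl_ms) == 1


class Watchdog:
    """Background thread that extends the lock until stop() is called."""

    def __init__(self, client, key, token, ttl_ms=30_000, interval_s=10.0):
        self._args = (client, key, token, ttl_ms)
        self._interval_s = interval_s
        self._stop = threading.Event()
        self.lost = False  # becomes True if an extension finds the lock gone
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # wait() returns False on timeout (keep going), True once stopped.
        while not self._stop.wait(self._interval_s):
            if not extend(*self._args):
                self.lost = True
                return

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Call `start()` right after acquiring the lock and `stop()` before releasing it. If `lost` is True afterwards, the task ran for some time without holding the lock and its result should be treated with suspicion.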

Common Mistakes

Mistake 1: Using SETNX + EXPIRE (Two Commands)

SETNX lock:old-pattern "value"
EXPIRE lock:old-pattern 30

If the process crashes between SETNX and EXPIRE, the lock lives forever. Always use the single SET ... NX PX command.

Mistake 2: Fixed Lock Value

SET lock:resource "locked" NX PX 30000

Using a fixed value like "locked" means no process can tell whether the lock it is about to delete is still its own, so any process can release a lock held by another. Always use a unique identifier (UUID).

Mistake 3: Too Short TTL

If your lock TTL is 5 seconds but the operation takes 10 seconds, the lock expires mid-operation. Another process acquires it, and you have a race condition.

Rule of thumb: Lock TTL should be at least 3x the expected operation time, plus a watchdog for safety.

Mistake 4: No Retry Limit

Retrying forever can cause cascading failures. Set a maximum retry count and fail gracefully.
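
A bounded retry loop might look like the following sketch. It assumes a redis-py style client whose `set` accepts `nx` and `px`; `acquire_with_retry` and its defaults (5 attempts, 50 ms base delay) are illustrative choices, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import random
import time
import uuid


def acquire_with_retry(client, key, ttl_ms=30_000, max_attempts=5,
                       base_ms=50, sleep=time.sleep):
    """Return an owner token, or None once `max_attempts` tries have failed."""
    token = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        if client.set(key, token, nx=True, px=ttl_ms):
            return token
        if attempt < max_attempts:
            # Exponential backoff with jitter, converted to seconds.
            fixed = base_ms * 2 ** (attempt - 1)
            sleep((fixed + random.uniform(0, fixed)) / 1000.0)
    return None  # give up so the caller can fail gracefully
```

Returning None instead of looping forever lets the caller decide what to do next: requeue the job, report an error, or skip the work.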

Mistake 5: Ignoring Clock Drift

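In Redlock (multi-node), each Redis node expires the key by its own clock, so clock drift between nodes can make the lock expire at slightly different times on each node. Account for drift by subtracting a safety margin from the TTL when you calculate how long the lock remains valid.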

Redlock: Multi-Node Distributed Lock

For higher reliability, the Redlock algorithm uses multiple independent Redis instances:

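  1. Try to acquire the lock on all N instances, using the same key and the same unique value.
  2. The lock is held only if at least N/2+1 instances (a majority) granted it and the total time spent acquiring is less than the lock TTL.
  3. If acquisition fails, release the lock on all instances.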

This is more complex and has been debated (see Martin Kleppmann's analysis). For most use cases, a single Redis instance with proper TTL and watchdog is sufficient.
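
For readers who do use Redlock, the majority and timing checks reduce to a little arithmetic, sketched below. The function names are hypothetical, and the drift margin (1% of the TTL plus 2 ms) is an assumed safety margin of the kind used by some Redlock client implementations, not a measured value.

```python
def quorum(n_instances):
    """Smallest majority of N instances: N/2 + 1 with integer division."""
    return n_instances // 2 + 1


def remaining_validity_ms(ttl_ms, elapsed_ms, drift_factor=0.01, extra_ms=2):
    """Time the lock can still be trusted once acquisition has finished.

    drift_factor and extra_ms are assumed safety margins for clock drift.
    """
    drift_ms = int(ttl_ms * drift_factor) + extra_ms
    return ttl_ms - elapsed_ms - drift_ms


def lock_is_held(n_instances, n_acquired, ttl_ms, elapsed_ms):
    """True if a majority granted the lock and some validity time is left."""
    return (n_acquired >= quorum(n_instances)
            and remaining_validity_ms(ttl_ms, elapsed_ms) > 0)
```

With 5 instances and a 30000 ms TTL, a lock granted by 3 instances after 120 ms of acquisition time counts as held; the same lock granted by only 2 instances does not.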

Pitfalls Summary

| Mistake | Consequence | Fix |
| --- | --- | --- |
| SETNX + EXPIRE (two commands) | Deadlock on crash | Use SET NX PX |
| Fixed lock value | Accidental unlock | Use UUID |
| No TTL / too long TTL | Deadlock | Set reasonable TTL |
| Too short TTL | Race condition | TTL ≥ 3x operation time + watchdog |
| No retry limit | Cascading failure | Max retries + backoff |
| DEL without ownership check | Release others' lock | Lua script for atomic check-and-delete |

Try It in the Editor

Head to the Redis Online Editor and practice:

SET lock:order:5001 "worker-a-uuid" NX PX 30000
SET lock:order:5001 "worker-b-uuid" NX PX 30000
GET lock:order:5001

SETNX lock:old "value"
EXPIRE lock:old 30
TTL lock:old

SET lock:safe "uuid-123" NX PX 10000
GET lock:safe
DEL lock:safe
SET lock:safe "uuid-456" NX PX 10000
GET lock:safe

Watch how the second SET NX fails, and how after DEL, a new worker can acquire the lock. This is the core of distributed locking.