Redis Cache Penetration, Breakdown & Avalanche: Strategies and Commands
Cache penetration, cache breakdown, and cache avalanche are the three horsemen of caching disasters. They sound similar but have very different causes and solutions. This guide breaks down each problem with concrete Redis commands you can try right now.
Cache Penetration
The Problem
Cache penetration happens when requests query for data that doesn't exist — neither in cache nor in the database. Every request bypasses the cache and hits the database directly.
Example: An attacker sends millions of requests for user:99999999, a user that doesn't exist. Every request misses the cache and queries the DB.
Solution 1: Cache Null Values
The simplest fix — cache the "not found" result with a short TTL:
SET cache:user:99999999 "NULL" EX 60
GET cache:user:99999999
Your application checks: if the cached value is "NULL", return 404 immediately without hitting the DB.
Solution 2: Bloom Filter
A Bloom filter is a probabilistic data structure that can tell you "definitely not in the set" or "probably in the set."
Before querying Redis or the DB, check the Bloom filter:
BF.ADD users:filter "user:1001"
BF.ADD users:filter "user:1002"
BF.EXISTS users:filter "user:99999999"
BF.EXISTS users:filter "user:1001"
Note:
BF.* commands require the RedisBloom module. Without it, you can implement a simple Bloom filter using SETBIT and GETBIT:
SETBIT bloom:users 42 1
SETBIT bloom:users 87 1
SETBIT bloom:users 156 1
GETBIT bloom:users 42
GETBIT bloom:users 99
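Here is a toy Bloom filter in Python mirroring the SETBIT/GETBIT approach. The bit-array size, the number of hash functions, and the double-hashing scheme are illustrative choices for the sketch, not anything Redis prescribes:

```python
import hashlib

M = 1 << 16   # number of bits in the filter
K = 3         # hash positions per item

bits = bytearray(M // 8)


def _positions(item: str):
    # Derive K bit positions from one SHA-256 digest via double hashing
    digest = hashlib.sha256(item.encode()).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big")
    return [(h1 + i * h2) % M for i in range(K)]


def bf_add(item: str):
    for pos in _positions(item):              # SETBIT bloom:users <pos> 1
        bits[pos // 8] |= 1 << (pos % 8)


def bf_exists(item: str) -> bool:
    # All K bits set -> "probably in the set"; any bit clear -> definitely not
    return all(bits[p // 8] & (1 << (p % 8)) for p in _positions(item))
```

A Bloom filter never produces false negatives, so a miss can safely skip the database; a hit only means "probably present" and still needs a real lookup.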
Pitfall
Don't set the null-value TTL too long. If the data is later created, users will see stale "not found" responses until the TTL expires.
Cache Breakdown
The Problem
Cache breakdown occurs when a single hot key expires, and hundreds of concurrent requests simultaneously hit the database to rebuild the cache. The DB gets hammered by a thundering herd.
Solution 1: Mutex Lock with SET NX
Use a distributed lock so only one request rebuilds the cache:
SET lock:product:2001 "rebuilding" NX EX 10
Application logic:
- Try to acquire the lock with SET lock:key "rebuilding" NX EX 10.
- If acquired (returns OK), query the DB, rebuild the cache, release the lock.
- If not acquired (returns nil), wait briefly and retry reading from the cache.
SET lock:product:2001 "rebuilding" NX EX 10
SET cache:product:2001 '{"name":"Widget","price":9.99}' EX 600
DEL lock:product:2001
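The lock-acquire-rebuild-release loop can be sketched in Python. A dict plus a `threading.Lock` stands in for Redis here (the lock emulates the atomicity of single Redis commands); the retry count and sleep interval are illustrative:

```python
import threading
import time

fake_redis = {}
_guard = threading.Lock()  # makes set_nx atomic, as SET ... NX is in Redis


def set_nx(key, value):
    """Mimics SET key value NX: returns True only if the key was absent."""
    with _guard:
        if key in fake_redis:
            return False     # Redis would return nil here
        fake_redis[key] = value
        return True


def get_or_rebuild(cache_key, lock_key, load_from_db):
    for _ in range(100):                        # bounded retry with backoff
        value = fake_redis.get(cache_key)
        if value is not None:
            return value                        # another worker rebuilt it
        if set_nx(lock_key, "rebuilding"):      # SET lock ... NX EX 10
            try:
                value = fake_redis.get(cache_key)
                if value is None:               # double-check under the lock
                    value = load_from_db()
                    fake_redis[cache_key] = value
                return value
            finally:
                fake_redis.pop(lock_key, None)  # DEL lock
        time.sleep(0.01)                        # lost the race: back off
    raise TimeoutError("gave up waiting for cache rebuild")
```

The re-check after acquiring the lock matters: without it, a request that lost the race could still rebuild a second time after the winner releases the lock.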
Solution 2: Logical Expiration
Instead of relying on Redis TTL, store the expiration timestamp inside the value:
SET cache:product:2001 '{"data":{"name":"Widget"},"expire":1739700000}'
GET cache:product:2001
Your application checks the expire field. If the entry has logically expired, it triggers an asynchronous rebuild while continuing to serve the stale data, so no request ever blocks on the rebuild.
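A sketch of that read path in Python, with a dict standing in for Redis and the rebuild hook passed in by the caller (both are illustrative stand-ins):

```python
import json
import threading
import time

fake_redis = {}


def put(key, data, ttl_seconds):
    # Store the logical expiry timestamp inside the value, not as a Redis TTL
    fake_redis[key] = json.dumps({"data": data, "expire": time.time() + ttl_seconds})


def get_with_logical_expiry(key, rebuild):
    raw = fake_redis.get(key)
    if raw is None:
        return None                      # true miss: caller must load synchronously
    entry = json.loads(raw)
    if entry["expire"] < time.time():
        # Logically expired: kick off an async rebuild, serve stale data now
        threading.Thread(target=rebuild, args=(key,)).start()
    return entry["data"]
```

Note the trade-off: readers get an immediate answer, but it may be briefly stale while the background rebuild runs.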
Solution 3: Never Expire Hot Keys
For truly critical hot keys, don't set a TTL. Instead, update them proactively via a background job:
SET hotkey:homepage:banner '{"title":"Sale!","img":"banner.jpg"}'
A cron job or message consumer updates this key whenever the source data changes.
Pitfall
The mutex approach can cause request queuing under extreme load. Set a reasonable lock TTL (e.g., 10 seconds) and implement retry with backoff.
Cache Avalanche
The Problem
Cache avalanche happens when a large number of keys expire at the same time, causing a flood of requests to hit the database simultaneously.
Common cause: You load all cache data at startup with the same TTL.
SET cache:product:1 "data1" EX 3600
SET cache:product:2 "data2" EX 3600
SET cache:product:3 "data3" EX 3600
All three expire at exactly the same second. The DB gets slammed.
Solution 1: Add Random Jitter to TTL
Spread out expiration times by adding a random offset:
SET cache:product:1 "data1" EX 3600
SET cache:product:2 "data2" EX 3720
SET cache:product:3 "data3" EX 3540
In your application code: TTL = base_ttl + random(0, 300).
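That jitter line is trivial to implement; here the 3600-second base and 300-second window are just the example values from above, not required constants:

```python
import random


def jittered_ttl(base_ttl=3600, jitter=300):
    # Spread expirations across [base_ttl, base_ttl + jitter] seconds
    return base_ttl + random.randint(0, jitter)
```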
Solution 2: Multi-Level Caching
Use a local in-memory cache (L1) in front of Redis (L2):
Request → L1 (in-process) → L2 (Redis) → Database
Even if the Redis entry expires, the L1 cache absorbs the spike. L1 typically has a shorter TTL (e.g., 30 seconds).
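A minimal two-level read path, sketched with dicts standing in for both the in-process cache and Redis (the 30-second L1 TTL and all names are illustrative):

```python
import time

L1_TTL = 30
l1 = {}   # key -> (value, expires_at); per-process cache
l2 = {}   # stand-in for Redis


def get(key, load_from_db):
    hit = l1.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                        # L1 hit: no network round-trip
    value = l2.get(key)
    if value is None:
        value = load_from_db()               # both levels missed
        l2[key] = value
    l1[key] = (value, time.time() + L1_TTL)  # repopulate L1 on the way out
    return value
```

Because L1 is per-process, each application instance holds its own copy; short L1 TTLs keep the staleness window small.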
Solution 3: Circuit Breaker
When the DB error rate spikes, trip the circuit breaker and return degraded responses (stale cache, default values, or error pages) instead of letting all requests through.
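A minimal circuit-breaker sketch; the failure threshold, reset window, and fallback value are illustrative, and production implementations usually add a half-open probe state before fully closing again:

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker tripped

    def call(self, query_db, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                return fallback           # open: shed load, serve degraded value
            self.opened_at = None         # window passed: allow a retry
            self.failures = 0
        try:
            result = query_db()
            self.failures = 0             # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip the breaker
            return fallback
```

While the breaker is open, requests never reach the database at all, giving it time to recover.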
Solution 4: Pre-warm Cache
Before a known traffic spike (e.g., a sale event), pre-load critical data:
SET cache:product:1 '{"name":"Widget","price":4.99}' EX 7200
SET cache:product:2 '{"name":"Gadget","price":14.99}' EX 7320
SET cache:product:3 '{"name":"Doohickey","price":2.99}' EX 7140
Pitfall
Random jitter alone isn't enough if your base TTL is too short. If all keys have TTL between 50–60 seconds, they'll still expire in a tight window. Use a meaningful base TTL (e.g., 1 hour) with jitter of 5–10 minutes.
Quick Comparison
| Problem | Cause | Key Symptom | Primary Fix |
|---|---|---|---|
| Penetration | Query non-existent data | Every request hits DB | Bloom filter / cache null |
| Breakdown | Hot key expires | Thundering herd on one key | Mutex lock / logical expiry |
| Avalanche | Mass expiration | DB overload spike | TTL jitter / multi-level cache |
Try It in the Editor
Head to the Redis Online Editor and simulate these scenarios:
SET cache:user:99999999 "NULL" EX 60
GET cache:user:99999999
TTL cache:user:99999999
SET lock:product:2001 "rebuilding" NX EX 10
SET lock:product:2001 "rebuilding" NX EX 10
SET cache:product:1 "data1" EX 3600
SET cache:product:2 "data2" EX 3720
SET cache:product:3 "data3" EX 3540
TTL cache:product:1
TTL cache:product:2
TTL cache:product:3
Try the second SET NX — it returns nil because the lock is already held. That's the mutex pattern in action.