TL;DR
- Choose your key (IP, user, token, route, tenant) and policy (hard vs soft, burst allowance) before picking an algorithm.
- Token bucket (or GCRA) gives smooth limits with bursts and is the best default for public APIs.
- Sliding window gives fairer “per‑minute” semantics than fixed windows; log variant is precise, counter is cheaper.
- Implement atomically in Redis (Lua scripts) or use NGINX/Envoy/cloud gateways for simple, fast edge enforcement.
- Return `429 Too Many Requests` with `Retry-After` and the modern `RateLimit-*` headers. Expose limits per route/tenant.
- In multi‑node/region setups, combine local fast limiters with a global budget (hierarchical) and design for clock skew.
1) What are you limiting?
- Dimensions: per IP, user/account, API key, route (e.g., `/login` tighter), tenant, or combinations (`tenant:user`); see the key‑composition sketch after this list.
- Burst policy: allow short bursts (token bucket with capacity) or enforce strict, evenly‑spaced requests (leaky bucket/GCRA)?
- Hard vs soft: hard = drop at 429; soft = queue (bounded) or degrade work. Most public APIs → hard.
- Separate auth endpoints (login/password reset) from data APIs; much lower thresholds on auth to deter brute force.
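To make the key/policy choice concrete, here is a minimal TypeScript sketch of per‑route policies and key composition; the `Policy` type, route strings, and `limitKey` helper are illustrative, not from any particular library:

```ts
// Illustrative policy shape: one entry per route pattern.
type Policy = {
  algorithm: "token_bucket" | "sliding_window";
  limit: number;        // tokens or requests
  windowSec: number;    // refill period or window length
  burst?: number;       // extra capacity for token bucket
};

// Example per-route policies (tighter on auth endpoints).
const policies: Record<string, Policy> = {
  "POST /login": { algorithm: "sliding_window", limit: 5,   windowSec: 60 },
  "GET /api/*":  { algorithm: "token_bucket",   limit: 120, windowSec: 60, burst: 20 },
};

// Key = route + tenant + strongest identity available (user > API key > IP).
function limitKey(route: string, tenant: string, userId?: string, apiKey?: string, ip?: string): string {
  const who = userId ?? apiKey ?? ip ?? "anon";
  return `rl:${route}:${tenant}:${who}`;
}
```

The identity precedence (user over API key over IP) keeps authenticated traffic from being penalized for a shared NAT address.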
2) Algorithms in 60 seconds
| Algorithm | How it works | Pros | Cons | Good for |
|---|---|---|---|---|
| Fixed window (N/min) | Counter per minute resets at boundary | Simple, cheap | Boundary burst (2× near edges), unfair | small internal tools |
| Sliding window counter | Split into current+prev window weighted by time | Cheap, fewer spikes | Approximate, still spiky | most “/min” limits |
| Sliding window log | Store timestamps; evict older than window | Precise fairness | Memory O(requests), heavier | login, abuse‑sensitive |
| Token bucket | Tokens refill at rate; spend to proceed | Bursts allowed, smooth | Needs atomicity | public APIs, mobile |
| Leaky bucket | Queue drains at fixed rate | Smooth, back‑pressure | Adds latency, queue mgmt | background ingestion |
| GCRA | Math version of leaky bucket | Precise, low memory | Trickier to grok | gateways, OSS libs |
Defaults: token bucket (API), sliding counter (cheap per‑minute), log (critical fairness), GCRA (gateways/proxies).
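GCRA only gets a table row above, so here is a minimal single‑process sketch using the common emission‑interval/burst‑offset formulation (class and parameter names are illustrative):

```ts
// Minimal in-memory GCRA: `rate` requests per `periodMs`, with a burst allowance.
class Gcra {
  private tat = 0; // theoretical arrival time (ms since epoch)
  constructor(private rate: number, private periodMs: number, private burst: number) {}

  check(nowMs: number = Date.now()): { allowed: boolean; retryAfterMs: number } {
    const emissionInterval = this.periodMs / this.rate; // ms between "ideal" requests
    const burstOffset = emissionInterval * this.burst;  // how far ahead of schedule we may run
    const newTat = Math.max(this.tat, nowMs) + emissionInterval;
    const allowAt = newTat - burstOffset;
    if (nowMs < allowAt) {
      return { allowed: false, retryAfterMs: Math.ceil(allowAt - nowMs) };
    }
    this.tat = newTat;
    return { allowed: true, retryAfterMs: 0 };
  }
}

// e.g. 10 requests/second with a burst of 20:
const limiter = new Gcra(10, 1000, 20);
limiter.check(); // { allowed: true, retryAfterMs: 0 }
```

The only stored state is a single timestamp per key, which is why GCRA is attractive for gateways: no token counter to refill.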
3) Redis patterns (atomic)
A) Token bucket (Lua; 1 script, atomic)
Refill continuously by time delta (capacity C, fill rate R tokens/sec).
```lua
-- KEYS[1]=bucket key, ARGV[1]=now_ms, ARGV[2]=rate_per_sec, ARGV[3]=capacity, ARGV[4]=cost (tokens)
-- Returns: allowed (0/1), remaining, retry_after_ms
local key  = KEYS[1]
local now  = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local cap  = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

local data   = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(data[1]) or cap
local ts     = tonumber(data[2]) or now

-- refill
local delta = math.max(0, now - ts) / 1000.0
tokens = math.min(cap, tokens + delta * rate)

local allowed = 0
local retry = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
else
  retry = math.ceil(((cost - tokens) / rate) * 1000)
end

redis.call("HMSET", key, "tokens", tokens, "ts", now)
redis.call("PEXPIRE", key, math.ceil((cap / rate) * 1000)) -- idle expiry ~ full refill time
return {allowed, math.floor(tokens), retry}
```
Node wrapper (example)
```js
// evalsha(sha, numKeys, key, now_ms, rate_per_sec, capacity, cost)
const result = await redis.evalsha(SHA, 1, key, Date.now(), ratePerSec, capacity, 1);
const [allowed, remaining, retryMs] = result.map(Number);
if (!allowed) return tooMany(res, retryMs); // res = the HTTP response; tooMany sets 429 + Retry-After
```
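If you'd rather not manage `SHA` by hand, ioredis (assumed here) can register the Lua above as a custom command and handle EVALSHA/EVAL fallback itself; the file name and key are illustrative:

```ts
import { readFileSync } from "node:fs";
import Redis from "ioredis"; // assumed client

// Assumes the token-bucket Lua from 3A is saved as token_bucket.lua.
const TOKEN_BUCKET_LUA = readFileSync("token_bucket.lua", "utf8");
const redis = new Redis();

// ioredis caches the script and falls back to EVAL when the SHA is missing.
redis.defineCommand("tokenBucket", { numberOfKeys: 1, lua: TOKEN_BUCKET_LUA });

async function checkTokenBucket(key: string): Promise<[number, number, number]> {
  // rate 2 tokens/sec, capacity 120, cost 1
  return (await (redis as any).tokenBucket(key, Date.now(), 2, 120, 1)) as [number, number, number];
}
```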
B) Sliding window log (ZSET)
```lua
-- KEYS[1]=key, ARGV[1]=now_ms, ARGV[2]=window_ms, ARGV[3]=limit
local key   = KEYS[1]
local now   = tonumber(ARGV[1])
local win   = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])

redis.call("ZREMRANGEBYSCORE", key, 0, now - win)
local count = redis.call("ZCARD", key)
if count < limit then
  -- member only needs to be unique per request
  redis.call("ZADD", key, now, tostring(now) .. ":" .. math.random())
  redis.call("PEXPIRE", key, win)
  return {1, limit - count - 1, 0}
else
  -- blocked until the oldest timestamp falls out of the window
  local oldest = tonumber(redis.call("ZRANGE", key, 0, 0, "WITHSCORES")[2])
  local retry  = (oldest and (oldest - (now - win))) or win
  return {0, 0, retry}
end
```
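The same log algorithm in a single process (no Redis) takes only a few lines; a minimal in‑memory sketch for comparison:

```ts
// In-memory sliding window log: allow `limit` requests per `windowMs`.
class SlidingWindowLog {
  private timestamps: number[] = [];
  constructor(private limit: number, private windowMs: number) {}

  check(nowMs: number = Date.now()): { allowed: boolean; retryAfterMs: number } {
    // Evict entries older than the window.
    const cutoff = nowMs - this.windowMs;
    while (this.timestamps.length && this.timestamps[0] <= cutoff) this.timestamps.shift();

    if (this.timestamps.length < this.limit) {
      this.timestamps.push(nowMs);
      return { allowed: true, retryAfterMs: 0 };
    }
    // Blocked until the oldest entry leaves the window.
    return { allowed: false, retryAfterMs: this.timestamps[0] + this.windowMs - nowMs };
  }
}
```

Memory grows with the number of requests in the window, which is exactly the trade-off flagged in the table above.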
C) Sliding window counter (approx per minute)
```lua
-- KEYS[1]=key, ARGV[1]=now_ms, ARGV[2]=window_ms, ARGV[3]=limit
local k   = KEYS[1]
local now = tonumber(ARGV[1])
local win = tonumber(ARGV[2])
local lim = tonumber(ARGV[3])

local cur  = math.floor(now / win) * win   -- start of the current window
local prev = cur - win
local c1 = tonumber(redis.call("GET", k..":"..cur)) or 0
local c0 = tonumber(redis.call("GET", k..":"..prev)) or 0

-- weight the previous window by how much of it still overlaps the sliding window
local weight = (now - cur) / win
local est = c0 * (1 - weight) + c1
if est < lim then
  redis.call("INCR", k..":"..cur)
  redis.call("PEXPIRE", k..":"..cur, win * 2)
  return {1, lim - math.ceil(est) - 1, 0}
else
  local retry = math.ceil(cur + win - now)   -- until the current window ends
  return {0, 0, retry}
end
```
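A quick worked example of the weighted estimate, with made‑up numbers: 40% into the current minute, 100 requests counted in the previous window and 30 so far in this one:

```ts
// est = previous * (1 - elapsedFraction) + current
const prev = 100, current = 30, elapsedFraction = 0.4;
const est = prev * (1 - elapsedFraction) + current;   // 100 * 0.6 + 30 = 90
console.log(est < 120 ? "allow" : "limit");           // "allow" against a 120/min limit
```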
4) Edge configs (NGINX & Envoy)
NGINX (leaky bucket with burst)
```nginx
# limit_req_zone must be defined in the http {} context.
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=perip burst=20 nodelay;   # absorb bursts, reject the excess
        limit_req_status 429;                    # limit_req returns 503 by default
        add_header Retry-After 1 always;
    }
}
```
`limit_req` is NGINX's leaky‑bucket limiter: `burst` sets the bucket depth and `nodelay` serves burst requests immediately instead of pacing them.
Envoy (global rate limit service)
```yaml
# HTTP filter: ship descriptors to an external rate limit service (RLS).
http_filters:
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: apigw
    stage: 0
    request_type: external
    rate_limited_as_resource_exhausted: true
    rate_limit_service:
      transport_api_version: V3
      grpc_service: { envoy_grpc: { cluster_name: rate_limit_cluster } }

# On the route (or virtual host), build the descriptors (e.g., per-route, per-user):
rate_limits:
- actions:
  - generic_key: { descriptor_key: route, descriptor_value: users_read }
  - request_headers: { header_name: x-user-id, descriptor_key: user }
```
The per‑descriptor limits themselves (e.g., `route=users_read, user=<id>` → 120/min) live in the RLS configuration.
5) Response headers (be a good citizen)
Prefer the IETF draft/standard RateLimit response fields (supported by many gateways):
```http
RateLimit-Limit: 120;w=60
RateLimit-Remaining: 17
RateLimit-Reset: 13
Retry-After: 13
```
Also widely used legacy headers:
```http
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 17
X-RateLimit-Reset: 1696346400   # epoch seconds
```
On 429 Too Many Requests, include Retry-After (seconds or HTTP date) and a machine‑readable error body.
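A minimal sketch of emitting these headers from an Express handler (Express and the helper name are assumptions, not part of any standard):

```ts
import type { Response } from "express"; // assumed framework

// Illustrative helper: attach rate-limit headers and, when blocked, a 429 body.
function applyRateLimitHeaders(res: Response, limit: number, remaining: number, resetSec: number, allowed: boolean): void {
  res.setHeader("RateLimit-Limit", String(limit));
  res.setHeader("RateLimit-Remaining", String(Math.max(0, remaining)));
  res.setHeader("RateLimit-Reset", String(resetSec));
  if (!allowed) {
    res.setHeader("Retry-After", String(resetSec));
    res.status(429).json({ error: "rate_limited", retry_after: resetSec });
  }
}
```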
6) Multi‑node & multi‑region
- Single Redis: simplest; beware cross‑region latency.
- Per‑node local limiter + global budget: a fast local token bucket on each node, periodically synced with (or drawing batches of tokens from) a shared global pool for expensive "write" actions; see the sketch after this list.
- Sticky hashing: route the same key to the same limiter node/shard for consistency.
- Clock skew: base decisions on server time; for multi‑region Redis, use token bucket (less sensitive than exact windows).
- Idempotency keys on write endpoints reduce the impact of retry storms.
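A minimal sketch of the "local limiter + global budget" idea above, assuming ioredis and a global counter key that something else resets each window; the class and the coarse batching policy are illustrative:

```ts
import Redis from "ioredis"; // assumed client

// Each node keeps a small local allowance and tops it up from a shared global
// budget in batches, so most requests never touch Redis.
class HierarchicalLimiter {
  private local = 0;
  constructor(private redis: Redis, private globalKey: string, private batch = 50) {}

  async allow(): Promise<boolean> {
    if (this.local > 0) { this.local--; return true; }
    // Draw a batch from the global pool; DECRBY is atomic.
    const left = await this.redis.decrby(this.globalKey, this.batch);
    if (left < 0) {
      // Over budget: return the whole batch and deny (coarse; a partial draw is not split).
      await this.redis.incrby(this.globalKey, this.batch);
      return false;
    }
    this.local = this.batch - 1;
    return true;
  }
}
```

Batch size trades accuracy for Redis load: larger batches mean fewer round trips but more unused budget stranded on idle nodes.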
7) Abuse & product considerations
- Combine IP and user/account keys for endpoints reachable anonymously; apply aggressive limits to /login, /signup, and password reset.
- After a soft threshold, add CAPTCHA/device fingerprinting, or deliberately slow responses (tarpitting).
- Per‑tenant scaling: tiers/quotas; rate limit by tenant plan. Burst pooling (token top‑ups) is fine if fair.
- 429 UX: show clear message and time to retry; SDKs should implement exponential backoff + jitter.
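A minimal client‑side sketch of exponential backoff with full jitter that honors `Retry-After` (uses the global `fetch`; the cap and retry count are arbitrary):

```ts
// Retry 429s with exponential backoff + full jitter, honoring Retry-After.
async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429 || attempt >= maxRetries) return res;

    const retryAfterSec = Number(res.headers.get("Retry-After"));
    const baseMs = Number.isFinite(retryAfterSec) && retryAfterSec > 0
      ? retryAfterSec * 1000
      : Math.min(30_000, 1000 * 2 ** attempt);  // cap at 30s
    const delay = Math.random() * baseMs;       // full jitter
    await new Promise((r) => setTimeout(r, delay));
  }
}
```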
8) Observability & tuning
Metrics per key/policy:
- `allowed_total`, `limited_total`
- `retry_after_ms` histogram
- `remaining` gauges (sampled)
- `burst_exhausted_total`
- P95 latency added by the limiter; Redis timeouts/errors
Logs: include key, policy, algorithm, capacity, remaining, retry_after.
Dashboards: top limited keys, hot routes, 429 rate vs traffic, edge vs origin enforcement split.
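A minimal sketch of the counters and histogram above using prom-client (an assumption; any metrics library works), with illustrative metric and label names:

```ts
import { Counter, Histogram } from "prom-client"; // assumed metrics library

const allowedTotal = new Counter({
  name: "ratelimit_allowed_total",
  help: "Requests allowed by the limiter",
  labelNames: ["route", "policy"],
});
const limitedTotal = new Counter({
  name: "ratelimit_limited_total",
  help: "Requests rejected with 429",
  labelNames: ["route", "policy"],
});
const retryAfterMs = new Histogram({
  name: "ratelimit_retry_after_ms",
  help: "Suggested retry delay in ms",
  buckets: [50, 100, 250, 500, 1000, 5000, 15000, 60000],
});

// Call in the request path after each limiter decision.
function record(route: string, policy: string, allowed: boolean, retryMs: number): void {
  if (allowed) {
    allowedTotal.inc({ route, policy });
  } else {
    limitedTotal.inc({ route, policy });
    retryAfterMs.observe(retryMs);
  }
}
```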
9) Pitfalls & fast fixes
| Pitfall | Why it hurts | Fix |
|---|---|---|
| Fixed window only | Boundary burst & unfairness | Use sliding or token bucket |
| Non‑atomic updates | Race → burst holes | Use Lua/transactions or proxy limiters |
| Global queueing | Latency spikes, memory | Prefer drop or small per‑key queues |
| One size for all routes | Abusable endpoints unprotected | Different policies per route |
| Ignoring clock skew | Inconsistent limits | Use token bucket; base on server time |
| No headers on 429 | Poor client behavior | Send RateLimit‑* + Retry‑After |
| Single point Redis | Outage = no traffic | Fallback to local fail‑open/close by route |
Quick checklist
- [ ] Pick keys/policies per route (IP, user, tenant).
- [ ] Choose algorithm: token bucket (default), sliding for “per‑minute”, log for sensitive flows.
- [ ] Implement atomically (Redis Lua) or via NGINX/Envoy/gateway.
- [ ] Emit RateLimit‑* and Retry‑After; return 429.
- [ ] Add metrics/logging; test with load + burst traffic.
- [ ] Plan for multi‑region (local + global budget, sticky hashing).
One‑minute adoption plan
- Set per‑route policies (e.g., `/login` 5/min per IP; `/api/*` 120/min per user; `/search` token bucket at 10 r/s with burst 20).
- Ship a Redis token bucket (Lua) as your default enforcement.
- Add RateLimit‑* + `Retry-After` to responses; update SDKs to back off with jitter.
- Create dashboards for 429s and top keys; run a burst test in staging.
- For multi‑region, add a local limiter and test fail‑open/close modes.