
Rate Limiting Strategies — windows, buckets, and distributed guards

6 min read

Fixed/sliding windows, token & leaky buckets, GCRA, and real-world implementations (Redis, NGINX, Envoy). Covers headers, bursts, backoff, multi‑DC, and tuning.

TL;DR

  • Choose your key (IP, user, token, route, tenant) and policy (hard vs soft, burst allowance) before picking an algorithm.
  • Token bucket (or GCRA) gives smooth limits with bursts and is the best default for public APIs.
  • Sliding window gives fairer “per‑minute” semantics than fixed windows; log variant is precise, counter is cheaper.
  • Implement atomically (Lua/script) in Redis or use NGINX/Envoy/cloud gateways for simple, fast edge enforcement.
  • Return 429 Too Many Requests with Retry-After and modern RateLimit-* headers. Expose limits per route/tenant.
  • In multi‑node/region setups, combine local fast limiters with a global budget (hierarchical) and design for clock skew.

1) What are you limiting?

  • Dimensions: per IP, user/account, API key, route (e.g., /login tighter), tenant, or combinations (tenant:user); see the sketch after this list.
  • Burst policy: allow short bursts (token bucket with capacity), or enforce strict, even spacing (leaky bucket/GCRA)?
  • Hard vs soft: hard = drop at 429; soft = queue (bounded) or degrade work. Most public APIs → hard.
  • Separate auth endpoints (login/password reset) from data APIs; much lower thresholds on auth to deter brute force.
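
A minimal sketch of how these choices might look in code, assuming an Express-style request shape; the route names, numbers, and key formats are illustrative, not recommendations:

// Hypothetical per-route policies: each route picks its own key and its own limits.
type Req = { ip: string; userId?: string; tenantId?: string; path: string };
type Policy = { limit: number; windowSec: number; burst?: number; key: (r: Req) => string };

const policies: Record<string, Policy> = {
  "/login":  { limit: 5,   windowSec: 60, key: r => `login:${r.ip}` },                     // per IP, strict
  "/api":    { limit: 120, windowSec: 60, key: r => `api:${r.tenantId}:${r.userId}` },     // tenant:user
  "/search": { limit: 600, windowSec: 60, burst: 20, key: r => `search:${r.userId ?? r.ip}` },
};

function policyFor(req: Req): Policy | undefined {
  // Longest-prefix match keeps /login stricter than a broader /api catch-all.
  return Object.entries(policies)
    .filter(([prefix]) => req.path.startsWith(prefix))
    .sort((a, b) => b[0].length - a[0].length)[0]?.[1];
}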

2) Algorithms in 60 seconds

| Algorithm | How it works | Pros | Cons | Good for |
|---|---|---|---|---|
| Fixed window (N/min) | Counter per minute resets at boundary | Simple, cheap | Boundary burst (2× near edges), unfair | small internal tools |
| Sliding window counter | Split into current + prev window weighted by time | Cheap, fewer spikes | Approximate, still spiky | most “/min” limits |
| Sliding window log | Store timestamps; evict older than window | Precise fairness | Memory O(requests), heavier | login, abuse‑sensitive |
| Token bucket | Tokens refill at rate; spend to proceed | Bursts allowed, smooth | Needs atomicity | public APIs, mobile |
| Leaky bucket | Queue drains at fixed rate | Smooth, back‑pressure | Adds latency, queue mgmt | background ingestion |
| GCRA | Math version of leaky bucket | Precise, low memory | Trickier to grok | gateways, OSS libs |

Defaults: token bucket (API), sliding counter (cheap per‑minute), log (critical fairness), GCRA (gateways/proxies).
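
GCRA never gets its own code later in this post, so here is a minimal in-process sketch (TypeScript; the class and parameter names are illustrative, not a library API) of the theoretical-arrival-time formulation:

// GCRA keeps one number per key: the theoretical arrival time (TAT) of the next conforming request.
// Emission interval T = 1/rate; tolerance tau allows roughly `burst` extra requests ahead of schedule.
class Gcra {
  private tat = new Map<string, number>();   // key -> TAT in ms

  constructor(private ratePerSec: number, private burst: number) {}

  check(key: string, nowMs = Date.now()): { allowed: boolean; retryAfterMs: number } {
    const T = 1000 / this.ratePerSec;        // ms between conforming requests
    const tau = T * this.burst;              // how far ahead of schedule we tolerate
    const tat = Math.max(this.tat.get(key) ?? 0, nowMs);
    const allowAt = tat - tau;
    if (nowMs < allowAt) {
      return { allowed: false, retryAfterMs: Math.ceil(allowAt - nowMs) };
    }
    this.tat.set(key, tat + T);              // consume one slot
    return { allowed: true, retryAfterMs: 0 };
  }
}

The redis-cell module and several gateway limiters implement the same TAT idea with the state stored centrally.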


3) Redis patterns (atomic)

A) Token bucket (Lua; 1 script, atomic)

Refill continuously by time delta (capacity C, fill rate R tokens/sec).

-- KEYS[1]=bucket key, ARGV[1]=now_ms, ARGV[2]=rate_per_sec, ARGV[3]=capacity, ARGV[4]=cost (tokens)
-- Returns: allowed (0/1), remaining, retry_after_ms
local key = KEYS[1]
local now = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local cap = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

local data = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(data[1]) or cap
local ts = tonumber(data[2]) or now

-- refill
local delta = math.max(0, now - ts) / 1000.0
tokens = math.min(cap, tokens + delta * rate)

local allowed = 0
local retry = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
else
  retry = math.ceil(((cost - tokens) / rate) * 1000)
end

redis.call("HMSET", key, "tokens", tokens, "ts", now)
redis.call("PEXPIRE", key, math.ceil((cap / rate) * 1000)) -- idle expiry ~ full refill time
return {allowed, math.floor(tokens), retry}

Node wrapper (example)

// assumes an ioredis client and a pre-loaded script SHA; `res` below is the HTTP response object
const result = await redis.evalsha(SHA, 1, key, Date.now(), ratePerSec, capacity, 1);
const [allowed, remaining, retryMs] = result.map(Number);
if (!allowed) return tooMany(res, retryMs);  // 429 + Retry-After; a sketch of tooMany follows
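
tooMany is not defined above; a sketch of what such a helper could look like with an Express-style response object, emitting the headers from section 5 (the helper name, default limit, and body shape are assumptions):

// Hypothetical helper: reject with 429, standard rate-limit headers, and a machine-readable body.
import type { Response } from "express";

function tooMany(res: Response, retryMs: number, limit = 120, remaining = 0) {
  const retrySec = Math.ceil(retryMs / 1000);
  res.set({
    "Retry-After": String(retrySec),
    "RateLimit-Limit": String(limit),
    "RateLimit-Remaining": String(remaining),
    "RateLimit-Reset": String(retrySec),
  });
  return res.status(429).json({ error: "rate_limited", retry_after_ms: retryMs });
}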

B) Sliding window log (ZSET)

-- KEYS[1]=key, ARGV[1]=now_ms, ARGV[2]=window_ms, ARGV[3]=limit, ARGV[4]=unique request id
local key, now, win, limit = KEYS[1], tonumber(ARGV[1]), tonumber(ARGV[2]), tonumber(ARGV[3])
local id = ARGV[4]  -- math.random() is deterministic inside Redis scripts, so pass uniqueness in
redis.call("ZREMRANGEBYSCORE", key, 0, now - win)
local count = redis.call("ZCARD", key)
if count < limit then
  redis.call("ZADD", key, now, id)
  redis.call("PEXPIRE", key, win)
  return {1, limit - count - 1, 0}
else
  local oldest = redis.call("ZRANGE", key, 0, 0, "WITHSCORES")[2]
  local retry = oldest and math.ceil(tonumber(oldest) + win - now) or win
  return {0, 0, retry}
end

C) Sliding window counter (approx per minute)

-- KEYS[1]=key, ARGV[1]=now_ms, ARGV[2]=window_ms, ARGV[3]=limit
local k = KEYS[1]
local now = tonumber(ARGV[1])
local win = tonumber(ARGV[2])
local lim = tonumber(ARGV[3])
local cur = math.floor(now / win) * win        -- start of current window
local prev = cur - win
local c1 = tonumber(redis.call("GET", k .. ":" .. cur)) or 0
local c0 = tonumber(redis.call("GET", k .. ":" .. prev)) or 0
local weight = (now - cur) / win               -- fraction of current window elapsed
local est = c0 * (1 - weight) + c1             -- previous window counts less as time passes
if est < lim then
  redis.call("INCR", k .. ":" .. cur)
  redis.call("PEXPIRE", k .. ":" .. cur, win * 2)
  return {1, math.max(0, lim - math.floor(est) - 1), 0}
else
  return {0, 0, math.ceil(cur + win - now)}
end
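
For intuition: 20 s into the current minute with prev = 90 and cur = 30, the estimate is 90 × (1 − 20/60) + 30 = 90, so a 90/min limit rejects the request even though the current bucket only holds 30.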

4) Edge configs (NGINX & Envoy)

NGINX (leaky bucket via limit_req, with burst)

limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;
server {
  location /api/ {
    limit_req zone=perip burst=20 nodelay;   # allow bursts up to 20, then reject
    limit_req_status 429;                    # default rejection status is 503; use 429
    add_header Retry-After 1 always;         # static, coarse hint; NGINX cannot compute it per client
  }
}
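
With nodelay, requests inside the burst are served immediately and anything beyond it is rejected; drop nodelay if you prefer NGINX to delay (queue) the excess down to the configured rate.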

Envoy (global rate limit service)

# HTTP filter: send descriptors to an external rate limit service (RLS) over gRPC.
# Cluster and domain names are illustrative; the RLS must define matching limits for this domain.
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: apigw
      stage: 0
      request_type: external
      rate_limited_as_resource_exhausted: true
      rate_limit_service:
        transport_api_version: V3
        grpc_service: { envoy_grpc: { cluster_name: rate_limit_cluster } }

# On the route: actions that build the descriptors (e.g., per-route, per-user).
rate_limits:
  - stage: 0
    actions:
      - generic_key: { descriptor_value: users_read }
      - request_headers: { header_name: x-user-id, descriptor_key: user }

5) Response headers (be a good citizen)

Prefer the IETF draft RateLimit response fields (supported by many gateways); later drafts also carry the window in a quota-policy form, e.g. 120;w=60:

RateLimit-Limit: 120
RateLimit-Remaining: 17
RateLimit-Reset: 13
Retry-After: 13

Also widely used legacy headers:

X-RateLimit-Limit: 120
X-RateLimit-Remaining: 17
X-RateLimit-Reset: 1696346400  # epoch seconds

On 429 Too Many Requests, include Retry-After (seconds or HTTP date) and a machine‑readable error body.


6) Multi‑node & multi‑region

  • Single Redis: simplest; beware cross‑region latency.
  • Per‑node local limiter + global budget: fast local token bucket with periodic token sync, or deduct from a shared global pool for “write” actions; see the sketch after this list.
  • Sticky hashing: route the same key to the same limiter node/shard for consistency.
  • Clock skew: base decisions on server time; for multi‑region Redis, use token bucket (less sensitive than exact windows).
  • Idempotency keys on write endpoints reduce the impact of retry storms.
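
A rough sketch of the local + global pattern, assuming an ioredis client; the quarter-of-the-limit local share, key format, and fail-open default are illustrative policy choices:

// Hierarchical limiting: a fast in-process bucket per node absorbs most traffic,
// then a shared Redis counter enforces the global budget.
import Redis from "ioredis";

const redis = new Redis();                           // shared within the region
const localTokens = new Map<string, { tokens: number; ts: number }>();

function takeLocal(key: string, ratePerSec: number, cap: number, now = Date.now()): boolean {
  const b = localTokens.get(key) ?? { tokens: cap, ts: now };
  b.tokens = Math.min(cap, b.tokens + ((now - b.ts) / 1000) * ratePerSec);
  b.ts = now;
  if (b.tokens < 1) { localTokens.set(key, b); return false; }
  b.tokens -= 1;
  localTokens.set(key, b);
  return true;
}

async function allow(key: string, globalLimit: number, windowSec: number): Promise<boolean> {
  // 1) Cheap local check, no network hop.
  if (!takeLocal(key, globalLimit / windowSec, globalLimit / 4)) return false;
  // 2) Deduct from the shared global budget (a coarse fixed window is acceptable here
  //    because the local bucket already smooths bursts).
  try {
    const bucket = Math.floor(Date.now() / (windowSec * 1000));
    const globalKey = `rl:global:${key}:${bucket}`;
    const used = await redis.incr(globalKey);
    if (used === 1) await redis.expire(globalKey, windowSec * 2);
    return used <= globalLimit;
  } catch {
    return true;   // fail-open on Redis trouble; flip to fail-closed for sensitive routes
  }
}

In practice you would shard the global key or pre-allocate each node a slice of the budget to avoid a hot key.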

7) Abuse & product considerations

  • Combine IP + user keys for anonymous endpoints; apply aggressive limits on /login, /signup, and password reset.
  • Add CAPTCHA or device fingerprinting after a soft threshold; consider tarpitting (deliberately slowing responses) for suspected abuse.
  • Per‑tenant scaling: tiers/quotas; rate limit by tenant plan. Burst pooling (token top‑ups) is fine if fair.
  • 429 UX: show a clear message and time to retry; SDKs should implement exponential backoff + jitter (see the sketch below).
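
A minimal client-side sketch of exponential backoff with full jitter that honors Retry-After when the server sends it (caps and retry counts are placeholders):

// Retry on 429 with exponential backoff + full jitter; prefer the server's Retry-After hint.
async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
  let delayMs = 500;                                       // initial backoff
  for (let attempt = 0; ; attempt++) {
    const resp = await fetch(url);
    if (resp.status !== 429 || attempt >= maxRetries) return resp;
    const retryAfter = Number(resp.headers.get("Retry-After"));
    const base = Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : delayMs;
    const sleep = Math.random() * Math.min(base, 30_000);  // full jitter, capped at 30 s
    await new Promise(r => setTimeout(r, sleep));
    delayMs *= 2;
  }
}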

8) Observability & tuning

Metrics per key/policy:

  • allowed_total, limited_total, retry_after_ms histogram
  • remaining gauges (sampled), burst_exhausted_total
  • P95 latency added by limiter; Redis timeouts/errors

Logs: include key, policy, algorithm, capacity, remaining, retry_after.
Dashboards: top limited keys, hot routes, 429 rate vs traffic, edge vs origin enforcement split.
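
One way the counters and histogram above might be wired up, assuming prom-client; metric and label names are suggestions, not a standard:

// Limiter decision metrics, labeled by policy and route.
import client from "prom-client";

const allowedTotal = new client.Counter({
  name: "ratelimit_allowed_total",
  help: "Requests allowed by the limiter",
  labelNames: ["policy", "route"],
});
const limitedTotal = new client.Counter({
  name: "ratelimit_limited_total",
  help: "Requests rejected with 429",
  labelNames: ["policy", "route"],
});
const retryAfterMs = new client.Histogram({
  name: "ratelimit_retry_after_ms",
  help: "Retry-After returned to limited clients (ms)",
  labelNames: ["policy"],
  buckets: [50, 250, 1000, 5000, 15000, 60000],
});

function record(policy: string, route: string, allowed: boolean, retryMs: number) {
  if (allowed) allowedTotal.inc({ policy, route });
  else {
    limitedTotal.inc({ policy, route });
    retryAfterMs.observe({ policy }, retryMs);
  }
}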


9) Pitfalls & fast fixes

| Pitfall | Why it hurts | Fix |
|---|---|---|
| Fixed window only | Boundary burst & unfairness | Use sliding or token bucket |
| Non‑atomic updates | Race → burst holes | Use Lua/transactions or proxy limiters |
| Global queueing | Latency spikes, memory | Prefer drop or small per‑key queues |
| One size for all routes | Abusable endpoints unprotected | Different policies per route |
| Ignoring clock skew | Inconsistent limits | Use token bucket; base on server time |
| No headers on 429 | Poor client behavior | Send RateLimit‑* + Retry‑After |
| Single point Redis | Outage = no traffic | Fallback to local fail‑open/close by route |


Quick checklist

  • [ ] Pick keys/policies per route (IP, user, tenant).
  • [ ] Choose algorithm: token bucket (default), sliding for “per‑minute”, log for sensitive flows.
  • [ ] Implement atomically (Redis Lua) or via NGINX/Envoy/gateway.
  • [ ] Emit RateLimit‑* and Retry‑After; return 429.
  • [ ] Add metrics/logging; test with load + burst traffic.
  • [ ] Plan for multi‑region (local + global budget, sticky hashing).

One‑minute adoption plan

  1. Set per‑route policies (e.g., /login 5/min/IP; /api/* 120/min/user; /search token bucket 10 r/s with burst 20).
  2. Ship a Redis token bucket (Lua) as your default enforcement.
  3. Add RateLimit‑* + Retry-After to responses; update SDKs to back off with jitter.
  4. Create dashboards for 429s and top keys; run a burst test in staging.
  5. For multi‑region, add a local limiter and test fail‑open/close modes.