TL;DR
- Put CDN/edge in front of anything public. It offloads traffic and slashes tail latency.
- Use a distributed cache (Redis/Memcached) for data shared across instances.
- Add in-memory (per-process) caches for ultra-hot, tiny objects and computed results.
- Start with cache-aside + TTL, then add request coalescing and stale-while-revalidate to tame stampedes.
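The last bullet names two stampede-control techniques; here is a minimal sketch of request coalescing (single-flight) combined with stale-while-revalidate. The `cached` helper, `loadFn`, and the TTL numbers are illustrative assumptions, not from any particular library:

```ts
type Entry<T> = { value: T; freshUntil: number; staleUntil: number };

const entries = new Map<string, Entry<unknown>>();
const inFlight = new Map<string, Promise<unknown>>();

export async function cached<T>(
  key: string,
  loadFn: () => Promise<T>,
  ttlMs = 60_000,        // serve as "fresh" for 1 min
  staleMs = 5 * 60_000,  // then serve stale for up to 5 more min while revalidating
): Promise<T> {
  const now = Date.now();
  const entry = entries.get(key) as Entry<T> | undefined;

  // Fresh hit: serve straight from memory.
  if (entry && now < entry.freshUntil) return entry.value;

  // Request coalescing: every concurrent miss for this key shares one origin call.
  let pending = inFlight.get(key) as Promise<T> | undefined;
  if (!pending) {
    pending = loadFn()
      .then((value) => {
        const t = Date.now();
        entries.set(key, { value, freshUntil: t + ttlMs, staleUntil: t + ttlMs + staleMs });
        return value;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, pending);
  }

  // Stale-while-revalidate: return the old value now, let the refresh finish in the background.
  if (entry && now < entry.staleUntil) {
    pending.catch(() => {}); // a failed background refresh shouldn't crash this request
    return entry.value;
  }

  // Cold miss (or too stale): wait for the origin.
  return pending;
}
```

All concurrent misses for a key share one origin call, and once a value exists it keeps being served (slightly stale) while the refresh runs in the background.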
Why cache at all?
Caching trades freshness for speed & scale.
- Lower latency: serve from memory/edge (micro- to milliseconds).
- Higher throughput: fewer DB and origin hits.
- Resilience: withstand traffic spikes and partial outages.
- Lower cost: less egress, fewer DB reads, smaller fleets.
The layers (mental map)
- Browser cache: honors `Cache-Control`, `ETag`, and `Last-Modified` (see the sketch after this list).
- CDN/Edge (reverse proxy): Cloudflare/Fastly/Akamai; caches full responses at POPs.
- App-side caches:
  - In-memory (process-local): fastest, not shared across instances.
  - Distributed (Redis/Memcached): shared, network hop required.
- DB/query cache: results/materialized views; often downstream of app caches.
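To make the browser and CDN layers concrete, here is a minimal Node/TS origin handler that emits those headers and answers conditional requests. The route, max-age values, and ETag scheme are illustrative choices, not prescriptions:

```ts
import { createServer } from "node:http";
import { createHash } from "node:crypto";

// Origin handler: emits the caching headers that browsers and CDNs act on.
createServer((req, res) => {
  const body = JSON.stringify({ ok: true });
  const etag = `"${createHash("sha1").update(body).digest("hex")}"`; // strong ETag derived from the body

  // Conditional GET: the client already has this version, so answer 304 with no body.
  if (req.headers["if-none-match"] === etag) {
    res.writeHead(304);
    res.end();
    return;
  }

  res.writeHead(200, {
    "Content-Type": "application/json",
    // Browsers may reuse for 60s; shared caches (CDNs) may keep it for 300s.
    "Cache-Control": "public, max-age=60, s-maxage=300",
    ETag: etag,
  });
  res.end(body);
}).listen(3000);
```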
Core strategies
1) In-memory (process-local)
Use for: super-hot config, small computed results, feature flags, rate-limited calls.
Pros: nanosecond lookup, zero network.
Cons: not shared, so each instance can serve stale data until the entry's TTL expires (or the process restarts); the same data is duplicated in every instance's memory.
Example (LRU + TTL, Node/TS):
import { LRUCache } from "lru-cache"; // recent versions of lru-cache use a named export
const cache = new LRUCache<string, any>({ max: 5_000, ttl: 60_000 }); // 5k entries, 1 min TTL
export async function getUserFast(id: string) {
  const key = `user:${id}`;
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // a hit (even a falsy value) skips the DB read
  // cache-aside: on a miss, read the source of truth, then populate the cache
  const user = await db.users.findById(id);
  if (user != null) cache.set(key, user); // cache only found users; misses need their own policy
  return user;
}
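One deliberate choice in the example above: a lookup that finds nothing is not cached, so repeated requests for a missing id keep hitting the database. If that matters for your workload, cache a small "not found" sentinel with a short TTL (negative caching).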