TL;DR
- Indexes are sorted structures that let the DB jump to rows instead of scanning tables. Start with B‑tree.
- Design composite indexes to match your most selective WHERE prefix and ORDER BY.
- Use covering (include/select) columns to avoid table hits; use partial and expression indexes to target hot queries.
- Don’t index everything: each index adds write cost, disk, and planning complexity. Measure with
EXPLAIN (ANALYZE).
1) What an index buys you (and what it doesn’t)
- Buys: fast lookups and ordered scans. Improves
WHERE,JOIN ON, andORDER BYthat align with the index. - Doesn’t: make bad predicates fast (
%term%), fix huge result sets, or help if the planner chooses a seq scan because selectivity is low.
Default types
- B‑tree (default): equality/inequality, prefix
LIKE 'foo%', sort & range. - GIN/GiST (Postgres): arrays, JSON, full‑text, geo.
- Hash (engine‑specific): equality only.
- BRIN: giant, naturally ordered tables (time/ID ranges).
2) Composite & covering indexes (the 80/20)
Rule: Order columns in the index by (1) equality filters, most selective first; then (2) frequent range/ORDER BY.
-- Postgres
CREATE INDEX idx_orders_user_created ON orders (tenant_id, user_id, created_at DESC);
- Good for:
WHERE tenant_id=$1 AND user_id=$2 ORDER BY created_at DESC LIMIT 20 - Works for:
WHERE tenant_id=$1 AND user_id=$2 AND created_at > $3 - Not for:
WHERE created_at > $3(leftmost prefix rule).
Covering (avoid table hit)
-- Postgres INCLUDE
CREATE INDEX idx_orders_cover
ON orders (tenant_id, user_id, created_at DESC)
INCLUDE (status, total_amount);
-- MySQL: put small selected columns at the end of the index definition.
A covering index lets the engine answer from the index pages alone (index‑only scan).
3) Partial & expression indexes (surgical speedups)
Partial (filter hot slice)
-- Only open orders—smaller, cheaper, faster
CREATE INDEX CONCURRENTLY idx_open_orders
ON orders (tenant_id, created_at DESC)
WHERE status = 'open';
Expression / function index
-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Then query must use the same expression:
SELECT * FROM users WHERE lower(email) = lower($1);
If your predicate wraps a column (e.g.,
LOWER(col),date_trunc), a plain index oncolwon’t be used—create an expression index.
4) Patterns you’ll use weekly
Keyset pagination (fast, stable)
-- After first page (last_seen_created_at, id)
SELECT id, created_at
FROM posts
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Index: (created_at DESC, id DESC)
Prefix search (autocomplete)
SELECT id FROM users WHERE email ILIKE 'ali%';
-- Index: (email) -- works for 'prefix%', not for '%middle%'
For arbitrary substring/typo search: Postgres pg_trgm (GIN + gin_trgm_ops) or an external search engine.
Multi‑tenant recent activity
CREATE INDEX idx_events_tenant_time ON events (tenant_id, occurred_at DESC);
SELECT * FROM events
WHERE tenant_id=$1 AND occurred_at > now() - interval '7 days'
ORDER BY occurred_at DESC
LIMIT 100;
Top‑N per group (use index order)
-- Get latest invoice per account
SELECT DISTINCT ON (account_id) account_id, id, created_at
FROM invoices
ORDER BY account_id, created_at DESC;
-- Index: (account_id, created_at DESC)
JSON properties (Postgres)
-- Exact match
CREATE INDEX idx_props_color ON products ((props->>'color'));
SELECT * FROM products WHERE props->>'color' = 'blue';
-- Many keys/contains: use GIN
CREATE INDEX idx_props_gin ON products USING GIN (props jsonb_path_ops);
SELECT * FROM products WHERE props @> '{"color":"blue"}';
5) Reading EXPLAIN (ANALYZE) without tears
- Seq Scan: reading the whole table—okay for tiny tables or low selectivity.
- Index Scan: great for selective predicates; returns rows in index order.
- Bitmap Index Scan + Heap: good when many hits; batches lookups.
- Rows/Actual Rows: planner estimates vs reality—big mismatches mean stale stats or skew.
- Filter / Rows Removed by Filter: if high, your predicate doesn’t match the index order.
- I/O vs CPU: check timing; if I/O bound, aim for covering or more selective predicates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE tenant_id=$1 AND user_id=$2 ORDER BY created_at DESC LIMIT 20;
6) When not to index
- Columns with very low cardinality (e.g., boolean flags) unless combined with others.
- Small tables (seq scan wins).
- Columns that change constantly (high write/update cost).
- “Just in case” indexes—check usage stats first, then delete if unused.
7) Maintenance & safety
- Keep stats fresh: Postgres autovacuum/autoanalyze; MySQL ANALYZE TABLE.
- Build indexes online where possible (
CONCURRENTLY,ONLINE/ALGORITHM=INPLACE). - Replace safely: create new index, switch queries, then drop the old (also concurrently).
- Watch for bloat after big deletes; plan REINDEX in off‑peak windows.
- Monitor: cache hit ratio, index/seq scan ratio, slow query logs.
Pitfalls & fast fixes
| Pitfall | Why it hurts | Fix |
|---|---|---|
| Wrong column order in composite | Planner can’t use leftmost rule | Put equality & most selective first |
| Functions on columns in WHERE | Plain index is ignored | Use expression index and match predicate |
| Leading wildcard (%term) | B‑tree can’t help | Use trigram/full‑text or search service |
| Sorting different from index | Extra sort step | Align ORDER BY with index order (ASC/DESC) |
| Too many similar indexes | Slower writes, more disk | Consolidate into a single composite/covering |
| Stale stats | Bad plans | Run analyze; tune autovacuum thresholds |
Quick checklist
- [ ] Index your WHERE + JOIN + ORDER BY (in that priority).
- [ ] Design composite indexes with correct left‑prefix.
- [ ] Use covering to skip table reads for hot paths.
- [ ] Add partial/expression indexes for targeted speedups.
- [ ] Verify with
EXPLAIN (ANALYZE)and monitor real usage. - [ ] Periodically prune unused/overlapping indexes.
One‑minute adoption plan
- List your top 5 slow queries from logs.
- For each, sketch the index that matches its filters + order.
- Create online; compare before/after with
EXPLAIN (ANALYZE). - Consolidate overlapping indexes; drop unused ones.
- Add a CI guard that runs
EXPLAINon changed queries (or uses a plan linter).