Database Indexing 101 — faster queries, fewer surprises

September 12th, 20254 min read#dev #databases #sql #postgres #mysql #performance #ca-duh

A practical mental model for indexes, with copy‑paste patterns for B‑tree, composite order, covering, partial, and expression indexes—plus when **not** to index.

TL;DR

Indexes are sorted structures that let the DB jump to rows instead of scanning tables. Start with B‑tree.
Design composite indexes to match your most selective WHERE prefix and ORDER BY.
Use covering (include/select) columns to avoid table hits; use partial and expression indexes to target hot queries.
Don’t index everything: each index adds write cost, disk, and planning complexity. Measure with EXPLAIN (ANALYZE).

1) What an index buys you (and what it doesn’t)

Buys: fast lookups and ordered scans. Improves WHERE, JOIN ON, and ORDER BY that align with the index.
Doesn’t: make bad predicates fast (%term%), fix huge result sets, or help if the planner chooses a seq scan because selectivity is low.

Default types

B‑tree (default): equality/inequality, prefix LIKE 'foo%', sort & range.
GIN/GiST (Postgres): arrays, JSON, full‑text, geo.
Hash (engine‑specific): equality only.
BRIN: giant, naturally ordered tables (time/ID ranges).

2) Composite & covering indexes (the 80/20)

Rule: Order columns in the index by (1) equality filters, most selective first; then (2) frequent range/ORDER BY.

-- Postgres
CREATE INDEX idx_orders_user_created ON orders (tenant_id, user_id, created_at DESC);

Good for: WHERE tenant_id=$1 AND user_id=$2 ORDER BY created_at DESC LIMIT 20
Works for: WHERE tenant_id=$1 AND user_id=$2 AND created_at > $3
Not for: WHERE created_at > $3 (leftmost prefix rule).

Covering (avoid table hit)

-- Postgres INCLUDE
CREATE INDEX idx_orders_cover
  ON orders (tenant_id, user_id, created_at DESC)
  INCLUDE (status, total_amount);
-- MySQL: put small selected columns at the end of the index definition.

A covering index lets the engine answer from the index pages alone (index‑only scan).

3) Partial & expression indexes (surgical speedups)

Partial (filter hot slice)

-- Only open orders—smaller, cheaper, faster
CREATE INDEX CONCURRENTLY idx_open_orders
  ON orders (tenant_id, created_at DESC)
  WHERE status = 'open';

Expression / function index

-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Then query must use the same expression:
SELECT * FROM users WHERE lower(email) = lower($1);

If your predicate wraps a column (e.g., LOWER(col), date_trunc), a plain index on col won’t be used—create an expression index.

4) Patterns you’ll use weekly

Keyset pagination (fast, stable)

-- After first page (last_seen_created_at, id)
SELECT id, created_at
FROM posts
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Index: (created_at DESC, id DESC)

Prefix search (autocomplete)

SELECT id FROM users WHERE email ILIKE 'ali%';
-- Index: (email)  -- works for 'prefix%', not for '%middle%'

For arbitrary substring/typo search: Postgres pg_trgm (GIN + gin_trgm_ops) or an external search engine.

Multi‑tenant recent activity

CREATE INDEX idx_events_tenant_time ON events (tenant_id, occurred_at DESC);
SELECT * FROM events
WHERE tenant_id=$1 AND occurred_at > now() - interval '7 days'
ORDER BY occurred_at DESC
LIMIT 100;

Top‑N per group (use index order)

-- Get latest invoice per account
SELECT DISTINCT ON (account_id) account_id, id, created_at
FROM invoices
ORDER BY account_id, created_at DESC;
-- Index: (account_id, created_at DESC)

JSON properties (Postgres)

-- Exact match
CREATE INDEX idx_props_color ON products ((props->>'color'));
SELECT * FROM products WHERE props->>'color' = 'blue';

-- Many keys/contains: use GIN
CREATE INDEX idx_props_gin ON products USING GIN (props jsonb_path_ops);
SELECT * FROM products WHERE props @> '{"color":"blue"}';

5) Reading `EXPLAIN (ANALYZE)` without tears

Seq Scan: reading the whole table—okay for tiny tables or low selectivity.
Index Scan: great for selective predicates; returns rows in index order.
Bitmap Index Scan + Heap: good when many hits; batches lookups.
Rows/Actual Rows: planner estimates vs reality—big mismatches mean stale stats or skew.
Filter / Rows Removed by Filter: if high, your predicate doesn’t match the index order.
I/O vs CPU: check timing; if I/O bound, aim for covering or more selective predicates.

EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders WHERE tenant_id=$1 AND user_id=$2 ORDER BY created_at DESC LIMIT 20;

6) When not to index

Columns with very low cardinality (e.g., boolean flags) unless combined with others.
Small tables (seq scan wins).
Columns that change constantly (high write/update cost).
“Just in case” indexes—check usage stats first, then delete if unused.

7) Maintenance & safety

Keep stats fresh: Postgres autovacuum/autoanalyze; MySQL ANALYZE TABLE.
Build indexes online where possible (CONCURRENTLY, ONLINE/ALGORITHM=INPLACE).
Replace safely: create new index, switch queries, then drop the old (also concurrently).
Watch for bloat after big deletes; plan REINDEX in off‑peak windows.
Monitor: cache hit ratio, index/seq scan ratio, slow query logs.

Pitfalls & fast fixes

| Pitfall | Why it hurts | Fix | |---|---|---| | Wrong column order in composite | Planner can’t use leftmost rule | Put equality & most selective first | | Functions on columns in WHERE | Plain index is ignored | Use expression index and match predicate | | Leading wildcard (%term) | B‑tree can’t help | Use trigram/full‑text or search service | | Sorting different from index | Extra sort step | Align ORDER BY with index order (ASC/DESC) | | Too many similar indexes | Slower writes, more disk | Consolidate into a single composite/covering | | Stale stats | Bad plans | Run analyze; tune autovacuum thresholds |

Quick checklist

[ ] Index your WHERE + JOIN + ORDER BY (in that priority).
[ ] Design composite indexes with correct left‑prefix.
[ ] Use covering to skip table reads for hot paths.
[ ] Add partial/expression indexes for targeted speedups.
[ ] Verify with EXPLAIN (ANALYZE) and monitor real usage.
[ ] Periodically prune unused/overlapping indexes.

One‑minute adoption plan

List your top 5 slow queries from logs.
For each, sketch the index that matches its filters + order.
Create online; compare before/after with EXPLAIN (ANALYZE).
Consolidate overlapping indexes; drop unused ones.
Add a CI guard that runs EXPLAIN on changed queries (or uses a plan linter).