caduh

Database Indexing 101 — faster queries, fewer surprises

4 min read

A practical mental model for indexes, with copy‑paste patterns for B‑tree, composite order, covering, partial, and expression indexes—plus when **not** to index.

TL;DR

  • Indexes are sorted structures that let the DB jump to rows instead of scanning tables. Start with B‑tree.
  • Design composite indexes to match your most selective WHERE prefix and ORDER BY.
  • Use covering (include/select) columns to avoid table hits; use partial and expression indexes to target hot queries.
  • Don’t index everything: each index adds write cost, disk, and planning complexity. Measure with EXPLAIN (ANALYZE).

1) What an index buys you (and what it doesn’t)

  • Buys: fast lookups and ordered scans. Improves WHERE, JOIN ON, and ORDER BY that align with the index.
  • Doesn’t: make bad predicates fast (%term%), fix huge result sets, or help if the planner chooses a seq scan because selectivity is low.

Default types

  • B‑tree (default): equality/inequality, prefix LIKE 'foo%', sort & range.
  • GIN/GiST (Postgres): arrays, JSON, full‑text, geo.
  • Hash (engine‑specific): equality only.
  • BRIN: giant, naturally ordered tables (time/ID ranges).

2) Composite & covering indexes (the 80/20)

Rule: Order columns in the index by (1) equality filters, most selective first; then (2) frequent range/ORDER BY.

-- Postgres
CREATE INDEX idx_orders_user_created ON orders (tenant_id, user_id, created_at DESC);
  • Good for: WHERE tenant_id=$1 AND user_id=$2 ORDER BY created_at DESC LIMIT 20
  • Works for: WHERE tenant_id=$1 AND user_id=$2 AND created_at > $3
  • Not for: WHERE created_at > $3 (leftmost prefix rule).

Covering (avoid table hit)

-- Postgres INCLUDE
CREATE INDEX idx_orders_cover
  ON orders (tenant_id, user_id, created_at DESC)
  INCLUDE (status, total_amount);
-- MySQL: put small selected columns at the end of the index definition.

A covering index lets the engine answer from the index pages alone (index‑only scan).


3) Partial & expression indexes (surgical speedups)

Partial (filter hot slice)

-- Only open orders—smaller, cheaper, faster
CREATE INDEX CONCURRENTLY idx_open_orders
  ON orders (tenant_id, created_at DESC)
  WHERE status = 'open';

Expression / function index

-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Then query must use the same expression:
SELECT * FROM users WHERE lower(email) = lower($1);

If your predicate wraps a column (e.g., LOWER(col), date_trunc), a plain index on col won’t be used—create an expression index.


4) Patterns you’ll use weekly

Keyset pagination (fast, stable)

-- After first page (last_seen_created_at, id)
SELECT id, created_at
FROM posts
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Index: (created_at DESC, id DESC)

Prefix search (autocomplete)

SELECT id FROM users WHERE email ILIKE 'ali%';
-- Index: (email)  -- works for 'prefix%', not for '%middle%'

For arbitrary substring/typo search: Postgres pg_trgm (GIN + gin_trgm_ops) or an external search engine.

Multi‑tenant recent activity

CREATE INDEX idx_events_tenant_time ON events (tenant_id, occurred_at DESC);
SELECT * FROM events
WHERE tenant_id=$1 AND occurred_at > now() - interval '7 days'
ORDER BY occurred_at DESC
LIMIT 100;

Top‑N per group (use index order)

-- Get latest invoice per account
SELECT DISTINCT ON (account_id) account_id, id, created_at
FROM invoices
ORDER BY account_id, created_at DESC;
-- Index: (account_id, created_at DESC)

JSON properties (Postgres)

-- Exact match
CREATE INDEX idx_props_color ON products ((props->>'color'));
SELECT * FROM products WHERE props->>'color' = 'blue';

-- Many keys/contains: use GIN
CREATE INDEX idx_props_gin ON products USING GIN (props jsonb_path_ops);
SELECT * FROM products WHERE props @> '{"color":"blue"}';

5) Reading EXPLAIN (ANALYZE) without tears

  • Seq Scan: reading the whole table—okay for tiny tables or low selectivity.
  • Index Scan: great for selective predicates; returns rows in index order.
  • Bitmap Index Scan + Heap: good when many hits; batches lookups.
  • Rows/Actual Rows: planner estimates vs reality—big mismatches mean stale stats or skew.
  • Filter / Rows Removed by Filter: if high, your predicate doesn’t match the index order.
  • I/O vs CPU: check timing; if I/O bound, aim for covering or more selective predicates.
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders WHERE tenant_id=$1 AND user_id=$2 ORDER BY created_at DESC LIMIT 20;

6) When not to index

  • Columns with very low cardinality (e.g., boolean flags) unless combined with others.
  • Small tables (seq scan wins).
  • Columns that change constantly (high write/update cost).
  • “Just in case” indexes—check usage stats first, then delete if unused.

7) Maintenance & safety

  • Keep stats fresh: Postgres autovacuum/autoanalyze; MySQL ANALYZE TABLE.
  • Build indexes online where possible (CONCURRENTLY, ONLINE/ALGORITHM=INPLACE).
  • Replace safely: create new index, switch queries, then drop the old (also concurrently).
  • Watch for bloat after big deletes; plan REINDEX in off‑peak windows.
  • Monitor: cache hit ratio, index/seq scan ratio, slow query logs.

Pitfalls & fast fixes

| Pitfall | Why it hurts | Fix | |---|---|---| | Wrong column order in composite | Planner can’t use leftmost rule | Put equality & most selective first | | Functions on columns in WHERE | Plain index is ignored | Use expression index and match predicate | | Leading wildcard (%term) | B‑tree can’t help | Use trigram/full‑text or search service | | Sorting different from index | Extra sort step | Align ORDER BY with index order (ASC/DESC) | | Too many similar indexes | Slower writes, more disk | Consolidate into a single composite/covering | | Stale stats | Bad plans | Run analyze; tune autovacuum thresholds |


Quick checklist

  • [ ] Index your WHERE + JOIN + ORDER BY (in that priority).
  • [ ] Design composite indexes with correct left‑prefix.
  • [ ] Use covering to skip table reads for hot paths.
  • [ ] Add partial/expression indexes for targeted speedups.
  • [ ] Verify with EXPLAIN (ANALYZE) and monitor real usage.
  • [ ] Periodically prune unused/overlapping indexes.

One‑minute adoption plan

  1. List your top 5 slow queries from logs.
  2. For each, sketch the index that matches its filters + order.
  3. Create online; compare before/after with EXPLAIN (ANALYZE).
  4. Consolidate overlapping indexes; drop unused ones.
  5. Add a CI guard that runs EXPLAIN on changed queries (or uses a plan linter).