caduh

What is an N+1 Query and How Do You Fix It?

4 min read

A fundamental database performance problem: why N+1 happens, how to spot it, and fixes with eager loading/batching in popular ORMs (Django, Rails, Prisma/Sequelize, SQLAlchemy).

TL;DR

  • N+1 = you run 1 query to load N parents, then N more queries (one per parent) to load a related record → N+1 total. Runtime grows linearly with result size.
  • Fix it with eager loading: join/prefetch the relation up front (one extra query or a JOIN), or batch with WHERE IN (DataLoader pattern).
  • Turn on query logging/counters, paginate, and keep relation access out of tight loops.

The concept (60 seconds)

Before (N+1):

-- 1) parents
SELECT id, title, author_id FROM posts ORDER BY created_at DESC LIMIT 50;

-- 2..51) author for each post (50 queries)
SELECT id, name FROM users WHERE id = ?;  -- repeated 50 times

After (eager loaded):

-- Option A: JOIN for many-to-one / one-to-one
SELECT p.*, u.*
FROM posts p
JOIN users u ON u.id = p.author_id
ORDER BY p.created_at DESC LIMIT 50;

-- Option B: two queries total (prefetch)
SELECT id, title, author_id FROM posts ORDER BY created_at DESC LIMIT 50;
SELECT * FROM users WHERE id IN (/* distinct author_ids from the 50 posts */);

A simple ORM example (before → after)

Django ORM

Before (N+1):

posts = Post.objects.order_by("-created_at")[:50]
for p in posts:
    print(p.author.name)  # each access triggers a query

After (eager load):

# many-to-one / one-to-one
posts = Post.objects.select_related("author").order_by("-created_at")[:50]

# for one-to-many/many-to-many collections, also prefetch:
posts = (Post.objects
         .select_related("author")
         .prefetch_related("tags", "comments")
         .order_by("-created_at")[:50])

Rails (ActiveRecord)

Before:

posts = Post.order(created_at: :desc).limit(50)
posts.each { |p| puts p.author.name }  # N+1

After:

posts = Post.includes(:author).order(created_at: :desc).limit(50)
# For collections:
posts = Post.includes(:author, :comments, :tags).limit(50)

Node (Prisma & Sequelize)

Prisma — before:

const posts = await prisma.post.findMany({ take: 50, orderBy: { createdAt: "desc" } });
for (const p of posts) {
  const author = await prisma.user.findUnique({ where: { id: p.authorId } }); // N queries
}

Prisma — after (include/eager):

const posts = await prisma.post.findMany({
  take: 50,
  orderBy: { createdAt: "desc" },
  include: { author: true, tags: true, _count: { select: { comments: true } } }
});

Sequelize — after (include/eager):

const posts = await Post.findAll({
  limit: 50, order: [["createdAt","DESC"]],
  include: [{ model: User, as: "author" }, { model: Tag }, { model: Comment }]
});

Python (SQLAlchemy ORM)

Before (lazy load inside loop):

posts = session.query(Post).order_by(Post.created_at.desc()).limit(50).all()
for p in posts:
    print(p.author.name)  # emits a query per post

After (eager load):

from sqlalchemy.orm import joinedload, subqueryload

posts = (session.query(Post)
         .options(joinedload(Post.author), subqueryload(Post.comments))
         .order_by(Post.created_at.desc()).limit(50).all())

When to JOIN vs PREFETCH

  • Use a JOIN (select_related, joinedload) for many-to-one or one-to-one. You still get one result row per parent (watch out for duplicates if you join collections).
  • Use prefetch (prefetch_related, second query with WHERE IN, subqueryload) for one-to-many/many-to-many. Avoids row explosion and keeps memory predictable.
  • For counts use ORM helpers (.annotate() / ._count) or database-level aggregates instead of fetching all children.

Spotting N+1 early

  • Enable query logs in dev/test and count queries per request.
  • Add budget tests (“this endpoint should perform ≤ 5 queries”).
  • Watch for repeated similar SQL (same shape, different bind values) while iterating collections.
  • APM/Profilers (New Relic, Datadog, Skylight, Django Debug Toolbar) can flag N+1 patterns.

Beyond eager loading: batching & GraphQL

  • For resolvers that fetch per item (classic GraphQL N+1), use the DataLoader pattern: batch keys per tick and query with WHERE id IN (...).
  • Cache per-request lookups to avoid repeat work within a single request.

Tiny DataLoader sketch (Node):

import DataLoader from "dataloader";
const userById = new DataLoader(async (ids: readonly number[]) => {
  const rows = await db.user.findMany({ where: { id: { in: ids as number[] } } });
  const map = new Map(rows.map(r => [r.id, r]));
  return ids.map(id => map.get(id));
});

Pitfalls & fixes

| Problem | Why it hurts | Fix | |---|---|---| | Joining large collections explodes rows | Duplicates per parent × child | Use prefetch/secondary query; DISTINCT ON or group aggregates | | Eager loading everything by default | Memory/time waste | Eager load only the relations you need on that path | | Hidden N+1 in templates/serializers | Access inside view layer | Move to view/query layer; prefetch before rendering | | Paginating after eager load | Wrong slice or big joins | Paginate first, then eager load only the page | | Counting children in a loop | N queries for counts | Use aggregates or precomputed counters |


Quick checklist

  • [ ] Paginate first; don’t render 1000 rows.
  • [ ] Add eager loading: JOIN for singular relations; prefetch for collections.
  • [ ] Use aggregates for counts; avoid per-item queries.
  • [ ] Add query budgets/tests and enable query logging in CI/dev.
  • [ ] For GraphQL/resolvers, batch with DataLoader.
  • [ ] Re-measure p95 latency and query counts after changes.

One‑minute adoption plan

  1. Turn on SQL logging and hit your slow endpoint; note query count and shapes.
  2. Identify the relation access inside loops (template, serializer, or resolver).
  3. Add select_related/includes/joinedload for singular relations; prefetch_related/IN batching for collections.
  4. Replace per-item counts with aggregates.
  5. Add a test to cap query count, and monitor after deploy.