
Pagination That Scales: Offset vs Cursor vs Keyset

Trade-offs, pitfalls, and production-ready code with SQL, Node, FastAPI, MongoDB and Elasticsearch examples.

TL;DR

  • Offset: simplest, supports “jump to page N”, but slows down as N grows and can miss/duplicate on hot tables.
  • Cursor (API) + Keyset (DB “seek”): fastest and most stable under write load, great for infinite scroll and “next/prev”. Harder to implement, no true random access without extra work.
  • Choose: Use offset for small/admin/analytics pages that need page numbers. Use cursor+keyset for feeds, inboxes, search results at scale.

The Data Model We’ll Use

-- PostgreSQL
CREATE TABLE posts (
  id           BIGSERIAL PRIMARY KEY,
  title        TEXT NOT NULL,
  body         TEXT NOT NULL,
  status       TEXT NOT NULL CHECK (status IN ('DRAFT','PUBLISHED')),
  published_at TIMESTAMPTZ NOT NULL DEFAULT now(),  -- immutable once published
  author_id    BIGINT NOT NULL
);
-- Useful composite index for our sort/filter:
CREATE INDEX idx_posts_pub_desc ON posts (published_at DESC, id DESC) WHERE status='PUBLISHED';

Use an immutable sort column (e.g., published_at when the post becomes visible). If the sort field changes after publication, pages will “reshuffle”.
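
If published_at can drift (rescheduling, re-publishing), one option is to freeze a separate column on first publish and sort on that instead. A minimal sketch, assuming publishing happens via an UPDATE and first_published_at is never overwritten afterwards:

-- Freeze the sort key the first time a post goes live.
ALTER TABLE posts ADD COLUMN first_published_at TIMESTAMPTZ;

UPDATE posts
SET status = 'PUBLISHED',
    first_published_at = COALESCE(first_published_at, now())  -- set once, never changed afterwards
WHERE id = $1;

-- The index and ORDER BY would then use (first_published_at DESC, id DESC).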


1) Offset Pagination

Pros

  • Dead simple (LIMIT ... OFFSET ...)
  • Supports page numbers (“go to page 7”)
  • Easy to add total counts

Cons

  • Slow for large offsets: the DB still has to read and discard every skipped row
  • Unstable under writes: inserts/deletes cause gaps/duplication across pages
  • Expensive with complex filters unless carefully indexed

SQL

-- Page params: page (1-based), page_size
-- Compute offset = (page-1)*page_size
SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
ORDER BY published_at DESC, id DESC
LIMIT $1 OFFSET $2;  -- $1=page_size, $2=offset

Node (Express + node-postgres)

import express from "express";
import { Pool } from "pg";
const app = express();
const pool = new Pool();

app.get("/posts", async (req, res) => {
  const pageSize = Math.min(Number(req.query.page_size) || 20, 100);
  const page = Math.max(Number(req.query.page) || 1, 1);
  const offset = (page - 1) * pageSize;

  const { rows } = await pool.query(
    `SELECT id, title, published_at
     FROM posts
     WHERE status='PUBLISHED'
     ORDER BY published_at DESC, id DESC
     LIMIT $1 OFFSET $2`,
    [pageSize, offset]
  );

  const count = await pool.query(
    `SELECT count(*)::int AS total FROM posts WHERE status='PUBLISHED'`
  );

  res.json({
    page, page_size: pageSize,
    total: count.rows[0].total,
    data: rows
  });
});

When it’s fine: small catalogs, admin panels, reports, search with “page 12 of 40”.


2) Cursor (API) + Keyset (DB “seek”) Pagination

Idea: Instead of skipping rows, start after the last row you saw using a cursor (opaque token) and a keyset (DB predicate like “(published_at,id) < (last_published_at,last_id)”).

Pros

  • Consistent under writes (no drift)
  • Fast for deep pages (no big skip)
  • Perfect for infinite scroll, “Load more”

Cons

  • More code (cursors, next/prev)
  • No native “jump to page 37”
  • Need a stable, unique ordering (use tie-breaker id)

Cursor shape

Make the cursor an opaque base64 of the last row’s sort keys (and direction).

// cursor.ts
export function encodeCursor(v: unknown) {
  return Buffer.from(JSON.stringify(v)).toString("base64url");
}
export function decodeCursor<T = any>(c?: string | null): T | null {
  if (!c) return null;
  try {
    return JSON.parse(Buffer.from(c, "base64url").toString("utf8"));
  } catch {
    return null; // malformed client-supplied cursor -> treat as "no cursor"
  }
}

SQL (PostgreSQL)

-- Forward page (newer -> older): ORDER BY published_at DESC, id DESC
-- Input: last_published_at, last_id (from cursor), page_size
-- First page: no cursor => no WHERE
SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
  AND (
    ($1 IS NULL AND $2 IS NULL)
    OR (published_at, id) < ($1::timestamptz, $2::bigint)
  )
ORDER BY published_at DESC, id DESC
LIMIT $3; -- page_size

Handle previous page by reversing the comparison and order, then reverse the result in app code:

-- Previous page: rows "after" the current first item (DESC timeline),
-- so we need > and ASC to walk backwards, then reverse in app.
SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
  AND (published_at, id) > ($1::timestamptz, $2::bigint)
ORDER BY published_at ASC, id ASC
LIMIT $3;

Node (Express) full route with next/prev

import express from "express";
import { Pool } from "pg";
import { encodeCursor, decodeCursor } from "./cursor";
const app = express();
const pool = new Pool();

type Cursor = { published_at: string; id: number; dir: "next"|"prev" };

app.get("/feed", async (req, res) => {
  const pageSize = Math.min(Number(req.query.page_size) || 20, 100);
  const cursorParam = String(req.query.cursor || "");
  const cursor = decodeCursor<Cursor>(cursorParam);
  const isPrev = cursor?.dir === "prev";

  if (!isPrev) {
    // forward
    const args = [cursor?.published_at ?? null, cursor?.id ?? null, pageSize + 1];
    const { rows } = await pool.query(
      `SELECT id, title, published_at
       FROM posts
       WHERE status='PUBLISHED'
         AND (($1 IS NULL AND $2 IS NULL) OR (published_at, id) < ($1::timestamptz, $2::bigint))
       ORDER BY published_at DESC, id DESC
       LIMIT $3`, args);

    const hasMore = rows.length > pageSize;
    const data = hasMore ? rows.slice(0, pageSize) : rows;

    const nextCursor = hasMore
      ? encodeCursor({ published_at: data[data.length-1].published_at, id: data[data.length-1].id, dir: "next" })
      : null;
    const prevCursor = data.length
      ? encodeCursor({ published_at: data[0].published_at, id: data[0].id, dir: "prev" })
      : null;

    res.json({ data, next_cursor: nextCursor, prev_cursor: prevCursor });
  } else {
    // previous: walk ASC then reverse
    const args = [cursor.published_at, cursor.id, pageSize + 1];
    const { rows } = await pool.query(
      `SELECT id, title, published_at
       FROM posts
       WHERE status='PUBLISHED'
         AND (published_at, id) > ($1::timestamptz, $2::bigint)
       ORDER BY published_at ASC, id ASC
       LIMIT $3`, args);

    const hasMorePrev = rows.length > pageSize;
    const page = hasMorePrev ? rows.slice(0, pageSize) : rows; // take the closest items
    const data = page.reverse();

    const nextCursor = data.length
      ? encodeCursor({ published_at: data[data.length-1].published_at, id: data[data.length-1].id, dir: "next" })
      : null;
    const prevCursor = hasMorePrev
      ? encodeCursor({ published_at: data[0].published_at, id: data[0].id, dir: "prev" })
      : null;

    res.json({ data, next_cursor: nextCursor, prev_cursor: prevCursor });
  }
});

Python (FastAPI + SQLAlchemy Core)

from fastapi import FastAPI, Query
from base64 import urlsafe_b64decode, urlsafe_b64encode
import json
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

app = FastAPI()
engine = create_async_engine("postgresql+asyncpg://...")

def enc(o): return urlsafe_b64encode(json.dumps(o).encode()).decode()
def dec(c): return json.loads(urlsafe_b64decode(c.encode())) if c else None

@app.get("/feed")
async def feed(page_size: int = Query(20, ge=1, le=100), cursor: str | None = None):
    cur = dec(cursor)
    async with engine.begin() as conn:
        if not cur or cur["dir"] == "next":
            q = text("""
            SELECT id, title, published_at
            FROM posts
            WHERE status='PUBLISHED'
              AND ( :ts IS NULL AND :id IS NULL
                    OR (published_at, id) < (:ts::timestamptz, :id::bigint) )
            ORDER BY published_at DESC, id DESC
            LIMIT :lim
            """);
            rows = (await conn.execute(q, {"ts": cur["published_at"] if cur else None,
                                           "id": cur["id"] if cur else None,
                                           "lim": page_size+1})).mappings().all()
            has_more = len(rows) > page_size
            data = rows[:page_size] if has_more else rows
            next_cursor = enc({"published_at": data[-1]["published_at"].isoformat(), "id": data[-1]["id"], "dir":"next"}) if data and has_more else None
            prev_cursor = enc({"published_at": data[0]["published_at"].isoformat(), "id": data[0]["id"], "dir":"prev"}) if data else None
            return {"data": data, "next_cursor": next_cursor, "prev_cursor": prev_cursor}
        else:
            q = text("""
            SELECT id, title, published_at
            FROM posts
            WHERE status='PUBLISHED'
              AND (published_at, id) > (:ts::timestamptz, :id::bigint)
            ORDER BY published_at ASC, id ASC
            LIMIT :lim
            """);
            rows = (await conn.execute(q, {"ts": cur["published_at"], "id": cur["id"], "lim": page_size+1})).mappings().all()
            has_more_prev = len(rows) > page_size
            page = rows[:page_size] if has_more_prev else rows
            data = list(reversed(page))
            next_cursor = enc({"published_at": data[-1]["published_at"].isoformat(), "id": data[-1]["id"], "dir": "next"}) if data else None
            prev_cursor = enc({"published_at": data[0]["published_at"].isoformat(), "id": data[0]["id"], "dir": "prev"}) if has_more_prev else None
            return {"data": data, "next_cursor": next_cursor, "prev_cursor": prev_cursor}

Key Implementation Notes (You’ll Thank Yourself Later)

A) Always use a unique tie-breaker

Sort by (published_at DESC, id DESC). Without a tie-breaker, two rows with equal timestamps can flip order between requests.

B) Make the cursor opaque

Encode JSON to base64url. You can include filter snapshots (e.g., status, author_id) so cursors can be validated against changed filters.

C) Direction matters

  • Forward (“next”): < with DESC ordering
  • Backward (“prev”): > with ASC ordering, then reverse in app

D) Index to your exact predicate

CREATE INDEX CONCURRENTLY IF NOT EXISTS
  idx_posts_pub_desc ON posts (published_at DESC, id DESC)
  WHERE status='PUBLISHED';

If you filter by author or tag, extend the index: (author_id, published_at DESC, id DESC) or use a covering index suitable for your most common filter.
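
For example, a per-author feed could get its own partial index plus the same seek predicate; a sketch (the index name is illustrative):

CREATE INDEX CONCURRENTLY IF NOT EXISTS
  idx_posts_author_pub_desc ON posts (author_id, published_at DESC, id DESC)
  WHERE status='PUBLISHED';

-- Same keyset seek, scoped to one author:
SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
  AND author_id = $1
  AND (published_at, id) < ($2::timestamptz, $3::bigint)
ORDER BY published_at DESC, id DESC
LIMIT $4;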

E) Hot tables & consistency

  • For perfect “snapshot consistency” across many pages, you need a listing version (a materialized view or a table of IDs fixed at query start; see the sketch after this list). Otherwise, expect slight drift as new rows arrive.
  • For feeds/inboxes, drift is often acceptable (users expect new stuff to appear).
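
A minimal sketch of the “table of IDs fixed at query start” approach (table and column names are illustrative):

CREATE TABLE listing_snapshots (
  snapshot_id BIGINT NOT NULL,
  position    BIGINT NOT NULL,
  post_id     BIGINT NOT NULL,
  PRIMARY KEY (snapshot_id, position)
);

-- Materialize the listing once, when the user starts paging:
INSERT INTO listing_snapshots (snapshot_id, position, post_id)
SELECT $1, row_number() OVER (ORDER BY published_at DESC, id DESC), id
FROM posts
WHERE status='PUBLISHED';

-- Every page of snapshot $1 is now stable, no matter what gets inserted later:
SELECT p.id, p.title, p.published_at
FROM listing_snapshots s
JOIN posts p ON p.id = s.post_id
WHERE s.snapshot_id = $1
  AND s.position BETWEEN ($2::int - 1) * $3::int + 1 AND $2::int * $3::int  -- $2 = page, $3 = page_size
ORDER BY s.position;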

3) “Jump to Page N” with Keyset?

True random access isn’t native to keyset. Options:

  • Jump to date: let users choose a date/time; you keyset-seek to the first item at/after that time (see the queries after this list).
  • Bookmark ID: “Open around post #12345”—seek to (published_at,id) of that row and show neighbors.
  • Approximate page mapping: store periodic anchors (every 1000th ID) to estimate offsets; then keyset from the anchor.
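
Sketches of the first two options against the newest-first feed used throughout this post (literal parameters are illustrative; “jump to date” is written here as “start at the chosen time and walk older”, matching the DESC ordering):

-- Jump to date: start the feed at the newest item published at or before the chosen time.
SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
  AND published_at <= $1::timestamptz   -- $1 = chosen date/time
ORDER BY published_at DESC, id DESC
LIMIT $2;

-- "Open around post #12345": look up the anchor's sort keys, then seek from that tuple.
-- For the items newer than the anchor, run the mirrored query (>= with ASC) and merge in app.
SELECT p.id, p.title, p.published_at
FROM posts p
JOIN posts anchor ON anchor.id = $1     -- $1 = bookmarked post id
WHERE p.status='PUBLISHED'
  AND (p.published_at, p.id) <= (anchor.published_at, anchor.id)
ORDER BY p.published_at DESC, p.id DESC
LIMIT $2;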

4) Counting, “Total Results”, and SEO Pages

Counts:

  • Exact count: SELECT count(*) (can be slow on huge tables + filters).
  • Fast approximate (Postgres):
    • reltuples estimate from pg_class (coarse; see the query after this list)
    • Incremental counters (denormalized) or materialized view for frequently used filters.
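
The planner’s estimate is one line of SQL and effectively free; it covers the whole table rather than a filtered subset, so treat it as a rough order-of-magnitude number:

-- Coarse row-count estimate straight from planner statistics (whole table, ignores filters).
SELECT reltuples::bigint AS approx_total
FROM pg_class
WHERE oid = 'posts'::regclass;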

SEO page lists (category pages with numbered pagination):

  • Offset is OK if counts are required and result sets are not massive.
  • Cache pages and counts aggressively (CDN + application cache).
  • If pages are huge, consider precomputing ID lists nightly.

5) Edge Cases & Gotchas

  • Edits to sort key: If published_at can change, pages will reshuffle. Prefer immutable sort keys (e.g., first_published_at).
  • Deletes: Offset pages shift; keyset unaffected.
  • Duplicates across pages: Ensure strict comparison on keyset and consistent inclusion/exclusion (don’t use <= in forward pagination).
  • Prev page overlap: When walking backwards, fetch page_size+1, take the closest page_size rows in ASC, then reverse.
  • Multi-column seek: For ASC order, change operators accordingly. For complex sorts, normalize to a tuple and apply lexicographic comparisons.
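
For a mixed-direction sort, a single row comparison no longer works, so the lexicographic comparison has to be spelled out by hand. A sketch with a hypothetical score column (not part of the schema above), ordered by score DESC, published_at ASC, id ASC:

SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
  AND (
        score < $1                                              -- hypothetical relevance column
        OR (score = $1 AND published_at > $2::timestamptz)
        OR (score = $1 AND published_at = $2::timestamptz AND id > $3::bigint)
      )
ORDER BY score DESC, published_at ASC, id ASC
LIMIT $4;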

6) Beyond SQL

MongoDB (seek by _id)

// Next page
db.posts.find({ status: "PUBLISHED", _id: { $lt: lastId } })
  .sort({ _id: -1 }).limit(21);

// Prev page: use $gt and reverse in app.

Elasticsearch / OpenSearch (search_after)

{
  "size": 20,
  "sort": [{ "published_at": "desc" }, { "id": "desc" }],
  "search_after": ["2025-10-30T12:00:00Z", 12345],
  "query": { "term": { "status": "PUBLISHED" } }
}

7) Decision Cheat-Sheet

| Requirement | Pick | Why |
|---|---|---|
| Infinite scroll, large dataset, high write rate | Cursor + Keyset | Fast, stable, no deep skips |
| Page numbers, export, SEO category pages | Offset (+ caching) | Easy counts, jump to N |
| “Open near item X” | Keyset from anchor | Deterministic and quick |
| “Jump to date” | Keyset by timestamp | Predictable UX, no big skip |
| Strict snapshot across many pages | Materialize IDs | Consistency over freshness |


8) Test Like It’s Production

  • Seed 5–50M rows, realistic skew (recent heavy).
  • Benchmark: page 1, 10, 1000 with and without competing inserts/deletes.
  • Verify no duplicates/misses when new rows are inserted between requests.
  • Validate indexes with EXPLAIN (ANALYZE, BUFFERS)—you should see Index Only Scan or Index Scan, not sequential scans.
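
For instance, a deep keyset page against the partial index (literal values are illustrative; the exact plan shape depends on your data and statistics):

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, published_at
FROM posts
WHERE status='PUBLISHED'
  AND (published_at, id) < ('2025-10-30T12:00:00Z'::timestamptz, 12345)
ORDER BY published_at DESC, id DESC
LIMIT 20;
-- Expect an Index Scan on idx_posts_pub_desc feeding the LIMIT, not a Seq Scan + Sort.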

9) Quick Reference Queries

Forward (DESC)

-- First page
... WHERE status='PUBLISHED'
ORDER BY published_at DESC, id DESC
LIMIT :n;

-- Next page (after last tuple)
... WHERE status='PUBLISHED'
  AND (published_at, id) < (:last_ts, :last_id)
ORDER BY published_at DESC, id DESC
LIMIT :n;

Backward (DESC timeline)

-- Prev page (before first tuple)
... WHERE status='PUBLISHED'
  AND (published_at, id) > (:first_ts, :first_id)
ORDER BY published_at ASC, id ASC
LIMIT :n; -- then reverse in app

Index

CREATE INDEX CONCURRENTLY IF NOT EXISTS
  idx_posts_pub_desc ON posts (published_at DESC, id DESC)
  WHERE status='PUBLISHED';