The Testing Pyramid (That Actually Works) — fast feedback, minimal flakes

Pragmatic ratios, what to test where, and how to keep CI under 10 minutes: unit vs integration vs E2E, contract & component tests, data management, and flake-killers.

TL;DR

  • Bias toward fast, deterministic unit tests (~60–70%).
  • Add integration/service tests (~20–30%) against real dependencies (DB, queue) using containers.
  • Keep E2E/UI tests to the critical flows (~5–10%): sign‑in, checkout, key settings. Parallelize them; run a smoke subset on PRs and the full suite nightly.
  • Use contract tests (consumer‑driven) between services and component tests for UI logic—both sit between unit and E2E.
  • Kill flakiness at the source: control time, randomness, network, and eventual consistency. Seed data, isolate state, and retry only where the product does.

1) The shape that actually scales

          ▲   End‑to‑End (few, slow, critical paths)
          │
        ┌───┐  Component & Contract tests (UI units, API boundaries)
        │   │
      ┌───────┐ Integration/Service (DB, queue, broker with containers)
      │       │
    ┌───────────┐ Unit (pure logic, fast, isolated)
    └───────────┘

Typical ratios (by count): Unit 60–70% • Integration 20–30% • E2E 5–10%.
By time budget: Unit ≤ 2–3 min • Integration ≤ 5–7 min • E2E parallelized: smoke ≤ 3 min on PRs, full suite nightly. Run the stages as parallel jobs so the slowest one, not the sum, sets the wall‑clock time.
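
To see where a suite stands against these ratios, a small script can compute each test type's share by count. This is a minimal sketch: the labels ("unit", "integration", "e2e") and the sample counts are assumptions, and how you tag tests by type is up to your runner.

```python
from collections import Counter

# Target share of tests by count, taken from the ratios above.
TARGET_BANDS = {
    "unit": (0.60, 0.70),
    "integration": (0.20, 0.30),
    "e2e": (0.05, 0.10),
}

def suite_shape(test_kinds: list[str]) -> dict[str, float]:
    """Return each kind's share of the total test count."""
    if not test_kinds:
        raise ValueError("no tests to classify")
    counts = Counter(test_kinds)
    total = sum(counts.values())
    return {kind: counts.get(kind, 0) / total for kind in TARGET_BANDS}

def out_of_band(shape: dict[str, float]) -> list[str]:
    """List the kinds whose share falls outside its target band."""
    return [
        kind
        for kind, (low, high) in TARGET_BANDS.items()
        if not low <= shape[kind] <= high
    ]

# Hypothetical suite: 130 unit, 54 integration, 16 E2E tests.
kinds = ["unit"] * 130 + ["integration"] * 54 + ["e2e"] * 16
shape = suite_shape(kinds)
drift = out_of_band(shape)  # empty list when every kind is inside its band
```

Treat the bands as a smell detector, not a gate: a suite that drifts E2E‑heavy is the signal to look for.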


2) What belongs where

Unit (fast & pure)

  • Scope: functions, small classes, reducers, utils.
  • Rules: no network/disk/clock; use fakes for boundaries.
  • JS (Vitest/Jest)
import { expect, test } from "vitest"; // Jest exposes these as globals; drop this line there
import { sum } from "./math";
test("sum", () => { expect(sum(2, 3)).toBe(5); });
  • Python (pytest)
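import pytest  # calc_tax is assumed to be imported from the module under test
def test_tax_rounds():
    # approx() avoids a brittle exact-equality check on floats
    assert calc_tax(19.99) == pytest.approx(3.0)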
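
The "no clock, use fakes for boundaries" rule is easiest to follow when the clock is passed in as a parameter. A minimal sketch, assuming a hypothetical `is_expired` function; the names and the token scenario are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]

def is_expired(issued_at: datetime, ttl: timedelta, now: Clock) -> bool:
    """Pure logic: the caller supplies the clock, so tests never read real time."""
    return now() - issued_at >= ttl

def test_token_expires_after_ttl():
    fixed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    fake_clock = lambda: fixed  # fake for the clock boundary
    issued = fixed - timedelta(minutes=30)
    assert is_expired(issued, timedelta(minutes=30), fake_clock)
    assert not is_expired(issued, timedelta(minutes=31), fake_clock)
```

In production code the caller would pass something like `lambda: datetime.now(timezone.utc)`; the logic under test stays identical.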

Integration / Service

  • Scope: your code + real dependencies (DB, queue, cache).
  • Tooling: docker-compose/Testcontainers; migrate schema per test run; transaction rollbacks or ephemeral DB.
  • Node + Postgres (Testcontainers)
import { PostgreSqlContainer } from "@testcontainers/postgresql";
// `migrate` and `repo` are this project's own helpers (not shown).
let pg;
beforeAll(async () => { pg = await new PostgreSqlContainer().start(); await migrate(pg); });
afterAll(async () => { await pg.stop(); });
test("create user persists", async () => {
  const id = await repo.createUser(pg, { email: "user@example.com" }); // placeholder address
  const row = await repo.findUser(pg, id);
  expect(row.email).toBe("user@example.com");
});

E2E / UI (Playwright/Cypress)

  • Scope: cross‑service flows; browser + API + DB.
  • Keep small: test happy paths + a couple of edge cases; the rest goes to unit/integration.
  • Playwright
import { expect, test } from "@playwright/test";
test("checkout works", async ({ page }) => {
  await page.goto("/"); // relative URL: requires baseURL in the Playwright config
  await page.getByTestId("add-to-cart").click();
  await page.getByTestId("checkout").click();
  await expect(page.getByRole("heading", { name: "Thanks" })).toBeVisible();
});

Component tests (UI)

  • Scope: a React/Vue/Svelte component with its template & events, no network. Faster than E2E, more realistic than unit.
import { render, screen } from "@testing-library/react";
test("button disables while saving", async () => {
  render(<SaveButton onSave={async () => {}} />);
  // ...
});

Contract tests (between services)

  • Consumer‑driven Pact: consumer defines expectations; provider verifies on CI. Prevents breaking changes without E2E sprawl.
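
The idea behind a consumer‑driven contract can be shown without any tooling. The sketch below is a hand‑rolled stand‑in, not Pact's API: the contract shape, the `contract_violations` helper, and the handler are all hypothetical.

```python
# Consumer side: the fields (and types) the consumer actually reads.
# Hypothetical contract for a "get user" response.
USER_CONTRACT = {"id": int, "email": str}

def contract_violations(response: dict, contract: dict[str, type]) -> list[str]:
    """Return a list of violations; extra provider fields are allowed."""
    problems = []
    for field, expected in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

# Provider side: run in the provider's CI against a real handler response.
def get_user_handler(user_id: int) -> dict:  # stand-in for the provider's handler
    return {"id": user_id, "email": "user@example.com", "plan": "free"}

violations = contract_violations(get_user_handler(7), USER_CONTRACT)
```

The provider may add fields freely (here, `plan`); it fails verification only when it removes or retypes something a consumer depends on.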

3) Test data & isolation

  • Factories over fixtures; keep data minimal and explicit.
  • DB isolation: transaction per test with rollback, or ephemeral DB/container per worker.
  • IDs & time: seed RNG, freeze time. In JS, use fake timers; in Python, freezegun.
  • Avoid shared mutable state; tear down properly.
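
A factory with a privately seeded RNG covers the first and third points together: each test states only the fields it cares about, and generated values are reproducible. A minimal sketch; the `User` shape and `make_user_factory` are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class User:
    id: int
    email: str
    is_admin: bool = False

def make_user_factory(seed: int = 1234):
    """Build a factory whose generated IDs are reproducible for a given seed."""
    rng = random.Random(seed)  # private RNG: no shared global state

    def make_user(**overrides) -> User:
        user_id = rng.randrange(1, 1_000_000)
        defaults = {"id": user_id, "email": f"user{user_id}@example.com"}
        return User(**{**defaults, **overrides})

    return make_user

def test_admin_flag_is_explicit():
    make_user = make_user_factory()
    admin = make_user(is_admin=True)  # only the field under test is spelled out
    assert admin.is_admin
    assert not make_user().is_admin
```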

pytest (transaction rollbacks)

import pytest

@pytest.fixture(autouse=True)
def _db(db_session):  # db_session: your project's own session fixture
    tx = db_session.begin()
    yield
    tx.rollback()
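
The same rollback pattern can be tried end to end with nothing but the standard library. This sketch swaps in an in‑memory SQLite database as a stand‑in for your real one; the table and helper names are assumptions.

```python
import sqlite3
from contextlib import contextmanager

def make_db() -> sqlite3.Connection:
    # isolation_level=None: the sketch issues BEGIN/ROLLBACK itself
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    return conn

@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run a test body inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")

def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

conn = make_db()
with rolled_back(conn) as tx:
    tx.execute("INSERT INTO users (email) VALUES ('user@example.com')")
    inside = count_users(tx)  # the row is visible inside the transaction
after = count_users(conn)     # and gone once the transaction is rolled back
```

Because the schema is created once and every test body rolls back, tests can run in any order without seeing each other's rows.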

4) Flake killers (checklist)

  • Time: freeze or inject clock; avoid real sleeps—poll with a timeout helper.
  • Network: block unexpected HTTP; stub outside calls at the edge (e.g., payment provider).
  • Async/races: wait for signals (selectors visible, job done) not arbitrary delays.
  • Randomness: seed RNG; make nondeterminism explicit.
  • Eventual consistency: in tests that mirror prod semantics, add bounded polling helpers.
// Wait helper with timeout
export async function waitFor<T>(fn: () => Promise<T>, ms = 1000, step = 25) {
  const end = Date.now() + ms;
  let lastErr;
  while (Date.now() < end) {
    try { return await fn(); } catch (e) { lastErr = e; await new Promise(r => setTimeout(r, step)); }
  }
  throw lastErr ?? new Error("timeout");
}
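
For the network item in the checklist, one blunt way to "block unexpected HTTP" in a Python suite is to make socket connections fail while tests run. A rough sketch using only the standard library: `network_blocked` and `UnexpectedNetworkCall` are hypothetical names, and it only intercepts `socket.connect`, so dedicated plugins or an HTTP‑level stub will cover more cases.

```python
import socket
from contextlib import contextmanager
from unittest import mock

class UnexpectedNetworkCall(RuntimeError):
    """Raised when code under test tries to open a network connection."""

@contextmanager
def network_blocked():
    """Make any outbound socket connection fail loudly while active."""
    def refuse(self, address, *args, **kwargs):
        raise UnexpectedNetworkCall(f"test tried to connect to {address!r}")

    # Patch the method on the class so every socket instance is covered.
    with mock.patch.object(socket.socket, "connect", refuse):
        yield
```

Wrapped in an autouse fixture, this turns a silent dependency on the outside world into an immediate, named failure; tests that need a stubbed provider opt in explicitly at the edge.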

5) What to mock (and what not to)

  • Mock: third‑party APIs, email/SMS, payments, clock, randomness, OS/FS where slow.
  • Prefer fakes over mocks for your own interfaces (in‑memory repo/queue).
  • Don’t mock the code under test; the test then passes even when the real code is broken.
  • Use snapshot tests sparingly, and only on stable, human‑reviewable output.
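
"Fakes over mocks" in practice: a fake is a small working implementation of your own interface, so tests assert on behavior rather than on which calls were made. A minimal sketch; `UserRepo`, `InMemoryUserRepo`, and `register` are hypothetical names.

```python
from typing import Optional, Protocol

class UserRepo(Protocol):
    def add(self, email: str) -> int: ...
    def find_by_email(self, email: str) -> Optional[int]: ...

class InMemoryUserRepo:
    """Fake: a real, working implementation backed by a dict."""

    def __init__(self) -> None:
        self._ids_by_email: dict[str, int] = {}

    def add(self, email: str) -> int:
        user_id = len(self._ids_by_email) + 1
        self._ids_by_email[email] = user_id
        return user_id

    def find_by_email(self, email: str) -> Optional[int]:
        return self._ids_by_email.get(email)

def register(repo: UserRepo, email: str) -> int:
    """Code under test: returns the existing ID instead of creating a duplicate."""
    existing = repo.find_by_email(email)
    return existing if existing is not None else repo.add(email)

def test_register_is_idempotent():
    repo = InMemoryUserRepo()
    first = register(repo, "user@example.com")
    assert register(repo, "user@example.com") == first
    assert register(repo, "other@example.com") != first
```

The test never inspects call counts or argument lists, so refactoring `register` internally does not break it.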

6) Coverage & confidence

  • Track line + branch coverage per package; aim for ~80% line / ~60% branch overall, higher for core logic.
  • Use mutation testing (Stryker, mutmut) on critical modules to measure assertion quality.
  • Gate merges on changed‑files coverage rather than repo‑wide % to avoid gaming.
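
The changed‑files gate is simple arithmetic once you have per‑file numbers from your coverage tool. A sketch under stated assumptions: the report format (path mapped to covered and total lines), the file names, and the numbers are all made up for illustration.

```python
def changed_files_coverage(
    per_file: dict[str, tuple[int, int]], changed: list[str]
) -> float:
    """Line coverage over the changed files only.

    per_file maps path -> (covered_lines, total_lines).
    Changed files missing from the report (docs, configs) are ignored.
    """
    covered = sum(per_file[p][0] for p in changed if p in per_file)
    total = sum(per_file[p][1] for p in changed if p in per_file)
    return 1.0 if total == 0 else covered / total

# Hypothetical report: two well-tested core files, one legacy file.
report = {
    "core/pricing.py": (45, 50),
    "core/tax.py": (18, 20),
    "legacy/export.py": (10, 200),
}
pr_coverage = changed_files_coverage(report, ["core/pricing.py", "core/tax.py"])
repo_coverage = changed_files_coverage(report, list(report))
passes_gate = pr_coverage >= 0.80  # threshold taken from the ~80% line target
```

Here the PR passes at 90% even though the repo‑wide figure sits near 27%, which is the point: the author is judged on what they touched, not on legacy debt.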

7) CI that stays under 10 minutes

  • Shard & parallelize by test file; cache dependencies and compiled artifacts.
  • Run unit + key integration on PR; run full E2E nightly and on release candidates.
  • Retry only flaky E2E (1–2 times) and quarantine repeat offenders.
  • Artifacts: E2E videos/screenshots/logs; test results in JUnit format; upload HTML coverage.
  • Use Testcontainers with cached image layers, or CI service containers, to avoid cold starts.

GitHub Actions (matrix split sketch)

jobs:
  test:
    runs-on: ubuntu-latest
    strategy: { matrix: { shard: [1,2,3,4] } }
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run test -- --shard=${{matrix.shard}}/4 --reporter=junit
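
If your runner has no built‑in shard flag, splitting by test file takes a few lines. This is one possible scheme (sorted round‑robin), not a description of how any particular runner shards internally; `shard_files` is a hypothetical helper.

```python
def shard_files(files: list[str], index: int, total: int) -> list[str]:
    """Return the files for shard `index` of `total` (1-based, as in 1/4)."""
    if not 1 <= index <= total:
        raise ValueError("index must be in 1..total")
    ordered = sorted(files)  # every CI job must see the same order
    return ordered[index - 1 :: total]

# Hypothetical suite of ten test files split across four jobs.
files = [f"test_{i}.py" for i in range(10)]
shards = [shard_files(files, i, 4) for i in range(1, 5)]
```

Each file lands in exactly one shard as long as every job lists the same files. Round‑robin balances by count only; splitting by recorded duration balances better when a few files dominate the runtime.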

8) Release safety net (beyond tests)

  • Smoke tests post‑deploy (synthetic checks).
  • Contract verification in CI for services you call/serve.
  • Feature flags for risky paths; dark‑launch + canary.
  • Runtime assertions and structured logs to catch invariants you didn’t test.
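
For the feature‑flag and canary item, the core mechanism is a stable mapping from user to bucket so the same user always gets the same answer as the percentage grows. A minimal sketch; `in_rollout` and the flag name are hypothetical, and real flag services add targeting rules and kill switches on top.

```python
import hashlib

def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent

# Canary at 10%, then widen to 50%: users already in the canary stay in.
canary = {u for u in map(str, range(1000)) if in_rollout("new-checkout", u, 10)}
wider = {u for u in map(str, range(1000)) if in_rollout("new-checkout", u, 50)}
```

Including the flag name in the hash keeps different flags from always picking the same users as their first 10%.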

Pitfalls & fast fixes

| Pitfall | Why it hurts | Fix |
|---|---|---|
| E2E‑heavy suites | Slow, flaky, costly | Push logic to unit/integration; trim to critical flows |
| Global test DB | Hidden coupling, flake | Ephemeral DB or per‑test transaction rollback |
| Sleeping in tests | Racy & slow | Wait for signals, not time |
| Mocking everything | False confidence | Mock at the edge; use fakes otherwise |
| Unseeded randomness/time | Non‑reproducible | Seed & freeze; inject a clock |
| Coverage chasing 100% | Busywork | Focus on risk‑based coverage + mutation testing |


Quick checklist

  • [ ] Ratios: Unit 60–70%, Integration 20–30%, E2E 5–10%.
  • [ ] Containers for real deps in integration tests.
  • [ ] Critical E2E only; parallelize and upload logs/video as artifacts.
  • [ ] Freeze time, seed random, block network by default.
  • [ ] Prefer fakes over mocks; contract/component tests where they fit.
  • [ ] Keep CI < 10 minutes with sharding and caches.

One‑minute adoption plan

  1. Label current tests by type; cut or migrate E2E that duplicate unit/integration.
  2. Add Testcontainers/docker‑compose for DB/queue integration in CI.
  3. Freeze time and seed randomness in test runners; add a waitFor helper.
  4. Define a smoke E2E suite (≤ 3 minutes) for PRs; full E2E nightly.
  5. Track coverage (line+branch) and add mutation testing to core modules.