The Testing Pyramid (That Actually Works) — fast feedback, minimal flakes

September 22nd, 2025 · 5 min read · #dev #testing #qa #ci #reliability #ca-duh

Pragmatic ratios, what to test where, and how to keep CI under 10 minutes: unit vs integration vs E2E, contract & component tests, data management, and flake-killers.

TL;DR

Bias toward fast, deterministic unit tests (~60–70%).
Add integration/service tests (~20–30%) against real dependencies (DB, queue) using containers.
Keep E2E/UI to the critical flows (~5–10%): sign‑in, checkout, key settings. Parallelize them; run a smoke subset on PRs and the full suite nightly.
Use contract tests (consumer‑driven) between services and component tests for UI logic—both sit between unit and E2E.
Kill flakiness at the source: control time, randomness, network, and eventual consistency. Seed data, isolate state, and retry only where the product does.

1) The shape that actually scales

          ▲   End‑to‑End (few, slow, critical paths)
          │
        ┌───┐  Component & Contract tests (UI units, API boundaries)
        │   │
      ┌───────┐ Integration/Service (DB, queue, broker with containers)
      │       │
    ┌───────────┐ Unit (pure logic, fast, isolated)
    └───────────┘

Typical ratios (by count): Unit 60–70% • Integration 20–30% • E2E 5–10%.
By time budget: Unit ≤ 2–3 min • Integration ≤ 5–7 min • E2E parallelized, smoke < 3 min on PRs; full nightly.
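To see how far a suite is from this shape, a small script can compare per-type test counts against the bands above. This is a minimal sketch: it assumes each test already carries a label of "unit", "integration", or "e2e" (for example from a marker or a directory name), and the helper names are made up for illustration.

```python
from collections import Counter

# Target bands by test count, taken from the ratios above.
TARGET_BANDS = {
    "unit": (0.60, 0.70),
    "integration": (0.20, 0.30),
    "e2e": (0.05, 0.10),
}


def suite_shape(labels):
    """Return each tracked test type's share of the total test count."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tests to classify")
    return {kind: counts.get(kind, 0) / total for kind in TARGET_BANDS}


def out_of_band(labels):
    """List the test types whose share falls outside its target band."""
    shares = suite_shape(labels)
    return [
        kind
        for kind, (low, high) in TARGET_BANDS.items()
        if not low <= shares[kind] <= high
    ]
```

Treat the result as a prompt for discussion rather than a merge gate; the bands are rules of thumb, and labels outside the three tracked types (component, contract) still count toward the total.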

2) What belongs where

Unit (fast & pure)

Scope: functions, small classes, reducers, utils.
Rules: no network/disk/clock; use fakes for boundaries.
JS (Vitest/Jest)

import { sum } from "./math";
test("sum", () => { expect(sum(2,3)).toBe(5); });

Python (pytest)

from tax import calc_tax  # hypothetical module path; import from wherever calc_tax lives
def test_tax_rounds():
    assert calc_tax(19.99) == 3.0
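The "no clock; use fakes for boundaries" rule is easiest to follow when the code takes its clock as a parameter. A minimal sketch, assuming a hypothetical `is_expired` function and a hand-rolled `FakeClock` (neither comes from a library):

```python
from datetime import datetime, timedelta, timezone


class FakeClock:
    """Test double for a clock: time only moves when the test says so."""

    def __init__(self, start):
        self._now = start

    def now(self):
        return self._now

    def advance(self, **kwargs):
        # Accepts the same keyword arguments as timedelta (seconds=, minutes=, ...).
        self._now += timedelta(**kwargs)


def is_expired(issued_at, ttl_seconds, clock):
    """Hypothetical unit under test: reads time only through the injected clock."""
    return clock.now() - issued_at >= timedelta(seconds=ttl_seconds)


def test_token_expires_after_ttl():
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    clock = FakeClock(start)
    assert not is_expired(start, 60, clock)
    clock.advance(seconds=60)
    assert is_expired(start, 60, clock)
```

The test never sleeps and never reads the wall clock, so it gives the same result on a laptop and on a loaded CI runner.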

Integration / Service

Scope: your code + real dependencies (DB, queue, cache).
Tooling: docker-compose/Testcontainers; migrate schema per test run; transaction rollbacks or ephemeral DB.
Node + Postgres (Testcontainers)

import { PostgreSqlContainer } from "@testcontainers/postgresql";
// migrate() and repo are your project's own helpers (not shown).
let pg;
// Container startup can exceed the default hook timeout, so allow up to 60 s.
beforeAll(async () => { pg = await new PostgreSqlContainer().start(); await migrate(pg); }, 60_000);
afterAll(async () => { await pg.stop(); });
test("create user persists", async () => {
  const id = await repo.createUser(pg, { email: "user@example.com" }); // placeholder address
  const row = await repo.findUser(pg, id);
  expect(row.email).toBe("user@example.com");
});

E2E / UI (Playwright/Cypress)

Scope: cross‑service flows; browser + API + DB.
Keep small: test happy paths + a couple of edge cases; the rest goes to unit/integration.
Playwright

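import { test, expect } from "@playwright/test";

// page.goto("/") resolves against the baseURL set in your Playwright config.
test("checkout works", async ({ page }) => {
  await page.goto("/");
  await page.getByTestId("add-to-cart").click();
  await page.getByTestId("checkout").click();
  await expect(page.getByRole("heading", { name: "Thanks" })).toBeVisible();
});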

Component tests (UI)

Scope: a React/Vue/Svelte component with its template & events, no network. Faster than E2E, more realistic than unit.

import { render, screen } from "@testing-library/react";
test("button disables while saving", async () => {
  render(<SaveButton onSave={async () => {}} />);
  // ...
});

Contract tests (between services)

Consumer‑driven contracts (e.g., Pact): the consumer defines the requests it sends and the responses it relies on; the provider verifies those expectations in CI. This catches breaking API changes without E2E sprawl.
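The core idea fits in a few lines even without a framework. The sketch below is not Pact and does not use its API; it is a hand-rolled illustration in which a hypothetical consumer declares the fields and types it reads, and the provider's CI checks a real response against that declaration.

```python
# Consumer side: only the fields (and types) this consumer actually reads.
# Hypothetical contract for a "get user" response.
USER_CONTRACT = {"id": int, "email": str}


def contract_violations(contract, response):
    """Return human-readable violations; an empty list means the provider conforms."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems
```

Extra fields in the response are allowed on purpose: the provider stays free to add data, and only removing or retyping something a consumer depends on fails the check. A real tool adds request matching, versioning, and a broker for sharing contracts.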

3) Test data & isolation

Factories over fixtures; keep data minimal and explicit.
DB isolation: transaction per test with rollback, or ephemeral DB/container per worker.
IDs & time: seed RNG, freeze time. In JS, use fake timers; in Python, freezegun.
Avoid shared mutable state; tear down properly.

pytest (transaction rollbacks)

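import pytest

@pytest.fixture(autouse=True)
def _db(db_session):  # db_session: assumed to be a session fixture defined elsewhere
    tx = db_session.begin()
    yield
    tx.rollback()  # discard everything the test wrote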
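"Factories over fixtures" and "seed RNG" combine naturally. Below is a minimal sketch of a factory that builds a valid object by default and lets each test override only the fields it cares about; the `User` shape and `UserFactory` are hypothetical, and libraries such as factory_boy offer a fuller version of the same idea.

```python
import random
from dataclasses import dataclass


@dataclass
class User:
    id: int
    email: str
    plan: str = "free"


class UserFactory:
    """Builds valid users; tests override only the fields they care about."""

    def __init__(self, seed=1234):
        # Private, seeded RNG: reproducible, and no shared global random state.
        self._rng = random.Random(seed)
        self._next_id = 1

    def build(self, **overrides):
        user_id = self._next_id
        self._next_id += 1
        suffix = self._rng.randrange(10_000)
        fields = {"id": user_id, "email": f"user{user_id}-{suffix}@example.test"}
        fields.update(overrides)
        return User(**fields)
```

A test that only cares about the plan writes `UserFactory().build(plan="pro")`, which keeps the relevant detail visible in the test body instead of hidden in a shared fixture file.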

4) Flake killers (checklist)

Time: freeze or inject clock; avoid real sleeps—poll with a timeout helper.
Network: block unexpected HTTP; stub outside calls at the edge (e.g., payment provider).
Async/races: wait for signals (selectors visible, job done) not arbitrary delays.
Randomness: seed RNG; make nondeterminism explicit.
Eventual consistency: in tests that mirror prod semantics, add bounded polling helpers.

// Wait helper with timeout
export async function waitFor<T>(fn: () => Promise<T>, ms = 1000, step = 25) {
  const end = Date.now() + ms;
  let lastErr;
  while (Date.now() < end) {
    try { return await fn(); } catch (e) { lastErr = e; await new Promise(r => setTimeout(r, step)); }
  }
  throw lastErr ?? new Error("timeout");
}
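For the "block unexpected HTTP" item, one option in Python is to make every outbound socket connection fail fast while tests run. This is a sketch using only the standard library; `block_network` and `NetworkBlockedError` are names invented here, and plugins such as pytest-socket package the same idea more thoroughly.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkBlockedError(RuntimeError):
    """Raised when test code tries to open a real connection."""


@contextmanager
def block_network():
    """Make socket.connect fail immediately for the duration of the block."""

    def guarded_connect(self, address):
        raise NetworkBlockedError(f"unexpected network call to {address!r}")

    # patch.object restores the original method on exit, even if the body raises.
    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield
```

Wrapped in an autouse fixture, this turns a forgotten stub into an immediate, named failure instead of a slow timeout. It only guards `connect`, so treat it as a starting point; tests that need a real dependency (the integration layer) should opt out explicitly.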

5) What to mock (and what not to)

Mock: third‑party APIs, email/SMS, payments, clock, randomness, OS/FS where slow.
Prefer fakes over mocks for your own interfaces (in‑memory repo/queue).
Don’t mock the code under test; the test goes green without exercising real behavior, which is a false positive.
Snapshot tests sparingly—on stable, human‑reviewable output only.
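To make "prefer fakes over mocks" concrete: a fake is a small working implementation of your own interface, so tests assert on outcomes rather than on which methods were called. A minimal sketch, with a hypothetical repository interface and a hypothetical `register` function standing in for the code under test:

```python
class InMemoryUserRepo:
    """Fake with real behavior: stores users in a dict instead of a database."""

    def __init__(self):
        self._users = {}
        self._next_id = 1

    def create_user(self, email):
        # Mirror the constraint the real store would enforce.
        if any(u["email"] == email for u in self._users.values()):
            raise ValueError(f"duplicate email: {email}")
        user_id = self._next_id
        self._next_id += 1
        self._users[user_id] = {"id": user_id, "email": email}
        return user_id

    def find_user(self, user_id):
        return self._users.get(user_id)


def register(repo, email):
    """Hypothetical code under test: normalizes the email, then persists it."""
    return repo.create_user(email.strip().lower())
```

Because the fake enforces the uniqueness rule, a test for duplicate sign-ups fails for the right reason. Running the same behavioral tests against the fake and the real repository (in the integration layer) helps keep the two from drifting apart.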

6) Coverage & confidence

Track line + branch coverage per package; aim for ~80% line / ~60% branch overall, higher for core logic.
Use mutation testing (Stryker, mutmut) on critical modules to measure assertion quality.
Gate merges on changed‑files coverage rather than repo‑wide % to avoid gaming.
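A changed-files gate boils down to intersecting two sets per file: the lines a PR touched and the lines the tests executed. The sketch below assumes both have already been extracted (from the diff and from a coverage report) into plain dictionaries; the function names and the 0.80 threshold are illustrative assumptions, the latter borrowed from the line-coverage target above.

```python
def changed_line_coverage(changed, covered):
    """Share of changed lines that tests executed.

    Both arguments map a file path to a set of line numbers.
    """
    total = hit = 0
    for path, lines in changed.items():
        total += len(lines)
        hit += len(lines & covered.get(path, set()))
    # A PR that changes no measurable lines passes trivially.
    return 1.0 if total == 0 else hit / total


def gate(changed, covered, threshold=0.80):
    """True when the PR's changed-line coverage meets the threshold."""
    return changed_line_coverage(changed, covered) >= threshold
```

In practice the `changed` sets should contain only executable lines, otherwise comments and blank lines drag the number down. Tools such as diff-cover implement this end to end.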

7) CI that stays under 10 minutes

Shard & parallelize by test file; cache dependencies and compiled artifacts.
Run unit + key integration on PR; run full E2E nightly and on release candidates.
Retry only flaky E2E (1–2 times) and quarantine repeat offenders.
Artifacts: E2E videos/screenshots/logs; test results in JUnit format; upload HTML coverage.
Use Testcontainers with reusable layers or service containers to avoid cold starts.

GitHub Actions (matrix split sketch)

jobs:
  test:
    runs-on: ubuntu-latest
    strategy: { matrix: { shard: [1,2,3,4] } }
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run test -- --shard=${{ matrix.shard }}/4 --reporter=junit
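When a runner has no built-in sharding flag, splitting by test file can be approximated with a stable hash of the file path. A minimal sketch; the function names are made up for illustration, and the split balances file counts, not durations.

```python
import zlib


def shard_for(test_file, total_shards):
    """Map a test file to a 1-based shard index, stable across machines and runs."""
    # zlib.crc32 is deterministic; the built-in hash() of a str is salted per process.
    return zlib.crc32(test_file.encode("utf-8")) % total_shards + 1


def files_for_shard(test_files, shard, total_shards):
    """Return the sorted subset of test files that belongs to the given shard."""
    return sorted(f for f in test_files if shard_for(f, total_shards) == shard)
```

Each CI job would call `files_for_shard(all_files, shard, 4)` with its matrix value and pass the result to the runner. If one shard consistently finishes last, switching to a split based on recorded timings is the usual next step.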

8) Release safety net (beyond tests)

Smoke tests post‑deploy (synthetic checks).
Contract verification in CI for services you call/serve.
Feature flags for risky paths; dark‑launch + canary.
Runtime assertions and structured logs to catch invariants you didn’t test.
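Runtime assertions do not have to crash the request to be useful. One possible shape, sketched with the standard library only: check the invariant, emit a structured log record on failure, and let the caller decide what to do. The logger name, event name, and `check_invariant` helper are assumptions made for this example.

```python
import json
import logging

logger = logging.getLogger("invariants")


def check_invariant(condition, name, **context):
    """Log a structured record when an invariant fails; return whether it held."""
    if not condition:
        record = {"event": "invariant_violated", "invariant": name, **context}
        # sort_keys keeps the output stable, which makes log queries and tests simpler.
        logger.error(json.dumps(record, sort_keys=True))
    return bool(condition)
```

A call such as `check_invariant(balance >= 0, "balance_non_negative", account_id=account_id)` then shows up as one searchable JSON line, which can feed an alert on the `invariant_violated` event.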

Pitfalls & fast fixes

| Pitfall | Why it hurts | Fix |
|---|---|---|
| E2E‑heavy suites | Slow, flaky, costly | Push logic to unit/integration; trim to critical flows |
| Global test DB | Hidden coupling, flake | Ephemeral DB or per‑test transaction rollback |
| Sleeping in tests | Racy & slow | Wait for signals, not time |
| Mocking everything | False confidence | Mock at the edge; use fakes otherwise |
| Unseeded randomness/time | Non‑reproducible | Seed & freeze; inject a clock |
| Coverage chasing 100% | Busywork | Focus on risk‑based coverage + mutation testing |

Quick checklist

[ ] Ratios: Unit 60–70%, Integration 20–30%, E2E 5–10%.
[ ] Containers for real deps in integration tests.
[ ] Critical E2E only; parallelize and artifact logs/video.
[ ] Freeze time, seed random, block network by default.
[ ] Prefer fakes over mocks; contract/component tests where they fit.
[ ] Keep CI < 10 minutes with sharding and caches.

One‑minute adoption plan

Label current tests by type; cut or migrate E2E that duplicate unit/integration.
Add Testcontainers/docker‑compose for DB/queue integration in CI.
Freeze time and seed randomness in test runners; add a waitFor helper.
Define a smoke E2E suite (≤ 3 minutes) for PRs; full E2E nightly.
Track coverage (line+branch) and add mutation testing to core modules.