TL;DR
- Bias toward fast, deterministic unit tests (~60–70%).
- Add integration/service tests (~20–30%) against real dependencies (DB, queue) using containers.
- Keep E2E/UI to the critical flows (~5–10%): sign‑in, checkout, key settings. Parallelize and run full suites nightly, smoke on PRs.
- Use contract tests (consumer‑driven) between services and component tests for UI logic—both sit between unit and E2E.
- Kill flakiness at the source: control time, randomness, network, and eventual consistency. Seed data, isolate state, and retry only where the product does.
1) The shape that actually scales
            ▲          End‑to‑End (few, slow, critical paths)
            │
          ┌───┐        Component & Contract tests (UI units, API boundaries)
          │   │
        ┌───────┐      Integration/Service (DB, queue, broker with containers)
        │       │
      ┌───────────┐    Unit (pure logic, fast, isolated)
      └───────────┘
Typical ratios (by count): Unit 60–70% • Integration 20–30% • E2E 5–10%.
By time budget: Unit ≤ 2–3 min • Integration ≤ 5–7 min • E2E parallelized, smoke < 3 min on PRs; full nightly.
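To see how far a suite is from these ratios, the counts per layer can be compared against the bands above. This is a minimal sketch: the layer names and the `out_of_band` helper are illustrative, and the counts would come from however your tests are labeled.

```python
# Target bands by test count, taken from the ratios above (fractions of the suite).
TARGET_BANDS = {
    "unit": (0.60, 0.70),
    "integration": (0.20, 0.30),
    "e2e": (0.05, 0.10),
}


def suite_mix(counts: dict[str, int]) -> dict[str, float]:
    """Return each layer's share of the total test count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tests counted")
    return {layer: n / total for layer, n in counts.items()}


def out_of_band(counts: dict[str, int]) -> dict[str, float]:
    """Return the layers whose share falls outside its target band."""
    mix = suite_mix(counts)
    return {
        layer: share
        for layer, share in mix.items()
        if layer in TARGET_BANDS
        and not (TARGET_BANDS[layer][0] <= share <= TARGET_BANDS[layer][1])
    }
```

An E2E-heavy suite such as `{"unit": 300, "integration": 200, "e2e": 500}` reports `unit` and `e2e` as out of band. Treat the result as a prompt for review, not a hard gate.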
2) What belongs where
Unit (fast & pure)
- Scope: functions, small classes, reducers, utils.
- Rules: no network/disk/clock; use fakes for boundaries.
- JS (Vitest/Jest)
import { sum } from "./math";
test("sum", () => { expect(sum(2,3)).toBe(5); });
- Python (pytest)
def test_tax_rounds():
    assert calc_tax(19.99) == 3.0
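The "no clock" rule usually means passing time in rather than reading it. A minimal sketch, assuming a hypothetical `is_expired` function that takes the clock as a parameter:

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

# A clock is any zero-argument callable returning the current time.
Clock = Callable[[], datetime]


def is_expired(issued_at: datetime, ttl: timedelta, now: Clock) -> bool:
    """Pure logic: the caller supplies the clock, so no real time is read here."""
    return now() >= issued_at + ttl


def test_token_expires_exactly_at_ttl():
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

    def fixed_clock() -> datetime:
        # Fake clock: always 30 minutes after issue.
        return issued + timedelta(minutes=30)

    assert is_expired(issued, timedelta(minutes=30), now=fixed_clock)
    assert not is_expired(issued, timedelta(minutes=31), now=fixed_clock)
```

In production the same function would be called with something like `lambda: datetime.now(timezone.utc)`, so the test and the real path share all of the logic.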
Integration / Service
- Scope: your code + real dependencies (DB, queue, cache).
- Tooling: docker-compose/Testcontainers; migrate schema per test run; transaction rollbacks or ephemeral DB.
- Node + Postgres (Testcontainers)
import { PostgreSqlContainer } from "@testcontainers/postgresql";
// `migrate` and `repo` are this project's own helpers; import them from wherever they live.
let pg;
beforeAll(async () => {
  pg = await new PostgreSqlContainer().start(); // throwaway Postgres for this run
  await migrate(pg);
});
afterAll(async () => { await pg.stop(); });
test("create user persists", async () => {
  const id = await repo.createUser(pg, { email: "user@example.com" }); // placeholder address
  const row = await repo.findUser(pg, id);
  expect(row.email).toBe("user@example.com");
});
E2E / UI (Playwright/Cypress)
- Scope: cross‑service flows; browser + API + DB.
- Keep small: test happy paths + a couple of edge cases; the rest goes to unit/integration.
- Playwright
import { test, expect } from "@playwright/test";
test("checkout works", async ({ page }) => {
  await page.goto("/"); // relative URL: requires baseURL in the Playwright config
  await page.getByTestId("add-to-cart").click();
  await page.getByTestId("checkout").click();
  await expect(page.getByRole("heading", { name: "Thanks" })).toBeVisible();
});
Component tests (UI)
- Scope: a React/Vue/Svelte component with its template & events, no network. Faster than E2E, more realistic than unit.
import { render, screen } from "@testing-library/react";
// SaveButton is the component under test, imported from your own code.
test("button disables while saving", async () => {
  render(<SaveButton onSave={async () => {}} />);
  // ...
});
Contract tests (between services)
- Consumer‑driven Pact: consumer defines expectations; provider verifies on CI. Prevents breaking changes without E2E sprawl.
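The core idea can be shown without Pact itself. The sketch below is a hand-rolled simplification, not Pact's API: the consumer publishes the fields and types it relies on, and the provider's CI checks a real response against them. `ORDER_CONTRACT` and `provider_response` are hypothetical names.

```python
# Hypothetical contract: the fields (and types) the consumer relies on.
ORDER_CONTRACT = {"id": str, "total_cents": int, "status": str}


def contract_violations(payload: dict, contract: dict[str, type]) -> list[str]:
    """List every way `payload` fails the consumer's expectations.

    Extra fields are allowed, so a provider can add data without breaking consumers.
    """
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def provider_response() -> dict:
    # Stand-in for invoking the real provider handler inside the provider's CI.
    return {"id": "ord_1", "total_cents": 1999, "status": "paid", "note": "extra"}
```

The provider build fails when `contract_violations(provider_response(), ORDER_CONTRACT)` is non-empty. A real Pact setup adds request matching, provider states, and a broker for sharing contracts.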
3) Test data & isolation
- Factories over fixtures; keep data minimal and explicit.
- DB isolation: transaction per test with rollback, or ephemeral DB/container per worker.
- IDs & time: seed RNG, freeze time. In JS, use fake timers; in Python, freezegun.
- Avoid shared mutable state; tear down properly.
pytest (transaction rollbacks)
import pytest

@pytest.fixture(autouse=True)
def _db(db_session):  # db_session: the project's own session fixture
    tx = db_session.begin()
    yield
    tx.rollback()  # undo everything the test wrote
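"Factories over fixtures" can be as small as a builder with sensible defaults. A minimal sketch, assuming a hypothetical `User` model: each test creates its own `UserFactory`, so the ID counter is deterministic and never shared.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    email: str
    is_admin: bool = False


class UserFactory:
    """One instance per test, so the ID counter is never shared between tests."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)  # deterministic IDs instead of random ones

    def build(self, **overrides) -> User:
        """Return a valid User; the test overrides only the fields it cares about."""
        n = next(self._ids)
        defaults = {"id": n, "email": f"user{n}@example.com"}
        return User(**{**defaults, **overrides})
```

A test that cares about permissions writes `UserFactory().build(is_admin=True)` and nothing else, which keeps the relevant detail visible in the test body.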
4) Flake killers (checklist)
- Time: freeze or inject clock; avoid real sleeps—poll with a timeout helper.
- Network: block unexpected HTTP; stub outside calls at the edge (e.g., payment provider).
- Async/races: wait for signals (selectors visible, job done) not arbitrary delays.
- Randomness: seed RNG; make nondeterminism explicit.
- Eventual consistency: in tests that mirror prod semantics, add bounded polling helpers.
// Wait helper with timeout
export async function waitFor<T>(fn: () => Promise<T>, ms = 1000, step = 25) {
  const end = Date.now() + ms;
  let lastErr;
  while (Date.now() < end) {
    try {
      return await fn();
    } catch (e) {
      lastErr = e;
      await new Promise(r => setTimeout(r, step));
    }
  }
  throw lastErr ?? new Error("timeout");
}
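For the "block unexpected HTTP" item, one option in Python is to make outbound socket connections fail loudly during tests. This is a minimal sketch with hypothetical names (`network_blocked`, `UnexpectedNetworkCall`); it only patches `socket.socket.connect`, so it is a tripwire, not a sandbox.

```python
import socket
from contextlib import contextmanager


class UnexpectedNetworkCall(RuntimeError):
    """Raised when code under test tries to open a socket connection."""


@contextmanager
def network_blocked():
    """Make every socket.connect() call fail loudly for the duration of the block."""
    original = socket.socket.connect

    def _refuse(self, address):
        raise UnexpectedNetworkCall(f"blocked connection to {address!r}")

    socket.socket.connect = _refuse
    try:
        yield
    finally:
        socket.socket.connect = original  # always restore, even if the test fails
```

Wrapped in an autouse fixture, this turns a forgotten stub into an immediate, named failure instead of a slow or flaky test. Tests that talk to a local container need an allowlist or an opt-out.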
5) What to mock (and what not to)
- Mock: third‑party APIs, email/SMS, payments, clock, randomness, OS/FS where slow.
- Prefer fakes over mocks for your own interfaces (in‑memory repo/queue).
- Don’t mock the code under test; the test then passes without exercising real behavior (a false green).
- Snapshot tests sparingly—on stable, human‑reviewable output only.
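A fake is a small working implementation of your own interface, so tests assert on resulting state instead of on which calls were made. A minimal sketch, with a hypothetical `UserRepo` interface and `register` function:

```python
from typing import Optional, Protocol


class UserRepo(Protocol):
    def add(self, email: str) -> int: ...
    def get(self, user_id: int) -> Optional[str]: ...


class InMemoryUserRepo:
    """Fake: a real, working implementation that keeps its state in a dict."""

    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def add(self, email: str) -> int:
        user_id = len(self._rows) + 1
        self._rows[user_id] = email
        return user_id

    def get(self, user_id: int) -> Optional[str]:
        return self._rows.get(user_id)


def register(repo: UserRepo, email: str) -> int:
    """Code under test: normalizes the email, then stores it."""
    return repo.add(email.strip().lower())
```

A test calls `register(InMemoryUserRepo(), "  Ada@Example.com ")` and then reads the stored value back. If `register` later changes how it calls the repo, the test still passes as long as the stored result is right.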
6) Coverage & confidence
- Track line + branch coverage per package; aim for ~80% line / ~60% branch overall, higher for core logic.
- Use mutation testing (Stryker, mutmut) on critical modules to measure assertion quality.
- Gate merges on changed‑files coverage rather than repo‑wide % to avoid gaming.
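A changed-files gate is simple arithmetic over the coverage report. A minimal sketch, assuming the report has already been parsed into `path -> (covered_lines, total_lines)` and includes every changed source file; the function names and the 0.80 default are illustrative.

```python
def changed_files_coverage(
    per_file: dict[str, tuple[int, int]], changed: list[str]
) -> float:
    """Line coverage computed over the changed files only.

    Paths missing from the report (docs, config) contribute nothing.
    """
    covered = total = 0
    for path in changed:
        file_covered, file_total = per_file.get(path, (0, 0))
        covered += file_covered
        total += file_total
    # A change with no executable lines has nothing to cover.
    return 1.0 if total == 0 else covered / total


def gate(
    per_file: dict[str, tuple[int, int]], changed: list[str], threshold: float = 0.80
) -> bool:
    """True when the changed files meet the coverage threshold."""
    return changed_files_coverage(per_file, changed) >= threshold
```

With `{"a.py": (90, 100), "b.py": (10, 100), "c.py": (50, 50)}`, a PR touching `a.py` and `c.py` passes at 140/150 even though the repo as a whole sits at 150/250. A PR touching only `b.py` fails.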
7) CI that stays under 10 minutes
- Shard & parallelize by test file; cache dependencies and compiled artifacts.
- Run unit + key integration on PR; run full E2E nightly and on release candidates.
- Retry only flaky E2E (1–2 times) and quarantine repeat offenders.
- Artifacts: E2E videos/screenshots/logs; test results in JUnit format; upload HTML coverage.
- Use Testcontainers with reusable layers or service containers to avoid cold starts.
GitHub Actions (matrix split sketch)
jobs:
  test:
    runs-on: ubuntu-latest
    strategy: { matrix: { shard: [1,2,3,4] } }
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run test -- --shard=${{ matrix.shard }}/4 --reporter=junit
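Runners with a `--shard` flag do the split for you. Where one does not, sharding by test file can be sketched as a stable hash of the path, so every CI machine independently agrees on the assignment. `shard_of` and `files_for_shard` are illustrative names.

```python
import hashlib


def shard_of(path: str, total: int) -> int:
    """Map a test file to a 1-based shard index, stable across machines and runs.

    hashlib is used because Python's built-in hash() is salted per process.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total + 1


def files_for_shard(paths: list[str], index: int, total: int) -> list[str]:
    """Return the test files that shard `index` (1..total) should run."""
    return sorted(p for p in paths if shard_of(p, total) == index)
```

Hash-based splitting balances by file count only. If a few files dominate runtime, splitting by recorded duration is likely to give more even shards.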
8) Release safety net (beyond tests)
- Smoke tests post‑deploy (synthetic checks).
- Contract verification in CI for services you call/serve.
- Feature flags for risky paths; dark‑launch + canary.
- Runtime assertions and structured logs to catch invariants you didn’t test.
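A runtime assertion in this sense logs instead of crashing. The sketch below uses only the standard library; the `check_invariant` helper, the logger name, and the refund example are hypothetical.

```python
import json
import logging

logger = logging.getLogger("invariants")


def check_invariant(condition: bool, name: str, **context) -> bool:
    """Log a structured record when an invariant is violated; never raise.

    Returns the condition so callers can branch on it if they need to.
    """
    if not condition:
        record = {"event": "invariant_violated", "invariant": name, **context}
        logger.error(json.dumps(record, sort_keys=True, default=str))
    return condition


def apply_refund(balance_cents: int, refund_cents: int) -> int:
    new_balance = balance_cents - refund_cents
    check_invariant(
        new_balance >= 0,
        "balance_non_negative",
        balance_cents=balance_cents,
        refund_cents=refund_cents,
    )
    return new_balance
```

Because each record is a single JSON object with a fixed `event` field, an alert can match on `invariant_violated` and group by `invariant`.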
Pitfalls & fast fixes
| Pitfall | Why it hurts | Fix |
|---|---|---|
| E2E‑heavy suites | Slow, flaky, costly | Push logic to unit/integration; trim to critical flows |
| Global test DB | Hidden coupling, flake | Ephemeral DB or per‑test transaction rollback |
| Sleeping in tests | Racy & slow | Wait for signals, not time |
| Mocking everything | False confidence | Mock at the edge; use fakes otherwise |
| Unseeded randomness/time | Non‑reproducible | Seed & freeze; inject a clock |
| Coverage chasing 100% | Busywork | Focus on risk‑based coverage + mutation testing |
Quick checklist
- [ ] Ratios: Unit 60–70%, Integration 20–30%, E2E 5–10%.
- [ ] Containers for real deps in integration tests.
- [ ] Critical E2E only; parallelize and artifact logs/video.
- [ ] Freeze time, seed random, block network by default.
- [ ] Prefer fakes over mocks; contract/component tests where they fit.
- [ ] Keep CI < 10 minutes with sharding and caches.
One‑minute adoption plan
- Label current tests by type; cut or migrate E2E that duplicate unit/integration.
- Add Testcontainers/docker‑compose for DB/queue integration in CI.
- Freeze time and seed randomness in test runners; add a waitFor helper.
- Define a smoke E2E suite (≤ 3 minutes) for PRs; full E2E nightly.
- Track coverage (line+branch) and add mutation testing to core modules.