TL;DR
- Use clear HTTP status codes with one canonical error shape across all endpoints.
- Prefer Problem Details (RFC 7807, since superseded by RFC 9457); JSend is fine if you already use it. Just be consistent.
- Include actionable fields: `code`, `message`, `details`, `trace_id`, a docs link, and, where useful, a `hint` or per‑field `errors[]`.
- Log structured JSON with correlation IDs; never leak secrets. Sample noisy errors; page only on SLO‑relevant ones.
1) Status codes that pull their weight
| Situation | Status | Notes |
|---|---|---|
| Success | 200 OK, 201 Created, 202 Accepted, 204 No Content | 201 for new resources with Location header |
| Client error (bad input) | 400 Bad Request | Invalid JSON, schema violation (or use 422) |
| Validation errors | 422 Unprocessable Content | Body is well‑formed but semantically invalid |
| Authn/Authz | 401 Unauthorized, 403 Forbidden | 401 ⇒ unauthenticated; 403 ⇒ authenticated but not allowed |
| Not found | 404 Not Found | Don’t leak existence in sensitive contexts |
| Method/Media | 405 Method Not Allowed, 415 Unsupported Media Type, 406 Not Acceptable | |
| Conflict | 409 Conflict | Version mismatch, unique constraint conflicts |
| Rate limiting | 429 Too Many Requests | Add Retry-After (seconds or HTTP date) |
| Upstream timeout | 504 Gateway Timeout | Client can retry |
| Server overload/maintenance | 503 Service Unavailable | Add Retry-After; keep brief |
| Generic server error | 500 Internal Server Error | Catch‑all; instrument and fix root cause |
Pick either 400 or 422 for validation failures and stick to it. Many teams use 422 for field‑level validation because 400 usually signals a malformed payload.
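One way to keep the table above enforceable is a single lookup from domain errors to status codes, consulted by the central error handler. The sketch below is only an illustration: the exception class names are hypothetical and not part of any framework.

```python
# Hypothetical domain exceptions; a real service would define its own.
class ValidationFailed(Exception): ...
class NotAuthenticated(Exception): ...
class NotAllowed(Exception): ...
class ResourceMissing(Exception): ...
class VersionConflict(Exception): ...
class RateLimited(Exception): ...


# One table for the whole service, so every endpoint maps errors the same way.
STATUS_BY_ERROR: dict[type, int] = {
    ValidationFailed: 422,
    NotAuthenticated: 401,
    NotAllowed: 403,
    ResourceMissing: 404,
    VersionConflict: 409,
    RateLimited: 429,
}


def status_for(exc: Exception) -> int:
    """Return the mapped status, falling back to 500 for anything unmapped."""
    # Walk the class hierarchy so subclasses inherit their parent's status.
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500
```

Anything not listed falls through to 500, which keeps unexpected failures visible instead of disguising them as client errors.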
2) One consistent error format
Option A — Problem Details (RFC 7807)
HTTP/1.1 422 Unprocessable Content
Content-Type: application/problem+json
Traceparent: 00-2c3e...-9f7c...-01

{
  "type": "https://docs.example.com/problems/validation-error",
  "title": "Your request parameters didn't validate",
  "status": 422,
  "detail": "The 'email' field must be a valid address.",
  "instance": "/users",
  "trace_id": "01HZX2C9Q5C3PZ3M1E0K7",
  "errors": [
    { "field": "email", "code": "invalid_email", "message": "Must be a valid email" },
    { "field": "age", "code": "min", "message": "Must be at least 18" }
  ]
}
- `type` is a URI for documentation (can be per class of problem).
- `instance` can echo the request path or a unique occurrence ID.
- Add custom members (`trace_id`, `errors`) as needed.
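A small builder can keep every endpoint emitting the same members. This is a minimal sketch, assuming a helper named `problem` (hypothetical) and using `about:blank` as the default `type`, which is the default the Problem Details specification defines:

```python
from typing import Any, Optional


def problem(
    status: int,
    title: str,
    type_uri: str = "about:blank",
    detail: Optional[str] = None,
    instance: Optional[str] = None,
    **extensions: Any,
) -> dict:
    """Build a Problem Details body, omitting optional members that are unset."""
    body: dict = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    # Custom members (trace_id, errors, ...) must not shadow the standard ones.
    reserved = {"type", "title", "status", "detail", "instance"}
    for key, value in extensions.items():
        if key in reserved:
            raise ValueError(f"extension member collides with standard member: {key}")
        body[key] = value
    return body
```

Calling `problem(422, "Your request parameters didn't validate", instance="/users", trace_id="...", errors=[...])` yields a body shaped like the example response above, minus `detail`.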
Option B — JSend (if that’s your house style)
{
  "status": "fail",
  "data": {
    "errors": [
      { "field": "email", "code": "invalid_email", "message": "Must be a valid email" }
    ]
  },
  "code": "VALIDATION_ERROR",
  "trace_id": "01HZX2..."
}
Whatever you choose, document it once and reuse everywhere (including 401/403/404/500/503/429).
3) Headers that help clients (and you)
- Correlation: include/propagate `Traceparent` (W3C) and echo an `X-Request-Id` or your own `trace_id` in the body.
- Retry guidance: `Retry-After` on 429/503; set the expectation that non‑idempotent operations may still fail after a retry.
- Caching: `Cache-Control: no-store` on sensitive error bodies.
- Versioning: `Content-Type` should be explicit (`application/problem+json`).
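On the client side, `Retry-After` can carry either a number of seconds or an HTTP date, so a consumer has to handle both. A minimal sketch using only the standard library; the function name `retry_after_seconds` is an assumption for illustration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value into a non-negative delay in seconds.

    Returns None when the value is neither delta-seconds nor a parseable date.
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", not a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Passing `now` explicitly keeps the function deterministic in tests; production callers can omit it.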
4) Logging & observability (production‑grade)
Log shape (JSON)
{
  "ts": "2025-09-07T10:20:30.123Z",
  "level": "ERROR",
  "service": "users-api",
  "method": "POST",
  "path": "/users",
  "status": 422,
  "duration_ms": 37,
  "client_ip": "203.0.113.10",
  "user_id": "u_123",
  "tenant_id": "t_acme",
  "trace_id": "01HZX2C9Q5C3...",
  "error": { "type": "validation", "code": "invalid_email", "message": "Must be a valid email" }
}
Principles
- Structured logs only; no free‑form stack traces without fields.
- Redact secrets/PII; never log full tokens or passwords.
- Sample noisy 4xx; always log 5xx.
- Emit metrics (`requests_total{status}`, `errors_total{code}`, `latency_bucket`) and traces (OpenTelemetry).
- Define SLOs (e.g., 99.9% non‑5xx) and alert on error budgets, not every spike.
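The redaction and sampling principles can be sketched in a few lines. Everything here is illustrative: the deny-list in `SENSITIVE_KEYS`, the 10% default sample rate, and the function names are assumptions to adapt, not recommendations from a specific logging library.

```python
import json
import random
from typing import Any, Callable

# Assumed deny-list; extend it to match your own payloads and headers.
SENSITIVE_KEYS = {"password", "token", "authorization", "cookie", "secret"}


def redact(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def should_log(status: int, sample_rate_4xx: float = 0.1,
               rng: Callable[[], float] = random.random) -> bool:
    """Decide whether to log an error response: always for 5xx, sampled otherwise."""
    if status >= 500:
        return True
    return rng() < sample_rate_4xx


def log_line(event: dict) -> str:
    """Serialize one event as a single-line JSON record with secrets removed."""
    return json.dumps(redact(event), separators=(",", ":"), sort_keys=True)
```

Matching on key names only catches secrets stored under predictable keys; values embedded in free text (for example a token inside a URL) need separate handling.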
5) Tiny middleware examples
Express (Node)
import { randomUUID } from "node:crypto";
import type { ErrorRequestHandler } from "express";

// Error shape
class ApiError extends Error {
  status: number;
  code: string;
  details?: unknown;
  constructor(status: number, code: string, message: string, details?: unknown) {
    super(message);
    this.status = status;
    this.code = code;
    this.details = details;
  }
}

// Central handler; register it after all routes. `app` and `logger` are assumed to exist.
const errorHandler: ErrorRequestHandler = (err, req, res, _next) => {
  const traceId = req.get("x-request-id") ?? randomUUID();
  const status = err.status ?? 500;
  // Keep internal error text out of 5xx responses; it still goes to the log.
  const message = status >= 500 ? "" : err.message;
  const body = {
    type: "https://docs.example.com/problems/" + (err.code || "internal-error"),
    title: message || "Internal Server Error",
    status,
    trace_id: traceId,
    ...(err.details ? { errors: err.details } : {}),
  };
  // log (structured)
  logger.error({ trace_id: traceId, status, path: req.path, error: { code: err.code, message: err.message } });
  res.setHeader("Content-Type", "application/problem+json");
  res.status(status).json(body);
};
app.use(errorHandler);
FastAPI (Python)
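from uuid import uuid4

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


# Catch-all for unhandled exceptions. FastAPI's built-in handlers still process
# HTTPException and RequestValidationError; override those separately if they
# should share this shape.
@app.exception_handler(Exception)
async def problem_details_handler(request: Request, exc: Exception):
    trace_id = request.headers.get("x-request-id") or uuid4().hex
    status = getattr(exc, "status_code", 500)
    # Keep internal exception text out of 5xx responses.
    message = str(exc) if status < 500 else ""
    body = {
        "type": f"https://docs.example.com/problems/{getattr(exc, 'code', 'internal-error')}",
        "title": message or "Internal Server Error",
        "status": status,
        "trace_id": trace_id,
    }
    # TODO: structured log here
    return JSONResponse(content=body, status_code=status, media_type="application/problem+json")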
6) Validation & per‑field errors
- Validate early (schema/DTO) and return all field issues in one response.
- Include `errors[]` with `field`, `code`, `message`, and optional `hint` (e.g., allowed range).
- Keep messages developer‑friendly (end users get localized messages elsewhere).
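Returning every field issue at once means the validator should collect problems rather than stop at the first one. A minimal sketch, assuming a hypothetical `validate_user` function and the `email`/`age` fields from the earlier example; the email check is deliberately loose and stands in for a real validator:

```python
from typing import Any


def validate_user(payload: dict) -> list:
    """Check every field and return all problems, not just the first."""
    errors: list = []

    email: Any = payload.get("email")
    # Loose illustrative check: a string with "@" somewhere in the middle.
    if not isinstance(email, str) or "@" not in email.strip("@"):
        errors.append({
            "field": "email",
            "code": "invalid_email",
            "message": "Must be a valid email",
            "hint": "Example: name@example.com",
        })

    age: Any = payload.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        errors.append({"field": "age", "code": "type", "message": "Must be an integer"})
    elif age < 18:
        errors.append({
            "field": "age",
            "code": "min",
            "message": "Must be at least 18",
            "hint": "Allowed range: 18 or older",
        })

    return errors
```

An empty list means the payload passed; a non-empty list can go straight into the `errors` member of a 422 response.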
7) Retries & idempotency
- Safe methods (`GET`, `HEAD`) are idempotent; clients may retry on network errors/5xx.
- For mutating endpoints, support idempotency keys (header like `Idempotency-Key`) to avoid duplicate side effects on retry.
- Back off with exponential + jitter; don't rely on clients to be polite; enforce with 429.
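Both halves of this section can be sketched briefly: a client-side delay using exponential backoff with full jitter, and a server-side store that replays the first result for a repeated idempotency key. The names `backoff_delay` and `IdempotencyStore`, the base and cap values, and the in-memory dictionary are all assumptions for illustration; a real service would use a shared store with expiry and locking.

```python
import random
from typing import Any, Callable


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay for a 0-based attempt number.

    The delay is drawn uniformly from [0, min(cap, base * 2**attempt)).
    """
    return rng() * min(cap, base * (2 ** attempt))


class IdempotencyStore:
    """In-memory stand-in for a shared store; not thread-safe, no expiry."""

    def __init__(self) -> None:
        self._results: dict = {}

    def run_once(self, key: str, operation: Callable[[], Any]) -> Any:
        """Run the operation the first time a key is seen; replay the saved result after."""
        if key not in self._results:
            self._results[key] = operation()
        return self._results[key]
```

With this shape, a retried `POST` carrying the same `Idempotency-Key` gets the original response back instead of creating a second resource.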
8) Documentation (one page to rule them all)
- Document the error envelope once (Problem Details/JSend).
- In OpenAPI, reference a shared schema for errors and link named `type` URIs:
components:
  schemas:
    ProblemDetails:
      type: object
      required: [type, title, status]
      properties:
        type: { type: string, format: uri }
        title: { type: string }
        status: { type: integer }
        detail: { type: string }
        instance: { type: string }
        trace_id: { type: string }
        errors:
          type: array
          items:
            type: object
            properties:
              field: { type: string }
              code: { type: string }
              message: { type: string }
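A contract test can assert that real error responses match the shared schema. The sketch below is a hand-rolled check of the required members and basic types listed above, not a JSON Schema validator; the function name `problem_violations` is hypothetical.

```python
# Members and types mirror the ProblemDetails schema: three required, four optional.
REQUIRED = {"type": str, "title": str, "status": int}
OPTIONAL = {"detail": str, "instance": str, "trace_id": str, "errors": list}


def _matches(value: object, expected: type) -> bool:
    # bool is a subclass of int in Python; a JSON true/false is not an integer.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


def problem_violations(body: dict) -> list:
    """Return human-readable violations; an empty list means the body conforms."""
    violations = []
    for name, expected in REQUIRED.items():
        if name not in body:
            violations.append(f"missing required member: {name}")
        elif not _matches(body[name], expected):
            violations.append(f"wrong type for member: {name}")
    for name, expected in OPTIONAL.items():
        if name in body and not _matches(body[name], expected):
            violations.append(f"wrong type for member: {name}")
    return violations
```

Running this against captured responses in CI catches endpoints that drift from the documented envelope.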
9) Pitfalls & fast fixes
| Pitfall | Why it hurts | Fix |
|---|---|---|
| Inconsistent shapes per endpoint | Clients need custom parsers | One schema; SDKs parse once |
| Returning stacks to clients | Leaks internals/PII | Log stacks; send clean messages only |
| Overusing 500/400 for everything | No signal | Map to correct codes; add code/type |
| Missing correlation IDs | Hard to debug | Propagate Traceparent/X-Request-Id |
| Noisy alerts on 4xx | Pager fatigue | Alert on 5xx & SLOs; sample 4xx |
| Vague validation errors | Support tickets | Include errors[] with field codes & hints |
10) Quick checklist
- [ ] Pick Problem Details or JSend; document and enforce.
- [ ] Map the 10–12 status codes you’ll use; lint in CI.
- [ ] Add correlation IDs and structured logs everywhere.
- [ ] Return field‑level errors; keep messages action‑oriented.
- [ ] Use Retry‑After on 429/503; support idempotency keys.
- [ ] Define SLOs & alerts for 5xx; sample the rest.
One‑minute adoption plan
- Add a shared error response to your OpenAPI and generate clients.
- Implement a global error handler (middleware) that emits Problem Details with `trace_id`.
- Centralize logging (structured JSON) and propagate Traceparent.
- Review endpoints; fix status codes and add `Retry-After` where needed.
- Document the error model; link `type` URIs to a single help page.