
Connection Pooling for Databases — sizing, timeouts, and “too many clients” fixes


A production-focused guide to right-sizing pools, setting timeouts that prevent stalls, and fixing Postgres/MySQL “too many clients/connections” errors with app and proxy patterns.


Goal: keep latency low and connections under control across multiple app instances, without starving the database or your users.


TL;DR

  • Pool per process and reuse connections. Never open a fresh connection per request.
  • Right-size by budgeting DB connections across all replicas; for Postgres put PgBouncer in front.
  • Set timeouts everywhere: connect, acquire-from-pool, statement, and idle-in-transaction.
  • “Too many clients” fixes: find the hogs, kill long idle-in-transaction sessions, cap app pools, and add a pooler proxy.
  • Beware prepared statements with transaction pooling—disable prepares or switch to session pooling.

1) What Pooling Solves (and What It Doesn’t)

Solves: connection setup overhead, connection storms, and uneven load across many app instances.
Doesn’t: fix slow queries, missing indexes, or long transactions. Pooling amplifies bad behavior unless you set timeouts and short transactions.


2) Sizing Your Pools (Server & App)

2.1 Budget database connections

Postgres

SHOW max_connections;            -- total allowed server backends
SELECT count(*) FROM pg_stat_activity;  -- current

MySQL/MariaDB

SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';

Reserve headroom for maintenance/superuser and replicas (e.g., 20–30 connections or 10%, whichever is larger).

2.2 Compute per‑service pool sizes

Let:

  • DB_MAX = server max connections (e.g., 300)
  • RESERVE = admin/maintenance headroom (e.g., 30)
  • APPS = total app instances across your fleet that talk to this DB (e.g., 12 pods)
  • POOL_PER_INSTANCE ≈ floor((DB_MAX - RESERVE) / APPS)

Rule of thumb for OLTP: keep concurrent active queries at roughly 2–4× the DB's CPU cores. If your math exceeds that, your pools are too big: reduce POOL_PER_INSTANCE or add a pooler proxy to multiplex.

Example

  • DB_MAX=300, RESERVE=30, APPS=12 → POOL_PER_INSTANCE = floor(270/12) = 22.
    Start with 15–20 per pod, not 100.
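The budgeting arithmetic above can be sketched in a few lines of Python (the function names here are illustrative, not from any library):

```python
import math

def pool_per_instance(db_max: int, reserve: int, apps: int) -> int:
    """Split the remaining connection budget evenly across app instances."""
    if apps <= 0:
        raise ValueError("apps must be positive")
    return math.floor((db_max - reserve) / apps)

def within_active_query_budget(total_conns: int, db_cores: int, factor: int = 4) -> bool:
    """OLTP rule of thumb: concurrent active queries should stay <= 2-4x DB cores."""
    return total_conns <= factor * db_cores

# Example from the text: 300 max connections, 30 reserved, 12 pods -> 22 each
size = pool_per_instance(300, 30, 12)
```

If `within_active_query_budget` fails for your fleet total, that is the signal to shrink per-instance pools or front the database with a pooler.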

2.3 Use a proxy to multiplex (Postgres)

Put PgBouncer between apps and Postgres to turn many client connections into a smaller server pool:

; /etc/pgbouncer/pgbouncer.ini
[databases]
appdb = host=postgres port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction        ; or session (see prepared stmts note)
default_pool_size = 50         ; per db+user server conns
min_pool_size = 5
reserve_pool_size = 5
max_client_conn = 2000
server_idle_timeout = 30       ; seconds
server_lifetime = 7200         ; seconds (2h)
query_wait_timeout = 5         ; seconds
ignore_startup_parameters = extra_float_digits

Prepared statements & transaction pooling: with pool_mode=transaction, consecutive queries from one client session can land on different server connections, so server-side prepared statements break. Disable prepares in the client (e.g., Prisma ?pgbouncer=true, psycopg3 prepare_threshold=0; node-pg only creates server-side prepared statements when a query is given a name, so avoid named queries), switch to session pooling, or run PgBouncer 1.21+ with max_prepared_statements set so the pooler tracks prepared statements across server connections.


3) Timeouts That Prevent Meltdowns

App‑level (client driver)

  • Connect timeout: fail fast if the DB/proxy is down (e.g., 5s).
  • Acquire/pool timeout: how long to wait for a free connection (e.g., 5s).
  • Statement/query timeout: kill long queries (OLTP baseline 30s; many apps use 5–15s).
  • Connection lifetime / recycle: close before NAT/LB idle timeouts (30m) to avoid broken pipes.

Server‑level (Postgres)

ALTER ROLE appuser SET statement_timeout = '30s';
ALTER ROLE appuser SET idle_in_transaction_session_timeout = '60s';
ALTER ROLE appuser SET idle_session_timeout = '30min';
-- Optional: lock timeouts to fail fast on lock waits
ALTER ROLE appuser SET lock_timeout = '5s';

Server‑level (MySQL/MariaDB)

SET GLOBAL wait_timeout = 1800;          -- seconds, non-interactive clients
SET GLOBAL interactive_timeout = 1800;
SET GLOBAL lock_wait_timeout = 5;        -- metadata locks; row locks use innodb_lock_wait_timeout

Tune pool lifetimes to be less than LB/NAT idle (AWS NLB, Cloudflare Tunnel, k8s SNAT). Add TCP keepalives if needed.
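If you do enable keepalives, they live at the socket level. A minimal sketch, assuming Linux (the TCP_KEEP* constants and the interval values are platform-specific choices, not universal defaults):

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 3) -> None:
    """Turn on TCP keepalives so dead peers behind NAT/LB are detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Probe-tuning options are Linux-specific; guard for portability.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
```

In practice you rarely need raw sockets: libpq-based drivers accept keepalives, keepalives_idle, keepalives_interval, and keepalives_count as connection parameters.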


4) App Code: Correct Pool Usage

Node.js (node‑pg)

import { Pool } from "pg";
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,  // via PgBouncer is ideal
  max: parseInt(process.env.POOL_MAX ?? "20", 10),
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 5_000,
  statement_timeout: 30_000,
  application_name: "api",
  // note: avoid passing a query "name" (server-side prepare) under PgBouncer transaction pooling
});

// Pattern: borrow/release
export async function getUser(id: number) {
  const client = await pool.connect();
  try {
    const { rows } = await client.query("SELECT * FROM users WHERE id = $1", [id]);
    return rows[0];
  } finally {
    client.release(); // always release in finally
  }
}

Python (SQLAlchemy + psycopg/psycopg2/3)

import os

from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg://user:pass@pgbouncer:6432/appdb",
    pool_size=int(os.getenv("POOL_SIZE", 10)),
    max_overflow=int(os.getenv("POOL_OVERFLOW", 10)),
    pool_timeout=5,              # seconds to wait for a free conn
    pool_recycle=1800,           # seconds (conn lifetime)
    pool_pre_ping=True,          # drops dead conns
    connect_args={
        "application_name": "api",
        "options": "-c statement_timeout=30s -c idle_in_transaction_session_timeout=60s",
        # psycopg3: "prepare_threshold": 0  # disable prepares for txn pooling
    },
)

with engine.begin() as conn:
    conn.execute(text("SELECT 1"))

Go (database/sql + pgx/mysql)

db, err := sql.Open("pgx", os.Getenv("DATABASE_URL")) // via PgBouncer
if err != nil { log.Fatal(err) }

db.SetMaxOpenConns(20)         // hard cap
db.SetMaxIdleConns(10)
db.SetConnMaxLifetime(30 * time.Minute)
db.SetConnMaxIdleTime(5 * time.Minute)

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := db.PingContext(ctx); err != nil { log.Fatal(err) }

5) “Too Many Clients/Connections” — Fast Fix Runbook (Postgres)

  1. See who’s connected
SELECT application_name, usename, state, count(*) AS conns
FROM pg_stat_activity
GROUP BY 1,2,3 ORDER BY conns DESC;

SELECT state, count(*) FROM pg_stat_activity GROUP BY 1;
  2. Spot leaks & long transactions
SELECT pid, usename, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;
  3. Terminate worst offenders (safely)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction' AND now() - xact_start > interval '5 minutes';
  4. Cap pools & add a proxy
  • Reduce app POOL_MAX across services; deploy PgBouncer.
  • Set role defaults for statement_timeout and idle_in_transaction_session_timeout.
  • Add backoff + jitter on connect retries to avoid connection storms.
  5. Find slow/blocked queries
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type IS NOT NULL;

If locks block, add lock_timeout and fix transaction order/length.
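The retry advice in step 4 (exponential backoff with full jitter) can be sketched as follows; connect_fn and the timing constants are illustrative:

```python
import random
import time

def connect_with_backoff(connect_fn, max_attempts=5, base=0.2, cap=5.0):
    """Retry a connect callable with exponential backoff and full jitter.

    Full jitter (sleep a random amount up to the backoff ceiling) spreads
    reconnects out so a restarted database is not hit by a thundering herd.
    """
    for attempt in range(max_attempts):
        try:
            return connect_fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the failure
            ceiling = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, ceiling))
```

Wrap whatever your driver's connect call is in connect_fn; the same shape works for query retries, provided the query is idempotent.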

MySQL equivalents:

SHOW PROCESSLIST;
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';
KILL <id>;

6) Kubernetes & Multi‑Instance Math

  • Total potential connections = sum(POOL_PER_INSTANCE * replicas) for all services hitting this DB.
  • Keep this < DB_MAX - RESERVE.
  • Readiness/liveness probes: don't open a new TCP connection and re-authenticate on every probe. Probe the app, and let the app check the DB through its existing pool with a short statement timeout.
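The fleet math above can be wired into a deploy-time sanity check; a minimal sketch, with a hypothetical service list:

```python
def fleet_connections(services: dict) -> int:
    """Sum pool_per_instance * replicas over every service hitting this DB."""
    return sum(pool * replicas for pool, replicas in services.values())

def check_budget(services: dict, db_max: int, reserve: int) -> None:
    """Fail loudly if the fleet can exceed the DB's connection budget."""
    total = fleet_connections(services)
    budget = db_max - reserve
    if total > budget:
        raise RuntimeError(f"fleet needs {total} conns, budget is {budget}")

services = {
    "api":    (18, 12),   # POOL_MAX=18, 12 replicas -> 216 connections
    "worker": (5, 4),     # 20 connections
}
check_budget(services, db_max=300, reserve=30)  # 236 <= 270, passes
```

Running this in CI against your Helm values catches the classic failure mode where one team bumps replicas and silently blows the shared budget.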

Example Helm values:

env:
  - name: POOL_MAX
    value: "18"
  - name: DATABASE_URL
    valueFrom: { secretKeyRef: { name: app, key: DATABASE_URL } }

7) Timeouts Cheat‑Sheet (Defaults to start from)

| Layer | Setting | Start with |
|---|---|---|
| Client connect | connectionTimeoutMillis / connect_timeout | 5s |
| Acquire from pool | pool_timeout / acquire timeout | 5s |
| Statement | statement_timeout | 30s (OLTP), 60–120s (reports) |
| Idle in transaction | idle_in_transaction_session_timeout | 60s |
| Idle session | idle_session_timeout / wait_timeout | 15–60m |
| Conn lifetime | pool_recycle / SetConnMaxLifetime | 30m |
| PgBouncer wait | query_wait_timeout | 5s |


8) Common Pitfalls & How to Avoid Them

  • Per‑request connect: always use a process‑wide pool.
  • Huge pools: more connections ≠ more throughput; you’ll thrash on context switches and locks.
  • Long transactions: they block autovacuum (Postgres) and bloat tables; keep transactions short.
  • Prepared statements + transaction pooling: disable prepares or use session mode.
  • No app name: set application_name to identify hogs quickly.
  • No backoff: exponential backoff + jitter on connect/query retries to prevent thundering herds.

9) What “Good” Looks Like

  • Stable p50/p95 latencies under load, even during deploys.
  • DB CPU near target, not spiking with connection storms.
  • Clear dashboards: connections by app, pool saturation, timeouts, lock waits.
  • Runbook that junior devs can follow at 2 AM.

Appendix: Quick PgBouncer Docker

docker run -p 6432:6432 \
  -e DB_HOST=postgres -e DB_NAME=appdb -e DB_USER=app -e DB_PASS=secret \
  edoburu/pgbouncer:latest

Mount a config with your pgbouncer.ini and auth file (userlist.txt).


Ship it with small pools, strict timeouts, and one proxy in front. Your future self will sleep better.