Connection Pooling for Databases
Sizing, timeouts, and “too many clients” fixes
Goal: keep latency low and connections under control across multiple app instances, without starving the database or your users.
TL;DR
- Pool per process and reuse connections. Never open a fresh connection per request.
- Right-size by budgeting DB connections across all replicas; for Postgres put PgBouncer in front.
- Set timeouts everywhere: connect, acquire-from-pool, statement, and idle-in-transaction.
- “Too many clients” fixes: find the hogs, kill long idle in transaction, cap app pools, and add a pooler proxy.
- Beware prepared statements with transaction pooling—disable prepares or switch to session pooling.
1) What Pooling Solves (and What It Doesn’t)
Solves: connection setup overhead, connection storms, and uneven load across many app instances.
Doesn’t: fix slow queries, missing indexes, or long transactions. Pooling amplifies bad behavior unless you set timeouts and short transactions.
2) Sizing Your Pools (Server & App)
2.1 Budget database connections
Postgres
SHOW max_connections; -- total allowed server backends
SELECT count(*) FROM pg_stat_activity; -- current
MySQL/MariaDB
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';
Reserve headroom for maintenance/superuser and replicas (e.g., 20–30 connections or 10%, whichever is larger).
2.2 Compute per‑service pool sizes
Let:
- DB_MAX = server max connections (e.g., 300)
- RESERVE = admin/maintenance headroom (e.g., 30)
- APPS = total app instances across your fleet that talk to this DB (e.g., 12 pods)
- POOL_PER_INSTANCE ≈ floor((DB_MAX - RESERVE) / APPS)
Rule of thumb for OLTP: keep concurrent active queries ≈ 2–4× the DB server's CPU cores. If your math exceeds that, your pools are too big; reduce POOL_PER_INSTANCE or add a pooler proxy to multiplex.
Example
DB_MAX = 300, RESERVE = 30, APPS = 12 → POOL_PER_INSTANCE = floor(270 / 12) = 22.
Start with 15–20 per pod, not 100.
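The budgeting arithmetic above is easy to script so nobody eyeballs it during an incident. A minimal sketch (the function name is mine):

```python
import math

def pool_per_instance(db_max: int, reserve: int, apps: int) -> int:
    """Split the remaining connection budget evenly across app instances."""
    if apps <= 0:
        raise ValueError("apps must be positive")
    # floor so the fleet total never exceeds (db_max - reserve)
    return max(1, math.floor((db_max - reserve) / apps))

# The worked example: 300 max, 30 reserved, 12 pods -> 22 per pod
print(pool_per_instance(300, 30, 12))
```

Set the actual per-pod pool a bit below this ceiling so a rolling deploy (old + new pods briefly coexisting) doesn't blow the budget.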
2.3 Use a proxy to multiplex (Postgres)
Put PgBouncer between apps and Postgres to turn many client connections into a smaller server pool:
; /etc/pgbouncer/pgbouncer.ini
[databases]
appdb = host=postgres port=5432 dbname=appdb
[pgbouncer]
listen_port = 6432
pool_mode = transaction ; or session (see prepared stmts note)
default_pool_size = 50 ; per db+user server conns
min_pool_size = 5
reserve_pool_size = 5
max_client_conn = 2000
server_idle_timeout = 30 ; seconds
server_lifetime = 7200 ; seconds (2h)
query_wait_timeout = 5 ; seconds
ignore_startup_parameters = extra_float_digits
Prepared statements & transaction pooling: with pool_mode=transaction, consecutive statements from one client session can land on different server connections, so server‑side prepared statements break. Disable prepares in the client (e.g., Prisma ?pgbouncer=true; node‑pg only creates server‑side prepared statements when you pass a name to query(), so omit it; psycopg3 prepare_threshold=None) or switch to session pooling. Recent PgBouncer versions can also track protocol‑level prepared statements in transaction mode via max_prepared_statements.
3) Timeouts That Prevent Meltdowns
App‑level (client driver)
- Connect timeout: fail fast if the DB/proxy is down (e.g., 5s).
- Acquire/pool timeout: how long to wait for a free connection (e.g., 5s).
- Statement/query timeout: kill long queries (OLTP baseline 30s; many apps use 5–15s).
- Connection lifetime / recycle: close before NAT/LB idle timeouts (30m) to avoid broken pipes.
Server‑level (Postgres)
ALTER ROLE appuser SET statement_timeout = '30s';
ALTER ROLE appuser SET idle_in_transaction_session_timeout = '60s';
ALTER ROLE appuser SET idle_session_timeout = '30min'; -- Postgres 14+
-- Optional: lock timeouts to fail fast on lock waits
ALTER ROLE appuser SET lock_timeout = '5s';
Server‑level (MySQL/MariaDB)
SET GLOBAL wait_timeout = 1800; -- seconds, noninteractive sessions
SET GLOBAL interactive_timeout = 1800;
SET GLOBAL innodb_lock_wait_timeout = 5; -- row-lock waits (lock_wait_timeout governs metadata locks)
Tune pool connection lifetimes and idle timeouts to be shorter than any LB/NAT idle timeout in the path (AWS NLB, Cloudflare Tunnel, k8s SNAT). Add TCP keepalives if needed.
4) App Code: Correct Pool Usage
Node.js (node‑pg)
import { Pool } from "pg";
export const pool = new Pool({
connectionString: process.env.DATABASE_URL, // via PgBouncer is ideal
max: parseInt(process.env.POOL_MAX ?? "20", 10),
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 5_000,
statement_timeout: 30_000,
application_name: "api",
// note: node-pg only uses server-side prepared statements when you pass a "name" to query(); omit it under PgBouncer transaction pooling
});
// Pattern: borrow/release
export async function getUser(id: number) {
const client = await pool.connect();
try {
const { rows } = await client.query("SELECT * FROM users WHERE id = $1", [id]);
return rows[0];
} finally {
client.release(); // always release in finally
}
}
Python (SQLAlchemy + psycopg/psycopg2/3)
import os

from sqlalchemy import create_engine, text
engine = create_engine(
"postgresql+psycopg://user:pass@pgbouncer:6432/appdb",
pool_size=int(os.getenv("POOL_SIZE", 10)),
max_overflow=int(os.getenv("POOL_OVERFLOW", 10)),
pool_timeout=5, # seconds to wait for a free conn
pool_recycle=1800, # seconds (conn lifetime)
pool_pre_ping=True, # drops dead conns
connect_args={
"application_name": "api",
"options": "-c statement_timeout=30s -c idle_in_transaction_session_timeout=60s",
# psycopg3: "prepare_threshold": None # disable server-side prepares for txn pooling
},
)
with engine.begin() as conn:
conn.execute(text("SELECT 1"))
Go (database/sql + pgx/mysql)
db, err := sql.Open("pgx", os.Getenv("DATABASE_URL")) // via PgBouncer
if err != nil { log.Fatal(err) }
db.SetMaxOpenConns(20) // hard cap
db.SetMaxIdleConns(10)
db.SetConnMaxLifetime(30 * time.Minute)
db.SetConnMaxIdleTime(5 * time.Minute)
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := db.PingContext(ctx); err != nil { log.Fatal(err) }
5) “Too Many Clients/Connections” — Fast Fix Runbook (Postgres)
- See who’s connected
SELECT application_name, usename, state, count(*) AS conns
FROM pg_stat_activity
GROUP BY 1,2,3 ORDER BY conns DESC;
SELECT state, count(*) FROM pg_stat_activity GROUP BY 1;
- Spot leaks & long transactions
SELECT pid, usename, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;
- Terminate worst offenders (safely)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction' AND now() - xact_start > interval '5 minutes';
- Cap pools & add a proxy
  - Reduce app POOL_MAX across services; deploy PgBouncer.
  - Set role defaults for statement_timeout and idle_in_transaction_session_timeout.
  - Add backoff + jitter on connect retries to avoid connection storms.
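The backoff-plus-jitter retry can be sketched as a small wrapper around whatever connect call your driver exposes (the helper name and defaults are mine):

```python
import random
import time

def connect_with_backoff(connect, max_attempts=5, base=0.1, cap=5.0):
    """Call `connect()` with capped exponential backoff and full jitter.

    Full jitter (sleep a random amount up to the exponential step) spreads
    reconnecting clients out, so a DB restart doesn't trigger a stampede.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the real error
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Pass it `pool.connect` (or an equivalent) and let the last failure propagate so callers still see the underlying error.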
- Find slow/blocked queries
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type IS NOT NULL;
If locks block, add lock_timeout and fix transaction order/length.
MySQL equivalents:
SHOW PROCESSLIST;
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';
KILL <id>;
6) Kubernetes & Multi‑Instance Math
- Total potential connections = sum(POOL_PER_INSTANCE * replicas) for all services hitting this DB.
- Keep this < DB_MAX - RESERVE.
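The fleet-wide check above is worth asserting in CI or a preflight script so a replica bump can't silently exceed the budget. A sketch (service names and numbers are illustrative):

```python
def fleet_connections(services):
    """Worst-case total connections for {service: (pool_per_instance, replicas)}."""
    return sum(pool * replicas for pool, replicas in services.values())

services = {"api": (18, 6), "worker": (10, 4), "cron": (5, 2)}
total = fleet_connections(services)  # 18*6 + 10*4 + 5*2 = 158
assert total < 300 - 30, f"fleet needs {total} conns, budget is 270"
```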
- Readiness/Liveness probes: don’t open new TCP + auth each time. Probe the app and let the app check the DB via existing pool with a short statement timeout.
Example Helm values:
env:
- name: POOL_MAX
value: "18"
- name: DATABASE_URL
valueFrom: { secretKeyRef: { name: app, key: DATABASE_URL } }
7) Timeouts Cheat‑Sheet (Defaults to start from)
| Layer | Setting | Start with |
|---|---|---|
| Client connect | connectionTimeoutMillis / connect_timeout | 5s |
| Acquire from pool | pool_timeout / acquire timeout | 5s |
| Statement | statement_timeout | 30s (OLTP), 60–120s (reports) |
| Idle in transaction | idle_in_transaction_session_timeout | 60s |
| Idle session | idle_session_timeout / wait_timeout | 15–60m |
| Conn lifetime | pool_recycle / SetConnMaxLifetime | 30m |
| PgBouncer wait | query_wait_timeout | 5s |
8) Common Pitfalls & How to Avoid Them
- Per‑request connect: always use a process‑wide pool.
- Huge pools: more connections ≠ more throughput; you’ll thrash on context switches and locks.
- Long transactions: they block autovacuum (Postgres) and bloat tables; keep transactions short.
- Prepared statements + transaction pooling: disable prepares or use session mode.
- No app name: set application_name to identify hogs quickly.
- No backoff: use exponential backoff + jitter on connect/query retries to prevent thundering herds.
9) What “Good” Looks Like
- Stable p50/p95 latencies under load, even during deploys.
- DB CPU near target, not spiking with connection storms.
- Clear dashboards: connections by app, pool saturation, timeouts, lock waits.
- Runbook that junior devs can follow at 2 AM.
Appendix: Quick PgBouncer Docker
docker run -p 6432:6432 -e DB_HOST=postgres -e DB_NAME=appdb -e DB_USER=app -e DB_PASS=secret edoburu/pgbouncer:latest
Mount a config with your pgbouncer.ini and auth file (userlist.txt).
Ship it with small pools, strict timeouts, and one proxy in front. Your future self will sleep better.