Secrets Management That Survives Production

March 28th, 2026 · 10 min read · #dev #security #secrets #ops #kubernetes #ca-duh

Env vars are only the beginning. This covers secret managers, rotation, least privilege, and delivery patterns that do not leak credentials across your stack.

Secrets Management 101

Env vars, secret managers, rotation, and least privilege

Goal: keep credentials out of code, make rotation boring, and ensure apps only get the smallest secret set they actually need.

TL;DR

A secret is anything that grants access or proves identity: passwords, API keys, tokens, signing keys, certificates, webhook secrets.
Use environment variables as the app interface, but avoid making .env files or Kubernetes manifests your long-term source of truth.
In production, prefer a secret manager plus workload identity over hand-managed static credentials.
Rotation is part of the design, not a once-a-year ceremony. If your app can’t reload or swap credentials safely, your setup is incomplete.
Prefer short-lived credentials where possible. Static secrets should be scoped tightly, auditable, and easy to revoke.
Apply least privilege to both humans and services. Most workloads need access to a few secrets, not the whole vault.
Never log plaintext secrets. Redact them in app logs, error pages, traces, crash dumps, and CI output.
For Kubernetes, remember: Secrets are not magic. Encrypt them at rest, scope RBAC tightly, and assume “can create Pod” is powerful.

1) What Counts as a Secret?

People usually think “database password” and stop there.

That’s too narrow.

Secrets include:

database credentials
API keys
OAuth client secrets
JWT signing keys
session encryption keys
webhook signing secrets
private TLS keys
SSH deploy keys
cloud access tokens
third-party service tokens

A good rule:

If someone else got this value, could they impersonate you, read protected data, or spend your money?

If yes, treat it like a secret.

2) The Default Architecture That Ages Well

For most teams, the best default is:

Secret manager stores the canonical value
Workload identity authenticates the app to the secret manager
App receives the secret at runtime via env var, file mount, or startup fetch
Access is logged and scoped by environment, service, and purpose
Rotation updates the secret without a “rewrite half the platform” project

This model gives you:

one place to audit access
one place to rotate values
less chance of secrets living forever in Git, images, or random wiki pages
cleaner separation between config and secret storage

Good mental model

Env vars are how your app reads config
Secret managers are where your sensitive values live
Identity is how the workload proves it should get them

That distinction prevents a lot of pain.

3) Env Vars: Useful Interface, Bad Long-Term Storage

Env vars are still a practical app interface. They are portable, framework-agnostic, and easy to swap between deploys.

But env vars are not a full secrets strategy by themselves.

Where env vars work well

local development
simple deployments
passing a secret to a process at start-up
twelve-factor-style config boundaries

Where they fall short

rotation often needs a restart or redeploy
values leak into process dumps, debug pages, or accidental diagnostics
people start copying them into CI variables, shell history, and shared docs
long-lived secrets stay alive because nobody wants to touch them

Practical default

Use env vars as the last hop into the app, not as the place your team manually curates secrets forever.

Bad:

# copied by hand from Slack six months ago
export STRIPE_SECRET_KEY=sk_live_...

Better:

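# startup script fetches from a secret manager and injects into the process
# (fetch_secret is a placeholder for your secret manager's CLI)
# assign first, then export: `export VAR="$(cmd)"` hides a failed fetch
STRIPE_SECRET_KEY="$(fetch_secret prod/payments/stripe_api_key)" || exit 1
export STRIPE_SECRET_KEY
exec node server.js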

Best, when your platform supports it:

the workload authenticates automatically
the platform injects the secret or mounts it
rotation can happen without a human copying values around
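
On the app side, one way to keep env vars as the last hop is to read them once at startup into a typed object and fail fast when something is missing. The sketch below is illustrative only: `loadSecrets` and the variable names are assumptions, not part of any framework.

```typescript
// Minimal sketch: read required secrets from the environment once, at startup.
// The function name and variable names are illustrative assumptions.
type Env = Record<string, string | undefined>;

function loadSecrets<K extends string>(
  env: Env,
  required: readonly K[],
): Record<K, string> {
  const missing: K[] = [];
  const out = {} as Record<K, string>;
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      out[name] = value;
    }
  }
  if (missing.length > 0) {
    // Report variable names only; never put values in the error message.
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  return out;
}

// Usage with an explicit object; in a Node app you would pass process.env.
const secrets = loadSecrets(
  { STRIPE_SECRET_KEY: "sk_test_placeholder" },
  ["STRIPE_SECRET_KEY"] as const,
);
console.log(Object.keys(secrets));
```

Passing the environment in as a parameter keeps the rest of the code from reaching into global state, which also makes it easier to swap the source for a file mount or a fetch later.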

4) The Four Delivery Patterns You’ll Actually Use

Pattern A: Inject at deploy/startup

Your deploy system fetches the secret and exposes it to the process as env vars.

Good for: small teams, simple apps, stable credentials
Weakness: rotation usually means restart/redeploy

Pattern B: App fetches on startup

The app uses workload identity to fetch secrets when it boots.

Good for: stronger auditing, less secret sprawl in CI/CD
Weakness: startup depends on secret-manager availability

Minimal pseudocode:

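// secretProvider and createPool stand in for your secret-manager client and DB driver
const dbPassword = await secretProvider.get("prod/api/db_password");
const pool = createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: dbPassword, // fetched at boot, never read from a checked-in file
});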

Pattern C: Sidecar / agent writes to an in-memory file

A helper process fetches and refreshes secrets, then writes them to a memory-backed volume. The app reads from the mounted file.

Good for: Kubernetes, dynamic refresh, language-agnostic apps
Weakness: more moving parts

Example shape:

volumes:
  - name: secrets-vol
    emptyDir:
      medium: Memory
containers:
  - name: app
    volumeMounts:
      - name: secrets-vol
        mountPath: /var/run/secrets/app
        readOnly: true
  - name: secrets-agent
    volumeMounts:
      - name: secrets-vol
        mountPath: /var/run/secrets/app
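
On the app side of this pattern, the reader can be small. The following is a sketch under assumptions: the `FileSecret` class, the file name, and the refresh interval are hypothetical, and the agent is assumed to write one secret per file.

```typescript
import { readFileSync } from "node:fs";

// Minimal sketch: re-read a mounted secret file, at most once per interval.
// Class name, path, and interval are assumptions; match them to your agent.
class FileSecret {
  private cached: string | undefined;
  private lastReadAt = 0;
  private readonly path: string;
  private readonly maxAgeMs: number;
  private readonly read: (path: string) => string;
  private readonly now: () => number;

  constructor(
    path: string,
    maxAgeMs: number,
    read: (path: string) => string = (p) => readFileSync(p, "utf8"),
    now: () => number = Date.now,
  ) {
    this.path = path;
    this.maxAgeMs = maxAgeMs;
    this.read = read;
    this.now = now;
  }

  get(): string {
    const t = this.now();
    if (this.cached === undefined || t - this.lastReadAt >= this.maxAgeMs) {
      try {
        // Agents commonly write a trailing newline; strip it.
        this.cached = this.read(this.path).trimEnd();
        this.lastReadAt = t;
      } catch (err) {
        // Keep serving the last known value if a refresh read fails.
        if (this.cached === undefined) throw err;
      }
    }
    return this.cached;
  }
}

// Example (hypothetical file name under the mount shown above):
// const dbPassword = new FileSecret("/var/run/secrets/app/db_password", 30_000);
// createPool({ password: dbPassword.get() });
```

Calling `get()` each time a new connection is opened, rather than once at boot, is what lets a refreshed file take effect without a restart.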

Pattern D: On-demand fetch with local cache

The app fetches secrets when needed, then caches briefly in memory.

Good for: low-frequency admin secrets, signing material, multi-tenant cases
Weakness: adds latency and dependency complexity if overused
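
A rough sketch of the caching half of this pattern follows. The `CachedSecrets` class and the `Fetcher` type are hypothetical names, and the fetcher stands in for whatever client your secret manager provides.

```typescript
// Minimal sketch: on-demand fetch with a short in-memory TTL cache.
// Fetcher is a stand-in for a real secret-manager client call.
type Fetcher = (name: string) => Promise<string>;

interface CacheEntry {
  value: Promise<string>;
  expiresAt: number;
}

class CachedSecrets {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly fetcher: Fetcher;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetcher: Fetcher, ttlMs: number, now: () => number = Date.now) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(name: string): Promise<string> {
    const hit = this.entries.get(name);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;

    // Cache the in-flight promise so concurrent callers share one fetch.
    const value = this.fetcher(name);
    const entry: CacheEntry = { value, expiresAt: this.now() + this.ttlMs };
    this.entries.set(name, entry);

    // Do not keep failures cached; the next call should retry.
    value.catch(() => {
      if (this.entries.get(name) === entry) this.entries.delete(name);
    });
    return value;
  }
}
```

The TTL is the trade-off knob: a short one picks up rotations quickly but calls the secret manager more often, a long one does the reverse.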

Which one should you pick?

Simple monolith on one platform: inject at startup
Kubernetes app that needs refresh: sidecar/agent or platform CSI integration
Highly regulated / lots of rotation: app fetch or agent-based fetch with audit trails
Very sensitive credentials: prefer short-lived or dynamically issued secrets over static values

5) Secret Managers vs Kubernetes Secrets vs `.env`

These are not interchangeable.

`.env` files

Great for local development. Dangerous when they become your team’s actual production secret database.

Use:

.env.example checked in with placeholders
real .env files ignored by Git
different values per developer and per deploy

Avoid:

emailing .env files around
storing prod secrets on shared drives with no access control
keeping old rotated values “just in case”

Kubernetes Secrets

Useful for wiring secrets into Pods, but they are not a full management layer. They help separate sensitive values from Pod specs, but they still need cluster hardening.

Treat them like a transport and access-control mechanism, not a reason to stop thinking.

Secret managers

This is where the good stuff usually lives:

centralized storage
access control
auditing
rotation support
versioning
sometimes dynamic credentials and short-lived leases

A secret manager plus workload identity is the most future-proof default for production systems.

6) Rotation: If It Hurts, the Design Is Wrong

A secret that can’t be rotated cleanly is a future incident.

Static rotation

You replace a long-lived secret on a schedule or after exposure.

Typical flow:

create a new credential
publish it to the secret manager
update dependent service or database
let apps pick up the new version
revoke the old credential

Dynamic / short-lived credentials

Instead of rotating one long-lived secret, the platform issues credentials that expire automatically.

This is often better because:

smaller blast radius
less value if leaked
fewer “forgot to rotate this for 18 months” moments

Design for rotation up front

Your app should answer these questions clearly:

Can it reload secrets without a full outage?
Does it reopen DB pools cleanly after password rotation?
Can it validate both old and new signing keys during cutover?
Does it fail soft when the secret manager is briefly unavailable?

JWT or cookie signing key rotation example

A common safe pattern is:

sign new tokens with kid=new
continue verifying old tokens with kid=old
wait for old sessions/tokens to age out
remove the old key
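
The key-selection part of those four steps can be sketched as below. This covers only which key is used when; the actual signing and verification belong to your JWT or cookie library. `KeyRing` and `SigningKey` are hypothetical names.

```typescript
// Minimal sketch of key selection during rotation (no cryptography here).
interface SigningKey {
  kid: string;
  secret: string;
}

class KeyRing {
  private readonly keys = new Map<string, SigningKey>();
  private active: SigningKey;

  constructor(initial: SigningKey) {
    this.active = initial;
    this.keys.set(initial.kid, initial);
  }

  // Step 1: new tokens are signed with the new key from now on.
  rotateTo(next: SigningKey): void {
    this.keys.set(next.kid, next);
    this.active = next;
  }

  signingKey(): SigningKey {
    return this.active;
  }

  // Step 2: verification looks the key up by the token's kid,
  // so tokens signed with the old key keep validating.
  verificationKey(kid: string): SigningKey | undefined {
    return this.keys.get(kid);
  }

  // Step 4: once old tokens have aged out, drop the old key.
  retire(kid: string): void {
    if (kid === this.active.kid) {
      throw new Error("refusing to retire the active signing key");
    }
    this.keys.delete(kid);
  }
}
```

With several app instances, every verifier needs the new key loaded before any instance starts signing with it, otherwise fresh tokens get rejected during the rollout.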

Database password rotation gotcha

If your app holds long-lived pooled DB connections, updating the stored password is not enough. You also need a plan for:

refreshing existing pools
retrying auth failures gracefully
draining old workers

Rotation is not “change the value in a console.” It is “change the value and survive the transition.”

7) Least Privilege: Scope by App, Environment, and Purpose

The fastest way to leak secrets internally is to make everything readable by everyone.

Good naming and scoping

prod/api/stripe/secret_key
prod/api/postgres/app_password
prod/worker/s3/upload_signing_key
staging/api/stripe/secret_key

This gives you clear boundaries by:

environment
service
provider
purpose

Access policy example

api-prod workload can read:
- prod/api/postgres/*
- prod/api/stripe/*
worker-prod workload can read:
- prod/worker/s3/*
developers can read staging values when needed
only a small admin group can read or rotate production secrets manually
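
To make the workload part of that policy concrete, here is a small sketch of a default-deny check. Real secret managers have their own policy languages; the `readPolicy` map and the trailing `/*` convention below are just an illustration of the scoping idea.

```typescript
// Workload rules from the example above, expressed as data (illustrative only).
const readPolicy = new Map<string, readonly string[]>([
  ["api-prod", ["prod/api/postgres/*", "prod/api/stripe/*"]],
  ["worker-prod", ["prod/worker/s3/*"]],
]);

function matchesPattern(pattern: string, path: string): boolean {
  if (pattern.endsWith("/*")) {
    // "prod/api/stripe/*" matches anything under "prod/api/stripe/".
    return path.startsWith(pattern.slice(0, -1));
  }
  return pattern === path;
}

function canRead(workload: string, secretPath: string): boolean {
  // Unknown workloads get an empty rule list: default deny.
  const patterns = readPolicy.get(workload) ?? [];
  return patterns.some((p) => matchesPattern(p, secretPath));
}

console.log(canRead("api-prod", "prod/api/stripe/secret_key"));
console.log(canRead("worker-prod", "prod/api/stripe/secret_key"));
```

Keeping the trailing slash in the prefix comparison matters: without it, `prod/api/stripe/*` would also match a sibling path such as `prod/api/stripe_backup/...`.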

Practical rules

one secret per purpose when possible
do not share the same API key across unrelated services
split read vs write credentials
split prod vs staging completely
prefer workload identity over shared human-managed credentials

If one service is compromised, you want the attacker to get one narrow credential, not your whole estate.

8) CI/CD: Where Secrets Leak Quietly

A lot of teams lock down production but then spray secrets across pipelines.

High-risk places:

build logs
test output
shell tracing (set -x)
pull request previews
artifact metadata
copied secrets in pipeline variables that never expire

Safer CI/CD defaults

use the CI platform’s secret store for pipeline-only values
use OIDC/workload identity where supported instead of long-lived cloud keys
never echo secrets
mask sensitive variables in logs
scope pipeline credentials per repo and environment
expire temporary credentials aggressively

Bad:

# long-lived keys pasted into the pipeline
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
terraform apply

Better:

# pipeline obtains short-lived cloud credentials from workload identity / OIDC
terraform apply

The best secret is often the one you never had to store in the pipeline at all.
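
Most CI platforms mask registered secrets for you. If you also pipe output through your own tooling, a last-resort scrubber might look like the sketch below. `maskSecrets` is a hypothetical helper, and it only catches exact matches, not base64 or URL-encoded forms of the same value.

```typescript
// Minimal sketch: replace known secret values in a chunk of output.
function maskSecrets(text: string, secretValues: readonly string[]): string {
  // Mask longer values first so a secret containing another is fully covered.
  const ordered = [...secretValues]
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);

  let masked = text;
  for (const secret of ordered) {
    // split/join does a literal replace of every occurrence, no regex escaping needed.
    masked = masked.split(secret).join("***");
  }
  return masked;
}
```

Treat this as a safety net only. The primary fix is still to stop echoing secrets in the first place.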

9) Local Development Without Chaos

Developers still need to run the app. That does not mean every laptop needs production secrets.

Good local-dev setup

.env.example with placeholders and comments
local-only API keys and dev databases
bootstrap script for fetching allowed non-prod secrets
separate dev/staging tenants for third-party services

Example:

cp .env.example .env.local
./scripts/fetch-dev-secrets.sh
npm run dev

Don’t do this

share one production .env file with the entire team
let support staff or interns use prod API keys locally
point local dev at production databases unless there is a very specific, audited reason

A boring dev experience beats a “clever” secret flow no one understands.

10) Logging, Monitoring, and Memory Hygiene

Your secret handling is only as good as your worst log line.

Never log

raw tokens
passwords
Authorization headers
webhook secrets
JWT signing keys
full connection strings with embedded passwords

Redact aggressively

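// Keeps a 4-character prefix so operators can tell keys apart (e.g. "sk_l...");
// values of 8 characters or fewer are hidden entirely.
function redact(value: string): string {
  if (!value) return value;
  return value.length <= 8 ? "[REDACTED]" : `${value.slice(0, 4)}...[REDACTED]`;
}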
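
Per-value redaction helps when you know which variable is sensitive. For structured logs, a complementary approach is to redact by field name before the object reaches the logger. The sketch below assumes plain JSON-like data with no cycles; the `SENSITIVE_KEY` pattern is a starting point, not an exhaustive list.

```typescript
// Field names that should never reach a log line (extend for your codebase).
const SENSITIVE_KEY = /pass(word)?|secret|token|authorization|api[-_]?key|cookie/i;

// Returns a deep copy with sensitive fields replaced.
// Assumes plain JSON-like input: objects, arrays, and primitives.
function redactFields(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactFields(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redactFields(inner);
    }
    return out;
  }
  return value;
}

console.log(
  JSON.stringify(
    redactFields({ user: "ada", headers: { Authorization: "Bearer placeholder" } }),
  ),
);
```

Matching on names errs toward over-redaction, which is usually the right trade for logs.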

Other leak paths people forget

error pages
debug endpoints
crash dumps
APM spans
browser devtools when secrets are exposed client-side
shell history
support screenshots

Memory reality check

Do not over-romanticize “perfect” secret handling in memory. For most teams, the biggest wins are still:

shorter lifetime
tighter access
less logging
easier revocation

If you are in a highly sensitive environment, then memory handling, file permissions, dump control, and host isolation matter more. But most incidents still start with hardcoded, over-shared, or unrotated secrets.

11) Kubernetes: The Sharp Edges

Kubernetes Secrets are useful, but a lot of teams assume the word “Secret” means “solved.”

It doesn’t.

Important realities

base64 is encoding, not protection
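anyone who can create Pods in a namespace can mount, and therefore read, any Secret in that namespace
list and watch permissions on Secrets return the Secret values themselves, so granting them broadly is as risky as granting read
if encryption at rest is not configured, Secret values sit in etcd unencrypted (base64-encoded only)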

Safer k8s defaults

enable encryption at rest for Secrets
use least-privilege RBAC
prefer mounted files over printing values into startup scripts
use separate namespaces and service accounts per app
avoid baking secrets into images or ConfigMaps

Example Pod env reference:

env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: api-db
        key: password

Mounted-file pattern:

volumeMounts:
  - name: db-secret
    mountPath: /var/run/secrets/db
    readOnly: true
volumes:
  - name: db-secret
    secret:
      secretName: api-db

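Mounted files are often easier to rotate than env vars: a process's environment is fixed when it starts, while the kubelet eventually refreshes Secret volumes after the Secret changes (subPath mounts are the exception and do not update).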

12) Incident Runbook: A Secret Was Exposed

This is where teams discover whether they have a system or just vibes.

Step 1: Identify blast radius

what secret leaked?
where was it exposed?
which systems trust it?
was it read-only, write-capable, or admin-level?

Step 2: Revoke or disable

Do not start with a postmortem. Start with containment.

Step 3: Rotate and redeploy safely

create replacement
update the source of truth
reload or restart dependent workloads
verify health checks and access paths

Step 4: Scrub obvious exposure paths

logs
build output
dashboards
screenshots
pasted chat messages
accidentally committed files

Step 5: Audit usage

Look for:

access from unusual IPs or regions
suspicious API calls
failed auth spikes
unexpected resource creation or data access

Step 6: Fix the system, not just the value

Ask:

why did this secret exist that long?
why was it reachable there?
why was rotation hard?
why was it visible in logs or Git?

If the answer is “human process,” improve the tooling until the safer path is the easier one.

13) Common Pitfalls & How to Avoid Them

Hardcoding secrets in code: use env vars or mounted files as the last hop.
One shared credential for everything: scope by app and purpose.
No rotation plan: document cutover steps before production.
Prod secrets on developer laptops: use non-prod values and fetch tooling.
Secrets in CI logs: mask, redact, and stop echoing variables.
Kubernetes Secrets treated like a vault: harden etcd and RBAC, or use an external manager.
App restart required for every rotation: prefer file mounts, agent refresh, or reload hooks.
Long-lived cloud keys in pipelines: move to OIDC/workload identity.
Secrets embedded in URLs: they will leak into logs, analytics, and referrers.
No audit trail: centralize access and watch who requested what.
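
For the "secrets embedded in URLs" pitfall, the real fix is to move the secret into a header or request body. Where URLs with credentials already exist (connection strings, signed links), a sketch like the following can scrub them before they are logged. `redactUrl` and the `SENSITIVE_PARAM` pattern are assumptions for illustration; it relies only on the standard `URL` class.

```typescript
// Query parameter names treated as sensitive (extend as needed).
const SENSITIVE_PARAM = /token|secret|key|signature|password/i;

function redactUrl(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    // Safer than echoing input we could not inspect.
    return "[UNPARSEABLE URL REDACTED]";
  }

  // Covers connection strings such as scheme://user:password@host/db.
  if (url.password !== "") url.password = "REDACTED";

  // Copy the names first so we are not mutating while iterating.
  for (const name of Array.from(url.searchParams.keys())) {
    if (SENSITIVE_PARAM.test(name)) url.searchParams.set(name, "REDACTED");
  }
  return url.toString();
}

console.log(redactUrl("https://example.com/hook?token=placeholder&page=2"));
```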

14) What “Good” Looks Like

Developers can run the app locally without production secrets.
Every production secret has a clear owner, purpose, and scope.
Services authenticate to the secret manager using identity, not pasted static credentials.
Rotation is documented, tested, and boring.
Access is auditable by workload and human.
Logs, traces, and dashboards do not spray secret values everywhere.
A junior engineer can follow the exposure runbook at 2 AM.

Appendix: Starter Checklist

Minimum viable secrets hygiene

[ ] no secrets in Git
[ ] no secrets in container images
[ ] no prod secrets in .env.example
[ ] secret manager for production
[ ] least-privilege access per service
[ ] encryption at rest where supported
[ ] rotation procedure documented
[ ] secret exposure incident runbook written
[ ] logs and traces redact sensitive values
[ ] CI/CD uses temporary credentials where possible

Secrets management is not about hiding strings in more creative places. It is about designing a system where access is narrow, rotation is routine, and leaks are contained fast.