TL;DR
- Pods are cattle, not pets. Deploy with Deployments/Jobs, front with a Service, and expose via Ingress/Gateway.
- Always set resources (`requests` + `limits`) and probes (readiness/liveness/startup). That’s your SLO baseline.
- Keep config out of images: ConfigMap/Secret → env or mounted files. Never bake secrets.
- Use rolling updates + `kubectl rollout` to watch and undo quickly.
- Scale with HPA (needs metrics‑server). Treat state with care: prefer managed DBs; if you must run storage, use PVC + StatefulSet.
- Lock it down: Namespaces, RBAC, NetworkPolicies, and non‑root containers.
- Debug with logs/exec/port‑forward/describe/events; when stuck, use ephemeral containers via `kubectl debug`.
1) Mental model (what talks to what)
```
Client → Ingress / Gateway → Service (ClusterIP) → Pods (via labels)
                                                    └── ConfigMap/Secret mounted into Pods
                                                    └── HPA scales Deployment by metrics
Persistent data → PVC → StorageClass (provisioner)
```
Core objects you’ll touch: Namespace, Deployment, Service, Ingress/Gateway,
ConfigMap, Secret, HorizontalPodAutoscaler, Job/CronJob, StatefulSet (when needed),
PersistentVolumeClaim, NetworkPolicy, PodDisruptionBudget.
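Nearly all of these appear in the examples below; the one that doesn’t is Namespace, which is trivially small (a sketch; pick your own name):

```yaml
apiVersion: v1
kind: Namespace
metadata: { name: prod }   # namespaced objects then carry "namespace: prod" in metadata
```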
2) Minimal “hello web” (Deployment + Service)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels: { app: web }
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }  # zero‑downtime intent
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: ghcr.io/acme/web:1.2.3
          ports: [ { containerPort: 8080, name: http } ]
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits: { cpu: "500m", memory: "256Mi" }
          readinessProbe:
            httpGet: { path: /healthz, port: http }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /livez, port: http }
            initialDelaySeconds: 20
            periodSeconds: 10
          envFrom:
            - configMapRef: { name: web-config }
            - secretRef: { name: web-secrets }
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports: [ { name: http, port: 80, targetPort: http } ]  # type defaults to ClusterIP
```
Ingress (classic)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  ingressClassName: nginx   # match the controller installed on your cluster
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: web, port: { number: 80 } } }
```
Newer clusters may support Gateway API as a replacement for some Ingress use cases; the Service still fronts Pods.
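For illustration, a minimal HTTPRoute equivalent of the Ingress above, assuming your platform already provides a Gateway (the name `example-gateway` is hypothetical):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata: { name: web }
spec:
  parentRefs:
    - name: example-gateway        # hypothetical Gateway managed by your platform team
  hostnames: ["example.com"]
  rules:
    - matches:
        - path: { type: PathPrefix, value: / }
      backendRefs:
        - name: web                # the same ClusterIP Service as above
          port: 80
```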
3) Config & secrets (don’t bake them in)
```yaml
apiVersion: v1
kind: ConfigMap
metadata: { name: web-config }
data:
  APP_ENV: "prod"
  FEATURE_X: "true"
---
apiVersion: v1
kind: Secret
metadata: { name: web-secrets }
type: Opaque
stringData:
  DATABASE_URL: "postgres://user:pass@db:5432/app"
```
Mount as env (shown above) or as files:
```yaml
# fragments: volumeMounts goes on the container, volumes on the Pod spec
volumeMounts:
  - name: cfg
    mountPath: /etc/web
volumes:
  - name: cfg
    configMap: { name: web-config }
```
4) Scaling (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: web }
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Requires metrics-server on the cluster. You can also scale on memory or custom/external metrics.
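A memory target has the same shape; the 80% threshold here is an assumption to tune per workload:

```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target: { type: Utilization, averageUtilization: 80 }
```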
5) Storage (PVC + StatefulSet when you must)
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data-web }
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources: { requests: { storage: 5Gi } }
  storageClassName: standard
```
For databases/queues, prefer managed services. If you must run stateful workloads, use StatefulSet + PVC and understand backup/restore and node‑failure behavior.
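A minimal sketch of that pattern, using `volumeClaimTemplates` to give each Pod its own PVC (illustrative names, not a production database setup; it also assumes a headless Service named `db`):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: db }
spec:
  serviceName: db                  # headless Service providing stable per-Pod DNS
  replicas: 1
  selector: { matchLabels: { app: db } }
  template:
    metadata: { labels: { app: db } }
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports: [ { containerPort: 5432, name: pg } ]
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # one PVC per Pod, e.g. data-db-0
    - metadata: { name: data }
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources: { requests: { storage: 5Gi } }
```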
6) Zero‑downtime & safety nets
- Readiness must flip true only when traffic is safe; liveness restarts stuck Pods.
- Handle SIGTERM on shutdown; honor `terminationGracePeriodSeconds` (see the `preStop` sketch below).
- Control rollout with `maxSurge`/`maxUnavailable`; watch with `kubectl rollout status deploy/web` and roll back with `kubectl rollout undo deploy/web`.
- Prevent voluntary eviction outages with a PodDisruptionBudget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web-pdb }
spec:
  minAvailable: 1
  selector: { matchLabels: { app: web } }
```
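For graceful shutdown, a common pattern (a sketch; the 5s is an assumption) is a short `preStop` sleep so endpoints are removed before the container receives SIGTERM:

```yaml
# container-level fragment for the Deployment in §2
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]   # let endpoint/iptables updates propagate first
```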
7) Networking quick facts
- Service types: `ClusterIP` (internal), `NodePort` (debug), `LoadBalancer` (cloud LB).
- DNS: `web.default.svc.cluster.local` resolves to the Service → kube‑proxy → Pod.
- Port‑forward for local debug: `kubectl port-forward svc/web 8080:80`
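For a cloud LB, the same Service shape with `type: LoadBalancer` works (a sketch; the cloud provider provisions the external IP):

```yaml
apiVersion: v1
kind: Service
metadata: { name: web-public }   # hypothetical public-facing variant of the web Service
spec:
  type: LoadBalancer
  selector: { app: web }
  ports: [ { name: http, port: 80, targetPort: http } ]
```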
8) Security essentials (day‑1)
- Run as non‑root with a read‑only filesystem; drop capabilities (drop `ALL` by default, add back only what you need; see the sketch at the end of this section).
- Pin image tags (avoid `:latest`); use `imagePullPolicy: IfNotPresent` unless debugging.
- RBAC least privilege: bind a specific `ServiceAccount` to your Deployment, and disable automounting of service account tokens when not needed.
- NetworkPolicy: default‑deny, then explicitly allow the ingress/egress you need (both shown below).
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny }
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
```
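And a matching allow rule, e.g. admitting traffic to the web Pods from an ingress controller namespace (a sketch; the `ingress-nginx` namespace name is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-web-ingress }
spec:
  podSelector: { matchLabels: { app: web } }
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: ingress-nginx }  # assumed namespace
      ports:
        - { port: 8080, protocol: TCP }   # the container port, not the Service port
```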
- Avoid `hostPath` volumes; don’t run privileged Pods unless you know why.
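Putting the container‑level defaults together (a sketch; the `web` ServiceAccount is assumed to exist, and this pairs with the Pod‑level fields shown in §2):

```yaml
# Pod spec fragment
serviceAccountName: web                  # hypothetical dedicated ServiceAccount
automountServiceAccountToken: false      # enable only if the app calls the API server
containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]                    # add back individual capabilities only if required
```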
9) Everyday kubectl (copy‑paste)
```sh
# Contexts & namespaces
kubectl config get-contexts
kubectl config set-context --current --namespace=prod

# Explore
kubectl get pods -o wide -l app=web
kubectl describe pod <name>
kubectl get events --sort-by=.lastTimestamp

# Logs & exec
kubectl logs deploy/web -f --all-containers
kubectl exec -it deploy/web -- sh

# Debug: attach an ephemeral container to a running Pod
kubectl debug -it pod/<name> --image=busybox --target=app
kubectl port-forward svc/web 8080:80

# Explain API fields
kubectl explain deploy.spec.template.spec.containers.resources
```
Pitfalls & fast fixes
| Pitfall | Why it bites | Fix |
|---|---|---|
| No requests/limits | Noisy‑neighbor, OOMKills, CPU throttling | Set realistic requests/limits and monitor |
| Probes missing/miswired | Broken rollouts, traffic to dead pods | Wire readiness/liveness/startup endpoints |
| Using NodePort in prod | Fragile exposure | Use Ingress/Gateway or LoadBalancer |
| Baking secrets in image | Leaks, hard to rotate | Use Secrets + env/volume; rotate |
| Stateful DB on emptyDir | Data loss on reschedule | Use PVC/StatefulSet or managed DB |
| Rolling update flaps | Readiness flips too early | Wait for DB/dep; gate on real checks |
| One big namespace | RBAC and blast radius grow | Use namespaces per env/team/service |
| No PDB | Drains kill availability | Add PodDisruptionBudget |
Quick checklist
- [ ] Deployment + Service + Ingress/Gateway in a Namespace.
- [ ] Requests/Limits & Probes set; handle SIGTERM.
- [ ] Config via ConfigMap/Secret; no secrets in images.
- [ ] HPA on CPU/memory; metrics‑server installed.
- [ ] PVC/StatefulSet only when you must run state; backups planned.
- [ ] RBAC, NetworkPolicy, non‑root securityContext.
- [ ] Rollout status/undo wired into CI/CD.
One‑minute adoption plan
- Add requests/limits and probes to every workload; fix readiness first.
- Put config in ConfigMaps/Secrets; replace hardcoded values with env vars.
- Front services with a Service; expose via Ingress/Gateway.
- Enable HPA (CPU 70%) and add a small PDB.
- Lock down with non‑root, RBAC, and a default‑deny NetworkPolicy; document `kubectl` runbooks.