TL;DR
- Pods are cattle, not pets. Deploy with Deployments/Jobs, front with a Service, and expose via Ingress/Gateway.
- Always set resources (`requests` + `limits`) and probes (readiness/liveness/startup). That’s your SLO baseline.
- Keep config out of images: ConfigMap/Secret → env or mounted files. Never bake secrets.
- Use rolling updates + `kubectl rollout` to watch and undo quickly.
- Scale with HPA (needs metrics‑server). Treat state with care: prefer managed DBs; if you must run storage, use PVC + StatefulSet.
- Lock it down: Namespaces, RBAC, NetworkPolicies, and non‑root containers.
- Debug with logs/exec/port‑forward/describe/events; when stuck, use ephemeral containers via `kubectl debug`.
1) Mental model (what talks to what)
```
Client → Ingress / Gateway → Service (ClusterIP) → Pods (via labels)
                                                    └── ConfigMap/Secret mounted into Pods
                                                    └── HPA scales Deployment by metrics
Persistent data → PVC → StorageClass (provisioner)
```
Core objects you’ll touch: Namespace, Deployment, Service, Ingress/Gateway,
ConfigMap, Secret, HorizontalPodAutoscaler, Job/CronJob, StatefulSet (when needed),
PersistentVolumeClaim, NetworkPolicy, PodDisruptionBudget.
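Nearly all of these appear in the examples below; the one that doesn’t is Namespace, which is trivially small (a sketch; pick your own name):

```yaml
apiVersion: v1
kind: Namespace
metadata: { name: prod }   # namespaced objects then carry "namespace: prod" in metadata
```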
2) Minimal “hello web” (Deployment + Service)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels: { app: web }
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }  # zero‑downtime intent
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: ghcr.io/acme/web:1.2.3
          ports: [ { containerPort: 8080, name: http } ]
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits: { cpu: "500m", memory: "256Mi" }
          readinessProbe:
            httpGet: { path: /healthz, port: http }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /livez, port: http }
            initialDelaySeconds: 20
            periodSeconds: 10
          envFrom:
            - configMapRef: { name: web-config }
            - secretRef: { name: web-secrets }
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports: [ { name: http, port: 80, targetPort: http } ]  # type defaults to ClusterIP
```
Ingress (classic)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  ingressClassName: nginx   # match the controller installed on your cluster
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: web, port: { number: 80 } } }
```
Newer clusters may support Gateway API as a replacement for some Ingress use cases; the Service still fronts Pods.
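For illustration, a minimal HTTPRoute equivalent of the Ingress above, assuming your platform already provides a Gateway (the name `example-gateway` is hypothetical):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata: { name: web }
spec:
  parentRefs:
    - name: example-gateway        # hypothetical Gateway managed by your platform team
  hostnames: ["example.com"]
  rules:
    - matches:
        - path: { type: PathPrefix, value: / }
      backendRefs:
        - name: web                # the same ClusterIP Service as above
          port: 80
```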
3) Config & secrets (don’t bake them in)
```yaml
apiVersion: v1
kind: ConfigMap
metadata: { name: web-config }
data:
  APP_ENV: "prod"
  FEATURE_X: "true"
---
apiVersion: v1
kind: Secret
metadata: { name: web-secrets }
type: Opaque
stringData:
  DATABASE_URL: "postgres://user:pass@db:5432/app"
```
Mount as env (shown above) or as files:
```yaml
# fragments: volumeMounts goes on the container, volumes on the Pod spec
volumeMounts:
  - name: cfg
    mountPath: /etc/web
volumes:
  - name: cfg
    configMap: { name: web-config }
```
4) Scaling (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: web }
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Requires metrics-server on the cluster. You can also scale on memory or custom/external metrics.
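A memory target has the same shape; the 80% threshold here is an assumption to tune per workload:

```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target: { type: Utilization, averageUtilization: 80 }
```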
5) Storage (PVC + StatefulSet when you must)
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data-web }
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources: { requests: { storage: 5Gi } }
  storageClassName: standard
```
For databases/queues, prefer managed services. If you must run stateful workloads, use StatefulSet + PVC and understand backup/restore and node‑failure behavior.
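A minimal sketch of that pattern, using `volumeClaimTemplates` to give each Pod its own PVC (illustrative names, not a production database setup; it also assumes a headless Service named `db`):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata: { name: db }
spec:
  serviceName: db                  # headless Service providing stable per-Pod DNS
  replicas: 1
  selector: { matchLabels: { app: db } }
  template:
    metadata: { labels: { app: db } }
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports: [ { containerPort: 5432, name: pg } ]
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # one PVC per Pod, e.g. data-db-0
    - metadata: { name: data }
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources: { requests: { storage: 5Gi } }
```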
6) Zero‑downtime & safety nets
- Readiness must flip true only when traffic is safe; liveness restarts stuck Pods.
- Handle SIGTERM on shutdown; honor `terminationGracePeriodSeconds` (see the `preStop` sketch below).
- Control rollout with `maxSurge`/`maxUnavailable`; watch with `kubectl rollout status deploy/web` and roll back with `kubectl rollout undo deploy/web`.
- Prevent voluntary eviction outages with a PodDisruptionBudget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web-pdb }
spec:
  minAvailable: 1
  selector: { matchLabels: { app: web } }
```
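For graceful shutdown, a common pattern (a sketch; the 5s is an assumption) is a short `preStop` sleep so endpoints are removed before the container receives SIGTERM:

```yaml
# container-level fragment for the Deployment in §2
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]   # let endpoint/iptables updates propagate first
```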
7) Networking quick facts
- Service types: `ClusterIP` (internal), `NodePort` (debug), `LoadBalancer` (cloud LB).
- DNS: `web.default.svc.cluster.local` resolves to the Service → kube‑proxy → Pod.
- Port‑forward for local debug: `kubectl port-forward svc/web 8080:80`
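For a cloud LB, the same Service shape with `type: LoadBalancer` works (a sketch; the cloud provider provisions the external IP):

```yaml
apiVersion: v1
kind: Service
metadata: { name: web-public }   # hypothetical public-facing variant of the web Service
spec:
  type: LoadBalancer
  selector: { app: web }
  ports: [ { name: http, port: 80, targetPort: http } ]
```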
8) Security essentials (day‑1)
- Run as non‑root with a read‑only filesystem; drop capabilities (drop `ALL` by default, add back only what you need; see the sketch at the end of this section).
- Pin image tags (avoid `:latest`); use `imagePullPolicy: IfNotPresent` unless debugging.
- RBAC least privilege: bind a specific `ServiceAccount` to your Deployment, and disable automounting of service account tokens when not needed.
- NetworkPolicy: default‑deny, then explicitly allow the ingress/egress you need (both shown below).
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny }
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
```
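And a matching allow rule, e.g. admitting traffic to the web Pods from an ingress controller namespace (a sketch; the `ingress-nginx` namespace name is an assumption):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-web-ingress }
spec:
  podSelector: { matchLabels: { app: web } }
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: ingress-nginx }  # assumed namespace
      ports:
        - { port: 8080, protocol: TCP }   # the container port, not the Service port
```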
- Avoid `hostPath` volumes; don’t run privileged Pods unless you know why.
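Putting the container‑level defaults together (a sketch; the `web` ServiceAccount is assumed to exist, and this pairs with the Pod‑level fields shown in §2):

```yaml
# Pod spec fragment
serviceAccountName: web                  # hypothetical dedicated ServiceAccount
automountServiceAccountToken: false      # enable only if the app calls the API server
containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]                    # add back individual capabilities only if required
```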
9) Everyday kubectl (copy‑paste)
```sh
# Contexts & namespaces
kubectl config get-contexts
kubectl config set-context --current --namespace=prod

# Explore
kubectl get pods -o wide -l app=web
kubectl describe pod <name>
kubectl get events --sort-by=.lastTimestamp

# Logs & exec
kubectl logs deploy/web -f --all-containers
kubectl exec -it deploy/web -- sh

# Debug: attach an ephemeral container to a running Pod
kubectl debug -it pod/<name> --image=busybox --target=app
kubectl port-forward svc/web 8080:80

# Explain API fields
kubectl explain deploy.spec.template.spec.containers.resources
```
Pitfalls & fast fixes
| Pitfall | Why it bites | Fix |
|---|---|---|
| No requests/limits | Noisy‑neighbor, OOMKills, CPU throttling | Set realistic requests/limits and monitor |
| Probes missing/miswired | Broken rollouts, traffic to dead pods | Wire readiness/liveness/startup endpoints |
| Using NodePort in prod | Fragile exposure | Use Ingress/Gateway or LoadBalancer |
| Baking secrets in image | Leaks, hard to rotate | Use Secrets + env/volume; rotate |
| Stateful DB on emptyDir | Data loss on reschedule | Use PVC/StatefulSet or managed DB |
| Rolling update flaps | Readiness flips too early | Wait for DB/dep; gate on real checks |
| One big namespace | RBAC and blast radius grow | Use namespaces per env/team/service |
| No PDB | Drains kill availability | Add PodDisruptionBudget |
Quick checklist
- [ ] Deployment + Service + Ingress/Gateway in a Namespace.
- [ ] Requests/Limits & Probes set; handle SIGTERM.
- [ ] Config via ConfigMap/Secret; no secrets in images.
- [ ] HPA on CPU/memory; metrics‑server installed.
- [ ] PVC/StatefulSet only when you must run state; backups planned.
- [ ] RBAC, NetworkPolicy, non‑root securityContext.
- [ ] Rollout status/undo wired into CI/CD.
One‑minute adoption plan
- Add requests/limits and probes to every workload; fix readiness first.
- Put config in ConfigMaps/Secrets; replace hardcoded values with env vars.
- Front services with a Service; expose via Ingress/Gateway.
- Enable HPA (CPU 70%) and add a small PDB.
- Lock down with non‑root, RBAC, and a default‑deny NetworkPolicy; document `kubectl` runbooks.