
Kubernetes: Minimal Survival Guide — ship, debug, and sleep at night


The 80% you actually need: Pods, Deployments, Services, Ingress/Gateway, Requests/Limits, Probes, Config & Secrets, HPA, storage, and the kubectl commands you’ll use daily.

TL;DR

  • Pods are cattle, not pets. Deploy with Deployments/Jobs, front with a Service, and expose via Ingress/Gateway.
  • Always set resources (requests + limits) and probes (readiness/liveness/startup). That’s your SLO baseline.
  • Keep config out of images: ConfigMap/Secret as env vars or mounted files. Never bake secrets into images.
  • Use rolling updates + kubectl rollout to watch and undo quickly.
  • Scale with HPA (needs metrics‑server). Treat state with care: prefer managed DBs; if you must run storage, use PVC + StatefulSet.
  • Lock it down: Namespaces, RBAC, NetworkPolicies, and non‑root containers.
  • Debug with logs/exec/port‑forward/describe/events; when stuck, use ephemeral containers via kubectl debug.

1) Mental model (what talks to what)

Client → Ingress / Gateway → Service (ClusterIP) → Pods (via labels)
                   └── ConfigMap/Secret mounted into Pods
                   └── HPA scales Deployment by metrics
Persistent data → PVC → StorageClass (provisioner)

Core objects you’ll touch: Namespace, Deployment, Service, Ingress/Gateway, ConfigMap, Secret, HorizontalPodAutoscaler, Job/CronJob, StatefulSet (when needed), PersistentVolumeClaim, NetworkPolicy, PodDisruptionBudget.
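
To see that chain in a live cluster (the prod namespace is a placeholder):

kubectl -n prod get deploy,svc,ingress,hpa,pvc
kubectl -n prod get endpoints web   # which Pods the Service currently targets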


2) Minimal “hello web” (Deployment + Service)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels: { app: web }
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 } # zero‑downtime intent
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: ghcr.io/acme/web:1.2.3
          ports: [ { containerPort: 8080, name: http } ]
          resources:
            requests: { cpu: "100m", memory: "128Mi" }
            limits:   { cpu: "500m", memory: "256Mi" }
          readinessProbe:
            httpGet: { path: /healthz, port: http }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /livez, port: http }
            initialDelaySeconds: 20
            periodSeconds: 10
          envFrom:
            - configMapRef: { name: web-config }
            - secretRef: { name: web-secrets }
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector: { app: web }
  ports: [ { name: http, port: 80, targetPort: http } ]  # ClusterIP default

Ingress (classic)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: web, port: { number: 80 } } }

Newer clusters may support Gateway API as a replacement for some Ingress use cases; the Service still fronts Pods.
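
A rough Gateway API equivalent of the Ingress above, assuming the cluster already runs a Gateway controller and a Gateway named web-gw (both illustrative):

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web
spec:
  parentRefs:
    - name: web-gw               # existing Gateway; illustrative name
  hostnames: [ "example.com" ]
  rules:
    - matches:
        - path: { type: PathPrefix, value: / }
      backendRefs:
        - name: web
          port: 80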


3) Config & secrets (don’t bake them in)

apiVersion: v1
kind: ConfigMap
metadata: { name: web-config }
data:
  APP_ENV: "prod"
  FEATURE_X: "true"
---
apiVersion: v1
kind: Secret
metadata: { name: web-secrets }
type: Opaque
stringData:
  DATABASE_URL: "postgres://user:pass@db:5432/app"

Mount as env (shown above) or as files:

volumeMounts:
  - name: cfg
    mountPath: /etc/web
volumes:
  - name: cfg
    configMap: { name: web-config }
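
Secrets mount the same way as files; a sketch mirroring the snippet above:

volumeMounts:
  - name: creds
    mountPath: /etc/web/secrets
    readOnly: true
volumes:
  - name: creds
    secret: { secretName: web-secrets }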

4) Scaling (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: web }
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Requires metrics-server on the cluster. You can also scale on memory or custom/external metrics.
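
A memory target has the same shape; a sketch you could append to the metrics list above (the 80% threshold is illustrative):

    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80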


5) Storage (PVC + StatefulSet when you must)

apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data-web }
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources: { requests: { storage: 5Gi } }
  storageClassName: standard

For databases/queues, prefer managed services. If you must run stateful workloads, use StatefulSet + PVC and understand backup/restore and node‑failure behavior.
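
If you do run it yourself, a minimal sketch of that shape (image and names are illustrative; it also needs a headless Service named cache):

apiVersion: apps/v1
kind: StatefulSet
metadata: { name: cache }
spec:
  serviceName: cache             # headless Service for stable Pod DNS
  replicas: 1
  selector: { matchLabels: { app: cache } }
  template:
    metadata: { labels: { app: cache } }
    spec:
      containers:
        - name: redis
          image: redis:7         # illustrative image
          ports: [ { containerPort: 6379 } ]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:          # one PVC per replica, survives Pod reschedules
    - metadata: { name: data }
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources: { requests: { storage: 5Gi } }
        storageClassName: standard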


6) Zero‑downtime & safety nets

  • Readiness must flip true only when traffic is safe; liveness restarts stuck Pods.
  • Handle SIGTERM on shutdown and honor terminationGracePeriodSeconds (see the preStop sketch after this list).
  • Control rollout with maxSurge/maxUnavailable; watch with:
    kubectl rollout status deploy/web
    kubectl rollout undo deploy/web
    
  • Prevent voluntary eviction outages:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web-pdb }
spec:
  minAvailable: 1
  selector: { matchLabels: { app: web } }
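
For the SIGTERM point above, one common pattern is a short preStop sleep so endpoint removal propagates before the process gets SIGTERM (the 5s is a guess; it counts against terminationGracePeriodSeconds, so tune per app):

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]   # drain window before SIGTERM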

7) Networking quick facts

  • Service types: ClusterIP (internal), NodePort (debug), LoadBalancer (cloud LB; sketch below).
  • DNS: web.default.svc.cluster.local resolves to Service → kube‑proxy → Pod.
  • Port‑forward for local debug:
    kubectl port-forward svc/web 8080:80
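
If you do want a cloud LB in front without Ingress, a LoadBalancer Service is enough; a sketch (name illustrative):

apiVersion: v1
kind: Service
metadata: { name: web-public }
spec:
  type: LoadBalancer
  selector: { app: web }
  ports: [ { port: 80, targetPort: http } ]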
    

8) Security essentials (day‑1)

  • Run as non‑root, read‑only FS; drop capabilities (ALL by default, add minimal needed).
  • Pin images (avoid :latest); use imagePullPolicy: IfNotPresent unless debugging.
  • RBAC least privilege; bind a specific ServiceAccount to your Deployment (sketch after this list). Disable automount of service account tokens when not needed.
  • NetworkPolicy default‑deny + allow necessary egress/ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny }
spec:
  podSelector: {}
  policyTypes: ["Ingress","Egress"]
  • Avoid hostPath volumes; don’t run privileged Pods unless you know why.
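
A least‑privilege sketch for the RBAC point above (all names illustrative; this Role only reads ConfigMaps):

apiVersion: v1
kind: ServiceAccount
metadata: { name: web }
automountServiceAccountToken: false   # enable only if the app calls the API server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: web-read-config }
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: web-read-config }
subjects:
  - kind: ServiceAccount
    name: web
    namespace: prod                   # illustrative namespace
roleRef:
  kind: Role
  name: web-read-config
  apiGroup: rbac.authorization.k8s.io

Then set serviceAccountName: web in the Pod spec.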

9) Everyday kubectl (copy‑paste)

# Contexts & namespaces
kubectl config get-contexts
kubectl config set-context --current --namespace=prod

# Explore
kubectl get pods -owide -l app=web
kubectl describe pod <name>
kubectl get events --sort-by=.lastTimestamp

# Logs & exec
kubectl logs deploy/web -f --all-containers
kubectl exec -it deploy/web -- sh

# Debug
kubectl debug -it <name> --image=busybox --target=app       # ephemeral debug container in a Pod
kubectl port-forward svc/web 8080:80

# Explain API
kubectl explain deploy.spec.template.spec.containers.resources

Pitfalls & fast fixes

| Pitfall | Why it bites | Fix |
|---|---|---|
| No requests/limits | Noisy‑neighbor, OOMKills, CPU throttling | Set realistic requests/limits and monitor |
| Probes missing/miswired | Broken rollouts, traffic to dead pods | Wire readiness/liveness/startup endpoints |
| Using NodePort in prod | Fragile exposure | Use Ingress/Gateway or LoadBalancer |
| Baking secrets in image | Leaks, hard to rotate | Use Secrets + env/volume; rotate |
| Stateful DB on emptyDir | Data loss on reschedule | Use PVC/StatefulSet or managed DB |
| Rolling update flaps | Readiness flips too early | Wait for DB/deps; gate on real checks |
| One big namespace | RBAC and blast radius grow | Use namespaces per env/team/service |
| No PDB | Drains kill availability | Add PodDisruptionBudget |


Quick checklist

  • [ ] Deployment + Service + Ingress/Gateway in a Namespace.
  • [ ] Requests/Limits & Probes set; handle SIGTERM.
  • [ ] Config via ConfigMap/Secret; no secrets in images.
  • [ ] HPA on CPU/memory; metrics‑server installed.
  • [ ] PVC/StatefulSet only when you must run state; backups planned.
  • [ ] RBAC, NetworkPolicy, non‑root securityContext.
  • [ ] Rollout status/undo wired into CI/CD.

One‑minute adoption plan

  1. Add requests/limits and probes to every workload; fix readiness first.
  2. Put config in ConfigMaps/Secrets; swap env vars for hardcoded values.
  3. Front services with a Service; expose via Ingress/Gateway.
  4. Enable HPA (CPU 70%) and add a small PDB.
  5. Lock down with non‑root, RBAC, and a default‑deny NetworkPolicy; document kubectl runbooks.