Blue-Green vs Canary vs Rolling Deployments
How to ship changes without turning every release into a live-fire exercise
Goal: choose the right deployment strategy for the risk you are taking, then make rollback boring.
TL;DR
- Rolling deployment replaces instances gradually. It is the default for many stateless services and Kubernetes Deployments.
- Blue-green deployment runs two complete production stacks and flips traffic from old to new. It gives fast rollback but costs more capacity.
- Canary deployment sends a small slice of traffic to the new version, watches real production signals, then ramps up.
- Rolling is simplest. Blue-green is cleanest for cutovers. Canary is safest for risky behavior changes when you have good observability.
- All three require backward-compatible database changes, readiness checks, graceful shutdown, and clear rollback criteria.
- Feature flags complement deployment strategy. They do not replace a safe deploy or a rollback plan.
1) The quick comparison
| Strategy | Basic idea | Best for | Main tradeoff |
|---|---|---|---|
| Rolling | Replace old instances with new instances gradually | Routine stateless service releases | Old and new versions run together |
| Blue-green | Keep two full environments, switch traffic when green is ready | Clean cutovers and fast rollback | Requires duplicate capacity |
| Canary | Send a small percent to new version, then ramp | Risky user-visible or performance-sensitive changes | Needs traffic control and strong metrics |
The real question is not "which one is best?"
The real question is:
How much production traffic should see the new version before you know it is safe?
If the answer is "most traffic is fine," rolling may be enough. If the answer is "none until the whole stack is verified," blue-green fits. If the answer is "a tiny slice first," use canary.
2) Rolling deployments
A rolling deployment gradually replaces old instances with new ones.
v1 v1 v1 v1
v2 v1 v1 v1
v2 v2 v1 v1
v2 v2 v2 v1
v2 v2 v2 v2
This is the standard behavior in many orchestrators because it is simple and capacity-efficient. You do not need a second full environment. You add some new instances, wait for them to become ready, remove some old instances, and continue.
When rolling works well
Use rolling deployments when:
- the service is stateless or handles graceful shutdown correctly
- the new version is compatible with the old version
- requests can land on either version during the rollout
- database changes are backward compatible
- the blast radius of a bad deploy is acceptable
- rollback speed is good enough
Kubernetes example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: ghcr.io/acme/checkout-api:1.8.4
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
          livenessProbe:
            httpGet:
              path: /livez
              port: 8080
maxUnavailable: 0 tells the controller never to drop below the 6 desired ready replicas during the rollout. maxSurge: 2 allows up to two extra pods (8 in total) during the update, so the orchestrator has room to start new pods and wait for them to pass readiness before it removes old ones.
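To see how the two settings interact, here is a minimal simulation sketch. It is not how the Kubernetes controller is implemented: the function name is hypothetical, and it assumes every new pod becomes ready immediately and that the rollout alternates between starting new pods and removing old ones.

```python
def simulate_rolling_update(replicas, max_surge, max_unavailable):
    """Return the (old, new) pod counts after each step of a simplified rollout.

    Assumption: new pods are ready as soon as they start.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Start new pods without exceeding replicas + max_surge in total.
        start = min(replicas - new, replicas + max_surge - (old + new))
        if start:
            new += start
            steps.append((old, new))
        # Remove old pods while keeping at least replicas - max_unavailable running.
        stop = min(old, (old + new) - (replicas - max_unavailable))
        if stop:
            old -= stop
            steps.append((old, new))
    return steps


for old, new in simulate_rolling_update(replicas=6, max_surge=2, max_unavailable=0):
    print(f"v1={old} v2={new} total={old + new}")
```

With the values from the manifest, the total never drops below 6 and never exceeds 8, which is the capacity cost of a surge-based rollout.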
Rolling deployment risks
| Risk | Why it happens | Fix |
|---|---|---|
| Mixed versions break each other | v1 and v2 run at the same time | Keep APIs and schemas backward compatible |
| Bad readiness checks | Traffic reaches pods before they are ready | Make readiness check real dependencies |
| Slow rollback | Must replace instances again | Keep previous artifact and automate rollback |
| In-flight requests fail | Old pods are killed too quickly | Use graceful shutdown and drain hooks |
| Background jobs double-run | Old and new workers process same queue differently | Make jobs idempotent and version-safe |
Rolling deployments are good defaults, but they are not magic. They assume mixed-version production is safe.
3) Blue-green deployments
Blue-green deployment keeps two production-like environments:
- Blue: current live version
- Green: new version being prepared
Traffic points at blue while green is deployed, warmed up, smoke-tested, and verified. When green is ready, the router, load balancer, gateway, or service selector switches traffic to green.
Users -> Load balancer -> Blue v1
                       -> Green v2 (no public traffic yet)
cutover
Users -> Load balancer -> Green v2
                       -> Blue v1 (standby rollback)
When blue-green works well
Use blue-green deployments when:
- you need a clean cutover point
- you can afford duplicate capacity, at least briefly
- you want very fast rollback by switching traffic back
- you need to validate the full new stack before users hit it
- your routing layer can switch traffic cleanly
- the database and external dependencies can support both versions
Blue-green is especially useful for platform changes: new runtime, new container base image, new major dependency, new infrastructure template, or a service migration where you want the whole stack online before cutover.
Avoid DNS-only cutovers when possible
DNS can be part of a blue-green strategy, but it is a blunt rollback tool. Caches, client resolvers, connection reuse, and TTL behavior can make traffic shift slowly or unevenly.
Prefer switching at a layer you control:
- load balancer target group
- Kubernetes Service selector
- ingress or gateway route
- service mesh route
- edge proxy rule
Use DNS cutovers for coarse traffic steering, not instant rollback promises.
Blue-green risks
| Risk | Why it happens | Fix |
|---|---|---|
| Shared database breaks rollback | Green writes data blue cannot read | Use expand/contract migrations |
| Hidden environment drift | Blue and green are not really identical | Create both from the same IaC and artifact rules |
| Cold caches | Green is technically healthy but slow | Warm caches and run smoke tests before cutover |
| Long-lived connections | Existing clients stay attached to blue | Drain connections and define cutover windows |
| Duplicate scheduled jobs | Both stacks run cron or workers | Ensure only active color runs singleton work |
Blue-green gives clean traffic control, but the data layer still has to be compatible. If green changes the shared database in a way blue cannot tolerate, rollback is not just "switch back."
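The traffic side of a cutover can be sketched as a single pointer that moves only after green passes its checks. This is a toy model, not a real load balancer API: `BlueGreenRouter` and the `smoke_test` callback are hypothetical names, and the sketch deliberately says nothing about the data layer.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Toy router: all traffic goes to exactly one color at a time."""
    active: str = "blue"
    history: list = field(default_factory=list)

    def cutover(self, target, smoke_test):
        """Switch traffic to `target` only if its smoke test passes."""
        if target not in ("blue", "green"):
            raise ValueError(f"unknown color: {target}")
        if not smoke_test(target):
            return False  # keep serving from the current color
        self.history.append(self.active)
        self.active = target
        return True

    def rollback(self):
        """Point traffic back at the previously active color."""
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.active = self.history.pop()
        return self.active
```

The rollback here is one assignment, which is why blue-green rollback is fast. It is only safe if blue can still read whatever green wrote while it was live.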
4) Canary deployments
A canary deployment releases the new version to a small slice of production traffic first.
95% -> v1
5% -> v2
watch metrics
75% -> v1
25% -> v2
watch metrics
0% -> v1
100% -> v2
The point is not the percentage itself. The point is learning from real production traffic before the change reaches everyone.
Canary slices
You can define a canary by:
- percentage of requests
- internal users only
- one region
- one availability zone
- one tenant
- one plan tier
- one route or endpoint
- one worker queue
Percentage-based canaries are common, but they are not always best. For B2B systems, tenant-based canaries can be easier to reason about because all requests for one customer consistently hit the same version.
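One common way to get consistent routing is to hash a stable key into a bucket and compare the bucket to the canary percentage. The sketch below assumes the key is a tenant ID; the function names are illustrative, and real routers or meshes usually provide their own mechanism for this.

```python
import hashlib


def canary_bucket(key):
    """Map a routing key (tenant, user, or session ID) to a stable bucket 0-99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_version(key, canary_percent):
    """Send a key to v2 when its bucket falls inside the canary slice."""
    return "v2" if canary_bucket(key) < canary_percent else "v1"
```

Because the bucket depends only on the key, a tenant that lands on v2 at 5% stays on v2 when the ramp moves to 25%, so no customer bounces between versions mid-rollout.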
A practical ramp
  0%  deploy new version, no public traffic
  1%  smoke with real traffic for 10 minutes
  5%  watch error rate, latency, saturation, business events
 25%  continue if guardrails stay healthy
 50%  continue if support/error reports are quiet
100%  complete rollout
The exact schedule should depend on traffic volume. A 1% canary on a tiny service may produce no useful signal. A 1% canary on a high-traffic checkout path may be plenty.
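A quick way to sanity-check a ramp step is to estimate how many requests the canary will actually serve. The arithmetic is simple; the traffic figures below are made-up illustrations, not measurements.

```python
def canary_request_count(requests_per_second, canary_percent, minutes):
    """Expected number of requests the canary serves during one ramp step."""
    return requests_per_second * minutes * 60 * canary_percent / 100


# Hypothetical services, both running a 1% canary for 10 minutes.
tiny = canary_request_count(2, 1, 10)      # 12 requests
busy = canary_request_count(2000, 1, 10)   # 12,000 requests
```

Twelve requests cannot distinguish a 0.1% error rate from a 1% error rate. Twelve thousand usually can. If the number is too small, widen the slice or hold the step longer.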
Canary guardrails
Do not run canaries by vibes. Decide the stop conditions before the rollout.
Useful guardrails:
- 5xx rate
- p95 and p99 latency
- timeout rate
- dependency error rate
- CPU and memory saturation
- queue lag
- payment/auth/conversion success rate
- log error volume by version
- trace span errors by version
Example:
Stop rollout if:
- v2 5xx rate is at least 2x the v1 rate for 5 consecutive minutes
- v2 p95 latency is more than 25% worse than v1
- payment authorization success drops below baseline
- any new critical alert fires for the canary version
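Stop conditions like these are easy to encode so that nobody has to argue about them mid-rollout. The sketch below is one possible encoding under stated assumptions: the stats are already aggregated over the 5-minute window, "baseline" means the v1 value, and `VersionStats` and `stop_reasons` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VersionStats:
    error_rate_5xx: float        # fraction of requests, e.g. 0.002
    p95_latency_ms: float
    payment_success_rate: float  # fraction of attempts, e.g. 0.98
    critical_alerts: int = 0


def stop_reasons(stable, canary):
    """Return the list of tripped guardrails; an empty list means keep ramping."""
    reasons = []
    if canary.error_rate_5xx > 0 and canary.error_rate_5xx >= 2 * stable.error_rate_5xx:
        reasons.append("5xx rate is at least 2x stable")
    if canary.p95_latency_ms > 1.25 * stable.p95_latency_ms:
        reasons.append("p95 latency is more than 25% worse than stable")
    if canary.payment_success_rate < stable.payment_success_rate:
        reasons.append("payment authorization success is below baseline")
    if canary.critical_alerts > 0:
        reasons.append("critical alert fired for the canary version")
    return reasons
```

A real check would likely add a minimum request count and some tolerance on the business metric, since small samples make strict comparisons noisy.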
Canary risks
| Risk | Why it happens | Fix |
|---|---|---|
| Weak signal | Too little traffic reaches the canary | Pick a slice with enough volume |
| Bad user stickiness | One user bounces between versions | Route consistently by user, tenant, or session |
| No version-tagged metrics | You cannot compare v1 and v2 | Add version labels to metrics and logs |
| Shared side effects | Canary writes affect all users | Use backward-compatible data and guarded writes |
| Automated ramp too eager | Tool promotes despite weak business signals | Add manual gates for high-risk paths |
Canary deployment is only as good as the observability around it. Without versioned metrics, canary is mostly a slower rolling deployment.
5) Which one should you choose?
| Situation | Good choice | Why |
|---|---|---|
| Routine stateless API change | Rolling | Simple, efficient, supported everywhere |
| Risky endpoint behavior change | Canary | Limits blast radius and compares real traffic |
| New runtime or base image | Blue-green or canary | Validate before broad exposure |
| Major infra change | Blue-green | Full stack can be tested before cutover |
| High-traffic consumer feature | Canary plus feature flag | Gradual exposure with kill switch |
| Small internal app | Rolling | Complexity probably is not worth it |
| Strict rollback speed requirement | Blue-green | Traffic switch can be near-instant |
| Cost-sensitive service | Rolling | No duplicate full environment |
| Stateful service | Depends | Deployment strategy must match data model |
If you are unsure, start with rolling deployments plus good readiness checks. Add canary when the business risk or traffic volume justifies it. Use blue-green when a clean cutover and fast traffic rollback matter enough to pay for duplicate capacity.
6) Prerequisites every strategy needs
Safe deployments are less about the label and more about the basics.
Health checks
Use separate checks:
- liveness: should the process be restarted?
- readiness: should this instance receive traffic?
- startup: does this service need extra boot time?
Readiness should verify what is required to serve traffic. It should not be a fake endpoint that returns 200 OK while the app cannot reach its database, config, or critical dependency.
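A readiness handler that checks real dependencies can be as small as the sketch below. The `readiness` function and the shape of its `checks` argument are assumptions for illustration; each check would wrap something like a database ping or a config lookup in a real service.

```python
def readiness(checks):
    """Run each dependency check and return (http_status, per-dependency report).

    `checks` maps a dependency name to a zero-argument callable returning True/False.
    """
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failing"
        except Exception as exc:  # a crashing check counts as not ready
            report[name] = f"error: {exc}"
    status = 200 if all(state == "ok" for state in report.values()) else 503
    return status, report
```

Returning 503 with a per-dependency report keeps the instance out of rotation and tells whoever is debugging the rollout which dependency is the problem.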
Graceful shutdown
Before terminating an instance:
1. Mark the instance unready so the load balancer stops routing new requests to it.
2. Stop accepting new work.
3. Drain in-flight requests.
4. Finish or checkpoint background work.
5. Exit before the grace period ends.
Without graceful shutdown, even a "zero downtime" rollout can drop requests.
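The core of graceful shutdown is a small state machine: refuse new work, then wait for in-flight work to finish within a deadline. The sketch below shows that shape with the standard library only. `GracefulServer` is a hypothetical class, not part of any framework, and a real server would wire these hooks into its request loop.

```python
import signal
import threading


class GracefulServer:
    def __init__(self):
        self.accepting = True
        self.in_flight = 0
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)

    def handle_sigterm(self, signum=None, frame=None):
        # Plain attribute write: safe to do from a signal handler without locking.
        self.accepting = False

    def begin_request(self):
        """Return False once shutdown has started, so the caller can refuse the request."""
        with self._lock:
            if not self.accepting:
                return False
            self.in_flight += 1
            return True

    def end_request(self):
        with self._idle:
            self.in_flight -= 1
            if self.in_flight == 0:
                self._idle.notify_all()

    def drain(self, timeout):
        """Block until no requests are in flight; False means the deadline passed."""
        with self._idle:
            return self._idle.wait_for(lambda: self.in_flight == 0, timeout=timeout)


def install_sigterm_handler(server):
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, server.handle_sigterm)
```

The drain timeout should be shorter than the orchestrator's grace period (30 seconds in the manifest above), so the process exits on its own terms instead of being killed.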
Immutable artifacts
Build once, deploy the same artifact everywhere.
Bad:
build for staging
rebuild for prod
hope nothing changed
Better:
build image ghcr.io/acme/api:1.8.4
deploy same image to staging
promote same image to prod
Versioned observability
Tag logs, metrics, and traces with:
- service name
- version
- environment
- deployment ID
- region or cluster
During a rollout, you should be able to compare old and new versions directly.
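For logs, one low-effort approach is a formatter that stamps every line with the deployment fields. The sketch below uses Python's standard `logging` module; the class name and the example field values are illustrative assumptions.

```python
import json
import logging


class VersionedJsonFormatter(logging.Formatter):
    """Emit each log record as JSON with deployment fields attached."""

    def __init__(self, **deploy_fields):
        super().__init__()
        self.deploy_fields = deploy_fields

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            **self.deploy_fields,
        }
        return json.dumps(payload, sort_keys=True)


handler = logging.StreamHandler()
handler.setFormatter(VersionedJsonFormatter(
    service="checkout-api",       # example values, set these at deploy time
    version="1.8.4",
    environment="prod",
    deployment_id="deploy-123",
    region="eu-west-1",
))
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

With the version on every line, "errors by version" becomes a filter in the log tool rather than a guess. The same fields belong on metrics labels and trace attributes.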
7) Database changes decide whether rollback is real
Application rollback is easy only when the old version can still work with the current data.
This breaks rollback:
Deploy v2
v2 renames column full_name -> display_name
v2 writes only display_name
Rollback to v1
v1 expects full_name
production breaks
Use expand/contract instead:
1. Add display_name while keeping full_name
2. Deploy code that reads both and writes both
3. Backfill display_name
4. Flip reads to display_name
5. Later, after confidence window, remove full_name
Rules that help all three strategies:
- new code must tolerate old data
- old code must tolerate new data during rollback
- schema changes should be backward compatible first
- destructive changes should happen after a confidence window
- background backfills should be resumable and throttled
Deployment strategy cannot save a non-reversible data change.
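The expand/contract steps can be tried end to end with an in-memory SQLite database. This is a sketch under simple assumptions: a single `users` table, hypothetical helper names, and a backfill that runs in one process. A production backfill would also throttle between batches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")  # written by v1

# Step 1 (expand): add the new column and keep the old one.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def create_user(conn, user_id, name):
    """Step 2: transitional code writes both columns."""
    conn.execute(
        "INSERT INTO users (id, full_name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def read_name(conn, user_id):
    """Step 2: transitional code reads the new column, falling back to the old one."""
    row = conn.execute(
        "SELECT COALESCE(display_name, full_name) FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


def backfill(conn, batch_size=1000):
    """Step 3: copy full_name into display_name in batches; safe to re-run."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = full_name WHERE id IN ("
            "SELECT id FROM users "
            "WHERE display_name IS NULL AND full_name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def v1_read_name(conn, user_id):
    """What the old version still does after a rollback."""
    row = conn.execute("SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None
```

Until step 5 removes `full_name`, `v1_read_name` keeps working for every row, including rows created by the new code. That is what makes the application rollback real.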
8) Feature flags are not deployment strategies
Feature flags answer a different question:
Is this behavior enabled?
Deployment strategy answers:
Which version of the code is receiving traffic?
They work best together.
Good combinations:
- rolling deploy with a feature disabled, then enable by flag
- canary deploy plus flag for one route or tenant
- blue-green cutover with risky behavior still disabled
- instant flag rollback for behavior, deploy rollback for runtime problems
Flags are excellent for product behavior, permissions, UI changes, and gradual exposure. They are weaker for bad migrations, memory leaks, startup failures, dependency incompatibilities, or broken container images. Those still need deployment rollback.
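The two questions map to two independent inputs. In the sketch below, the deploy decides `app_version` and the flag decides which behavior runs. The function, the flag name `new_checkout_tenants`, and the tenant names are all hypothetical.

```python
def render_checkout(app_version, tenant, flags):
    """Pick the checkout flow from the flag, independent of which build is serving."""
    enabled_tenants = flags.get("new_checkout_tenants", set())
    flow = "new-checkout" if tenant in enabled_tenants else "old-checkout"
    return f"{app_version}/{flow}"


flags = {"new_checkout_tenants": {"acme"}}
on = render_checkout("v2", "acme", flags)   # v2 code, new behavior
off = render_checkout("v2", "acme", {})     # flag removed: same build, old behavior
```

Turning the flag off changes behavior without touching the running build. It does nothing if the v2 build itself leaks memory or fails to start.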
9) Rollback behavior
| Strategy | Rollback action | Fast? | Watch out for |
|---|---|---:|---|
| Rolling | Roll back to previous artifact and replace instances | Medium | Takes another rollout |
| Blue-green | Switch traffic back to old color | Fast | Only safe if data stayed compatible |
| Canary | Stop ramp and send traffic back to stable version | Fast | Side effects may already exist |
Rollback should be rehearsed.
At minimum, know:
- what command switches traffic back
- who can approve it
- what metrics prove recovery
- whether database changes are reversible
- whether queues or jobs need cleanup
- how long old versions are kept available
For high-risk systems, write the rollback command in the release plan before starting the deploy.
10) Background jobs, queues, and cron
Deployments are not only HTTP traffic.
Workers and scheduled jobs often create the hardest rollout problems because they consume shared queues and mutate shared state.
Checklist:
- make jobs idempotent
- include job schema versions when payloads change
- keep workers backward compatible with old messages
- pause or drain queues before risky worker deploys
- avoid running singleton cron jobs in both blue and green
- tag job logs and metrics with app version
- keep dead-letter queues visible during rollout
If a web canary writes messages that only v2 workers understand, then the worker rollout is part of the same release. Treat it that way.
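A worker that is both idempotent and version-aware can follow the pattern below. It is a sketch with assumed names and payload shapes: messages without a `schema_version` field are treated as version 1, and the in-memory `done` dict stands in for whatever durable store a real worker would use to remember processed job IDs.

```python
SUPPORTED_SCHEMA_VERSIONS = {1, 2}


def handle_job(job, done, dead_letters):
    """Process a job at most once and park payloads this worker does not understand."""
    version = job.get("schema_version", 1)  # old messages carry no version field
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        dead_letters.append(job)  # keep it visible instead of guessing at the format
        return "dead-lettered"
    if job["id"] in done:
        return "skipped"  # redelivery or double-run: no second side effect
    payload = job["payload"]
    # Hypothetical payload change: v1 used full_name, v2 uses display_name.
    name = payload["full_name"] if version == 1 else payload["display_name"]
    done[job["id"]] = name
    return "processed"
```

During a rollout, a spike in dead-lettered jobs tagged with a new schema version is a strong hint that producers were deployed ahead of the workers that understand them.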
11) Common anti-patterns
| Anti-pattern | Why it hurts | Better |
|---|---|---|
| "We have Kubernetes, so deploys are safe" | A rolling update does not make v1 and v2 compatible | Add readiness, rollback, and schema rules |
| Health check returns static 200 | Broken pods receive traffic | Check real readiness dependencies |
| Rollback plan starts after failure | People improvise under pressure | Write rollback steps before deploy |
| Canary without versioned metrics | You cannot tell v1 from v2 | Tag telemetry by version |
| Blue-green with shared incompatible DB | Traffic rollback does not recover | Use expand/contract migrations |
| DNS as instant rollback | Clients cache and reuse connections | Switch at LB/gateway when possible |
| Feature flag used for everything | Bad builds still need deploy rollback | Use flags plus deployment safety |
| Killing pods immediately | In-flight requests fail | Drain and handle termination signals |
12) One-page rollout checklist
- [ ] Build one immutable artifact and promote it through environments.
- [ ] Confirm the new version is backward compatible with old data and old messages.
- [ ] Confirm the old version can run after rollback.
- [ ] Add or verify readiness, liveness, and startup checks.
- [ ] Set graceful shutdown and connection draining.
- [ ] Tag logs, metrics, and traces with service version and deployment ID.
- [ ] Define success metrics and stop conditions.
- [ ] Pick the rollout strategy: rolling, blue-green, or canary.
- [ ] Write the rollback command before starting.
- [ ] Keep old artifacts and old infrastructure available until the confidence window passes.
- [ ] Watch error rate, latency, saturation, and business metrics during rollout.
- [ ] Clean up old versions, flags, and temporary compatibility code after the release is proven.
The practical rule
Use the simplest strategy that gives you the safety you need:
- choose rolling for routine compatible changes
- choose blue-green when you need a clean cutover and fast traffic rollback
- choose canary when real-user risk needs gradual exposure and strong guardrails
The strategy is only the visible part. The actual safety comes from compatibility, health checks, observability, and rollback discipline.