Blue-Green vs Canary vs Rolling Deployments

April 29th, 2026 · 12 min read · Updated April 29th, 2026
#dev #devops #deployments #reliability #ci-cd #ca-duh

How the three common deployment strategies differ, when to use each one, and what health checks, traffic shifting, rollback plans, and database compatibility need to be in place first.

How to ship changes without turning every release into a live-fire exercise

Goal: choose the right deployment strategy for the risk you are taking, then make rollback boring.

TL;DR

Rolling deployment replaces instances gradually. It is the default for many stateless services and Kubernetes Deployments.
Blue-green deployment runs two complete production stacks and flips traffic from old to new. It gives fast rollback but costs more capacity.
Canary deployment sends a small slice of traffic to the new version, watches real production signals, then ramps up.
Rolling is simplest. Blue-green is cleanest for cutovers. Canary is safest for risky behavior changes when you have good observability.
All three require backward-compatible database changes, readiness checks, graceful shutdown, and clear rollback criteria.
Feature flags complement deployment strategy. They do not replace a safe deploy or a rollback plan.

1) The quick comparison

| Strategy | Basic idea | Best for | Main tradeoff |
|---|---|---|---|
| Rolling | Replace old instances with new instances gradually | Routine stateless service releases | Old and new versions run together |
| Blue-green | Keep two full environments, switch traffic when green is ready | Clean cutovers and fast rollback | Requires duplicate capacity |
| Canary | Send a small percent to new version, then ramp | Risky user-visible or performance-sensitive changes | Needs traffic control and strong metrics |

The real question is not "which one is best?"

The real question is:

How much production traffic should see the new version before you know it is safe?

If the answer is "most traffic is fine," rolling may be enough. If the answer is "none until the whole stack is verified," blue-green fits. If the answer is "a tiny slice first," use canary.

2) Rolling deployments

A rolling deployment gradually replaces old instances with new ones.

v1 v1 v1 v1
v2 v1 v1 v1
v2 v2 v1 v1
v2 v2 v2 v1
v2 v2 v2 v2

This is the standard behavior in many orchestrators because it is simple and capacity-efficient. You do not need a second full environment. You add some new instances, wait for them to become ready, remove some old instances, and continue.

When rolling works well

Use rolling deployments when:

the service is stateless or handles graceful shutdown correctly
the new version is compatible with the old version
requests can land on either version during the rollout
database changes are backward compatible
the blast radius of a bad deploy is acceptable
rollback speed is good enough

Kubernetes example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: ghcr.io/acme/checkout-api:1.8.4
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
          livenessProbe:
            httpGet:
              path: /livez
              port: 8080

maxUnavailable: 0 means the rollout must never drop below the 6 desired replicas in the available state. maxSurge: 2 allows up to two extra pods during the update (8 in total), so the orchestrator has room to start new pods and wait for them to pass readiness before terminating old ones.
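The arithmetic behind those two settings can be sketched in a few lines. This is an illustration only, not Kubernetes code: `rollout_bounds` is a hypothetical helper, and the percentage handling assumes the documented rounding (maxSurge rounds up, maxUnavailable rounds down).

```python
import math


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (min_available, max_total) pod counts during a rolling update.

    Values may be absolute ints or percentage strings such as "25%".
    """

    def resolve(value, round_up):
        # Percentages are taken relative to the desired replica count.
        if isinstance(value, str) and value.endswith("%"):
            exact = replicas * int(value[:-1]) / 100
            return math.ceil(exact) if round_up else math.floor(exact)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge


# The manifest above: never fewer than 6 available, never more than 8 total.
print(rollout_bounds(6, 2, 0))
```

With maxUnavailable: 0 the lower bound equals the replica count, which is why the rollout can only make progress by surging first.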

Rolling deployment risks

| Risk | Why it happens | Fix |
|---|---|---|
| Mixed versions break each other | v1 and v2 run at the same time | Keep APIs and schemas backward compatible |
| Bad readiness checks | Traffic reaches pods before they are ready | Make readiness check real dependencies |
| Slow rollback | Must replace instances again | Keep previous artifact and automate rollback |
| In-flight requests fail | Old pods are killed too quickly | Use graceful shutdown and drain hooks |
| Background jobs double-run | Old and new workers process same queue differently | Make jobs idempotent and version-safe |

Rolling deployments are good defaults, but they are not magic. They assume mixed-version production is safe.

3) Blue-green deployments

Blue-green deployment keeps two production-like environments:

Blue: current live version
Green: new version being prepared

Traffic points at blue while green is deployed, warmed up, smoke-tested, and verified. When green is ready, the router, load balancer, gateway, or service selector switches traffic to green.

Users -> Load balancer -> Blue v1
                      -> Green v2 (no public traffic yet)

cutover

Users -> Load balancer -> Green v2
                      -> Blue v1 (standby rollback)
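As a rough model of the cutover shown above, the sketch below treats the routing layer as a single pointer to the active color. `BlueGreenRouter` and its fields are hypothetical names for illustration. A real switch would be a load balancer, gateway, or Service selector change.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Toy routing layer: all traffic goes to exactly one color."""

    active: str = "blue"
    # Whether each stack has passed warm-up and smoke tests (assumed flags).
    healthy: dict = field(default_factory=lambda: {"blue": True, "green": False})

    def standby(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def cut_over(self) -> str:
        """Flip traffic to the standby color, refusing if it is unverified."""
        target = self.standby()
        if not self.healthy[target]:
            raise RuntimeError(f"{target} is not verified; refusing cutover")
        self.active = target
        return self.active

    def route(self) -> str:
        return self.active
```

Rollback is the same operation in reverse: calling `cut_over()` again sends traffic back to the old color, provided it is still running and still compatible with the data.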

When blue-green works well

Use blue-green deployments when:

you need a clean cutover point
you can afford duplicate capacity, at least briefly
you want very fast rollback by switching traffic back
you need to validate the full new stack before users hit it
your routing layer can switch traffic cleanly
the database and external dependencies can support both versions

Blue-green is especially useful for platform changes: new runtime, new container base image, new major dependency, new infrastructure template, or a service migration where you want the whole stack online before cutover.

Avoid DNS-only cutovers when possible

DNS can be part of a blue-green strategy, but it is a blunt rollback tool. Caches, client resolvers, connection reuse, and TTL behavior can make traffic shift slowly or unevenly.

Prefer switching at a layer you control:

load balancer target group
Kubernetes Service selector
ingress or gateway route
service mesh route
edge proxy rule

Use DNS cutovers for coarse traffic steering, not instant rollback promises.

Blue-green risks

| Risk | Why it happens | Fix |
|---|---|---|
| Shared database breaks rollback | Green writes data blue cannot read | Use expand/contract migrations |
| Hidden environment drift | Blue and green are not really identical | Create both from the same IaC and artifact rules |
| Cold caches | Green is technically healthy but slow | Warm caches and run smoke tests before cutover |
| Long-lived connections | Existing clients stay attached to blue | Drain connections and define cutover windows |
| Duplicate scheduled jobs | Both stacks run cron or workers | Ensure only active color runs singleton work |

Blue-green gives clean traffic control, but the data layer still has to be compatible. If green changes the shared database in a way blue cannot tolerate, rollback is not just "switch back."

4) Canary deployments

A canary deployment releases the new version to a small slice of production traffic first.

95% -> v1
 5% -> v2

watch metrics

75% -> v1
25% -> v2

watch metrics

0% -> v1
100% -> v2

The point is not the percentage itself. The point is learning from real production traffic before the change reaches everyone.

Canary slices

You can define a canary by:

percentage of requests
internal users only
one region
one availability zone
one tenant
one plan tier
one route or endpoint
one worker queue

Percentage-based canaries are common, but they are not always best. For B2B systems, tenant-based canaries can be easier to reason about because all requests for one customer consistently hit the same version.
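One way to get that consistency is to hash a stable key into a fixed bucket and compare the bucket against the current canary percentage. This is a minimal sketch, assuming a tenant ID is available at the routing layer. The function names and the salt value are made up for illustration.

```python
import hashlib


def bucket(key: str, salt: str = "checkout-canary") -> int:
    """Map a key to a stable bucket in [0, 9999]."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def choose_version(tenant_id: str, canary_percent: float) -> str:
    """Send a tenant to v2 when its bucket falls inside the canary slice."""
    # Each bucket is 0.01% of tenants, so 5% covers buckets 0..499.
    return "v2" if bucket(tenant_id) < canary_percent * 100 else "v1"
```

Because the bucket never changes for a given tenant, a tenant that is in the canary at 5% stays in it at 25%, so nobody bounces between versions as the ramp grows.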

A practical ramp

0%    deploy new version, no public traffic
1%    smoke with real traffic for 10 minutes
5%    watch error rate, latency, saturation, business events
25%   continue if guardrails stay healthy
50%   continue if support/error reports are quiet
100%  complete rollout

The exact schedule should depend on traffic volume. A 1% canary on a tiny service may produce no useful signal. A 1% canary on a high-traffic checkout path may be plenty.
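A quick back-of-the-envelope calculation makes the difference concrete. The helpers below are hypothetical and assume steady traffic. The request rates in the note that follows are invented examples, not measurements.

```python
def canary_requests(requests_per_second: float, canary_percent: float, minutes: float) -> float:
    """Expected number of requests the canary sees during one ramp step."""
    return requests_per_second * minutes * 60 * canary_percent / 100


def minutes_for_sample(target_requests: float, requests_per_second: float, canary_percent: float) -> float:
    """How long a step must run before the canary has seen target_requests."""
    per_minute = requests_per_second * 60 * canary_percent / 100
    if per_minute <= 0:
        raise ValueError("canary receives no traffic at this rate and percentage")
    return target_requests / per_minute
```

At 2 requests per second, a 1% canary sees about 12 requests in 10 minutes, which is too few to compare error rates. At 2,000 requests per second the same step sees about 12,000.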

Canary guardrails

Do not run canaries by vibes. Decide the stop conditions before the rollout.

Useful guardrails:

5xx rate
p95 and p99 latency
timeout rate
dependency error rate
CPU and memory saturation
queue lag
payment/auth/conversion success rate
log error volume by version
trace span errors by version

Example:

Stop rollout if:
- v2 5xx rate is at least 2x the v1 rate for 5 consecutive minutes
- v2 p95 latency is more than 25% higher than v1 over the same window
- payment authorization success drops below baseline
- any new critical alert fires for the canary version
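Stop conditions like these are easier to enforce when they are written as code rather than remembered. The sketch below is one possible encoding, assuming the metrics have already been aggregated per version over the same 5-minute window. `WindowStats`, `stop_reasons`, and the `min_5xx_rate` floor are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Metrics for one version, aggregated over the same 5-minute window."""

    rate_5xx: float      # fraction of requests, e.g. 0.01 for 1%
    p95_ms: float        # p95 latency in milliseconds
    auth_success: float  # payment authorization success, as a fraction


def stop_reasons(
    v1: WindowStats,
    v2: WindowStats,
    auth_baseline: float,
    new_critical_alerts: int = 0,
    min_5xx_rate: float = 0.001,
) -> List[str]:
    """Return every guardrail the canary violates; an empty list means continue."""
    reasons = []
    # The floor keeps a near-zero v1 error rate from tripping on a single error.
    if v2.rate_5xx >= 2 * max(v1.rate_5xx, min_5xx_rate):
        reasons.append("5xx rate is at least 2x v1")
    if v2.p95_ms > 1.25 * v1.p95_ms:
        reasons.append("p95 latency is more than 25% worse than v1")
    if v2.auth_success < auth_baseline:
        reasons.append("payment authorization success is below baseline")
    if new_critical_alerts > 0:
        reasons.append("new critical alert fired for the canary")
    return reasons
```

Returning reasons instead of a bare boolean means the rollout log records why a ramp was halted.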

Canary risks

| Risk | Why it happens | Fix |
|---|---|---|
| Weak signal | Too little traffic reaches the canary | Pick a slice with enough volume |
| Bad user stickiness | One user bounces between versions | Route consistently by user, tenant, or session |
| No version-tagged metrics | You cannot compare v1 and v2 | Add version labels to metrics and logs |
| Shared side effects | Canary writes affect all users | Use backward-compatible data and guarded writes |
| Automated ramp too eager | Tool promotes despite weak business signals | Add manual gates for high-risk paths |

Canary deployment is only as good as the observability around it. Without versioned metrics, canary is mostly a slower rolling deployment.

5) Which one should you choose?

| Situation | Good choice | Why |
|---|---|---|
| Routine stateless API change | Rolling | Simple, efficient, supported everywhere |
| Risky endpoint behavior change | Canary | Limits blast radius and compares real traffic |
| New runtime or base image | Blue-green or canary | Validate before broad exposure |
| Major infra change | Blue-green | Full stack can be tested before cutover |
| High-traffic consumer feature | Canary plus feature flag | Gradual exposure with kill switch |
| Small internal app | Rolling | Complexity probably is not worth it |
| Strict rollback speed requirement | Blue-green | Traffic switch can be near-instant |
| Cost-sensitive service | Rolling | No duplicate full environment |
| Stateful service | Depends | Deployment strategy must match data model |

If you are unsure, start with rolling deployments plus good readiness checks. Add canary when the business risk or traffic volume justifies it. Use blue-green when a clean cutover and fast traffic rollback matter enough to pay for duplicate capacity.

6) Prerequisites every strategy needs

Safe deployments are less about the label and more about the basics.

Health checks

Use separate checks:

liveness: should the process be restarted?
readiness: should this instance receive traffic?
startup: does this service need extra boot time?

Readiness should verify what is required to serve traffic. It should not be a fake endpoint that returns 200 OK while the app cannot reach its database, config, or critical dependency.
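A readiness handler along these lines runs each required dependency check and reports ready only when all of them pass. This is a framework-free sketch. The check names and the `readiness` function are placeholders, and a real service would wire it to its `/readyz` route and keep each check fast.

```python
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every dependency check and return (http_status, per-check detail)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:
            # A check that raises counts as not ready, with the reason kept.
            results[name] = f"error: {exc}"
    status = 200 if all(value == "ok" for value in results.values()) else 503
    return status, results


# Example wiring with stand-in checks (real ones would ping the dependency).
status, detail = readiness({"database": lambda: True, "config": lambda: True})
print(status, detail)
```

Returning the per-check detail in the response body makes it much faster to see why a pod is being held out of rotation.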

Graceful shutdown

Before terminating an instance:

Mark the instance unready so the load balancer stops sending it new traffic.
Stop accepting new work.
Drain in-flight requests.
Finish or checkpoint background work.
Exit before the grace period ends.

Without graceful shutdown, even a "zero downtime" rollout can drop requests.
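The drain sequence can be modeled in a few lines. The `Instance` class below is a hypothetical stand-in for a server process. In a real service `shutdown` would typically run from a SIGTERM handler, with `grace_seconds` set below the platform's termination grace period.

```python
import threading
import time


class Instance:
    """Toy server instance that tracks readiness and in-flight requests."""

    def __init__(self):
        self.ready = True
        self.accepting = True
        self.in_flight = 0
        self._lock = threading.Lock()

    def start_request(self) -> bool:
        with self._lock:
            if not self.accepting:
                return False  # rejected: the instance is shutting down
            self.in_flight += 1
            return True

    def finish_request(self) -> None:
        with self._lock:
            self.in_flight -= 1

    def shutdown(self, grace_seconds: float, poll: float = 0.01) -> bool:
        """Drain and report whether all requests finished before the deadline."""
        deadline = time.monotonic() + grace_seconds
        self.ready = False          # readiness probe should now fail
        with self._lock:
            self.accepting = False  # refuse new work
        while time.monotonic() < deadline:
            with self._lock:
                if self.in_flight == 0:
                    return True     # clean drain: safe to exit
            time.sleep(poll)
        return False                # grace period exhausted with work pending
```

A `False` return is worth logging and alerting on: it means requests were still running when the process had to exit.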

Immutable artifacts

Build once, deploy the same artifact everywhere.

Bad:

build for staging
rebuild for prod
hope nothing changed

Better:

build image ghcr.io/acme/api:1.8.4
deploy same image to staging
promote same image to prod

Versioned observability

Tag logs, metrics, and traces with:

service name
version
environment
deployment ID
region or cluster

During a rollout, you should be able to compare old and new versions directly.
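For logs, one lightweight approach is to attach the deployment tags once in a formatter so every line carries them. The sketch below uses only the standard library. The environment variable names (`SERVICE_NAME`, `APP_VERSION`, and so on) are assumptions, so substitute whatever your deploy tooling actually injects.

```python
import json
import logging
import os


def deployment_tags(env=os.environ) -> dict:
    """Collect the tags listed above from (hypothetical) environment variables."""
    return {
        "service": env.get("SERVICE_NAME", "unknown"),
        "version": env.get("APP_VERSION", "unknown"),
        "environment": env.get("APP_ENV", "unknown"),
        "deployment_id": env.get("DEPLOYMENT_ID", "unknown"),
        "region": env.get("REGION", "unknown"),
    }


class TaggedJsonFormatter(logging.Formatter):
    """Emit each log record as JSON with the deployment tags attached."""

    def __init__(self, tags: dict):
        super().__init__()
        self.tags = dict(tags)

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(self.tags)
        return json.dumps(payload, sort_keys=True)


handler = logging.StreamHandler()
handler.setFormatter(TaggedJsonFormatter(deployment_tags()))
logging.getLogger("checkout-api").addHandler(handler)
```

The same tag set can be reused as metric labels and trace attributes so that all three signals can be filtered by version.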

7) Database changes decide whether rollback is real

Application rollback is easy only when the old version can still work with the current data.

This breaks rollback:

Deploy v2
v2 renames column full_name -> display_name
v2 writes only display_name
Rollback to v1
v1 expects full_name
production breaks

Use expand/contract instead:

1. Add display_name while keeping full_name
2. Deploy code that reads both and writes both
3. Backfill display_name
4. Flip reads to display_name
5. Later, after the confidence window, stop writing full_name and then remove it
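The expand, dual-write, and backfill steps can be tried end to end with SQLite from the standard library. This is a sketch under assumptions: the `users` table layout and the helper names are invented for the example, and a production backfill would also throttle between batches.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Add the new column without touching the old one."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def write_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Transitional code writes both columns so old readers keep working."""
    conn.execute(
        "INSERT INTO users (id, full_name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def read_name(conn: sqlite3.Connection, user_id: int):
    """Transitional code prefers the new column and falls back to the old one."""
    row = conn.execute(
        "SELECT COALESCE(display_name, full_name) FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


def backfill(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Copy full_name into display_name in small, resumable batches."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = full_name WHERE id IN ("
            "SELECT id FROM users WHERE display_name IS NULL "
            "AND full_name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total  # nothing left to copy
        total += cur.rowcount
```

Because the old column stays populated throughout, a rollback to v1 at any point before the final removal still finds the data it expects.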

Rules that help all three strategies:

new code must tolerate old data
old code must tolerate new data during rollback
schema changes should be backward compatible first
destructive changes should happen after a confidence window
background backfills should be resumable and throttled

Deployment strategy cannot save a non-reversible data change.

8) Feature flags are not deployment strategies

Feature flags answer a different question:

Is this behavior enabled?

Deployment strategy answers:

Which version of the code is receiving traffic?

They work best together.

Good combinations:

rolling deploy with a feature disabled, then enable by flag
canary deploy plus flag for one route or tenant
blue-green cutover with risky behavior still disabled
instant flag rollback for behavior, deploy rollback for runtime problems

Flags are excellent for product behavior, permissions, UI changes, and gradual exposure. They are weaker for bad migrations, memory leaks, startup failures, dependency incompatibilities, or broken container images. Those still need deployment rollback.
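The split between the two questions shows up clearly in code: the flag check lives inside a version that is already deployed. A minimal sketch, with a hypothetical `Flag` type and flow names:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    """A behavior switch that can be global or limited to some tenants."""

    enabled: bool = False
    tenants: Set[str] = field(default_factory=set)  # empty means everyone

    def is_on(self, tenant_id: str) -> bool:
        return self.enabled and (not self.tenants or tenant_id in self.tenants)


def checkout_flow(tenant_id: str, new_pricing: Flag) -> str:
    """Both code paths ship in the same build; the flag picks one at runtime."""
    return "new_pricing" if new_pricing.is_on(tenant_id) else "current_pricing"


flag = Flag(enabled=True, tenants={"acme"})
print(checkout_flow("acme", flag), checkout_flow("globex", flag))
flag.enabled = False  # behavior rollback, no redeploy needed
```

Turning the flag off restores the old behavior immediately, but it does nothing if the build itself leaks memory or fails to start, which is where deployment rollback takes over.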

9) Rollback behavior

| Strategy | Rollback action | Fast? | Watch out for |
|---|---|---:|---|
| Rolling | Roll back to previous artifact and replace instances | Medium | Takes another rollout |
| Blue-green | Switch traffic back to old color | Fast | Only safe if data stayed compatible |
| Canary | Stop ramp and send traffic back to stable version | Fast | Side effects may already exist |

Rollback should be rehearsed.

At minimum, know:

what command switches traffic back
who can approve it
what metrics prove recovery
whether database changes are reversible
whether queues or jobs need cleanup
how long old versions are kept available

For high-risk systems, write the rollback command in the release plan before starting the deploy.

10) Background jobs, queues, and cron

Deployments are not only HTTP traffic.

Workers and scheduled jobs often create the hardest rollout problems because they consume shared queues and mutate shared state.

Checklist:

make jobs idempotent
include job schema versions when payloads change
keep workers backward compatible with old messages
pause or drain queues before risky worker deploys
avoid running singleton cron jobs in both blue and green
tag job logs and metrics with app version
keep dead-letter queues visible during rollout

If a web canary writes messages that only v2 workers understand, then the worker rollout is part of the same release. Treat it that way.
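Two of the checklist items, idempotency and payload schema versions, can be sketched together. Everything here is illustrative: the message fields, `ChargeWorker`, and the in-memory set of processed IDs are assumptions, and a real worker would keep that set in a durable store.

```python
from typing import Dict, List, Set

SUPPORTED_SCHEMA_VERSIONS = {1, 2}


def amount_cents(message: Dict) -> int:
    """Read the amount from either payload shape (both shapes are made up)."""
    if message.get("schema_version", 1) == 1:
        return message["amount_cents"]
    return message["amount"]["cents"]


class ChargeWorker:
    """Worker that tolerates old and new messages and ignores redeliveries."""

    def __init__(self):
        self.done: Set[str] = set()          # in production: a durable store
        self.dead_letter: List[Dict] = []
        self.total_cents = 0

    def handle(self, message: Dict) -> str:
        if message.get("schema_version", 1) not in SUPPORTED_SCHEMA_VERSIONS:
            # Unknown payloads are parked, not dropped or guessed at.
            self.dead_letter.append(message)
            return "dead_lettered"
        if message["job_id"] in self.done:
            return "duplicate_skipped"       # redelivery has no second effect
        self.total_cents += amount_cents(message)
        self.done.add(message["job_id"])
        return "processed"
```

A worker built this way can be deployed before the web tier starts emitting the new payload, which keeps the two rollouts from depending on exact timing.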

11) Common anti-patterns

| Anti-pattern | Why it hurts | Better |
|---|---|---|
| "We have Kubernetes, so deploys are safe" | Rolling update is not compatibility | Add readiness, rollback, and schema rules |
| Health check returns static 200 | Broken pods receive traffic | Check real readiness dependencies |
| Rollback plan starts after failure | People improvise under pressure | Write rollback steps before deploy |
| Canary without versioned metrics | You cannot tell v1 from v2 | Tag telemetry by version |
| Blue-green with shared incompatible DB | Traffic rollback does not recover | Use expand/contract migrations |
| DNS as instant rollback | Clients cache and reuse connections | Switch at LB/gateway when possible |
| Feature flag used for everything | Bad builds still need deploy rollback | Use flags plus deployment safety |
| Killing pods immediately | In-flight requests fail | Drain and handle termination signals |

12) One-page rollout checklist

[ ] Build one immutable artifact and promote it through environments.
[ ] Confirm the new version is backward compatible with old data and old messages.
[ ] Confirm the old version can run after rollback.
[ ] Add or verify readiness, liveness, and startup checks.
[ ] Set graceful shutdown and connection draining.
[ ] Tag logs, metrics, and traces with service version and deployment ID.
[ ] Define success metrics and stop conditions.
[ ] Pick the rollout strategy: rolling, blue-green, or canary.
[ ] Write the rollback command before starting.
[ ] Keep old artifacts and old infrastructure available until the confidence window passes.
[ ] Watch error rate, latency, saturation, and business metrics during rollout.
[ ] Clean up old versions, flags, and temporary compatibility code after the release is proven.

The practical rule

Use the simplest strategy that gives you the safety you need:

choose rolling for routine compatible changes
choose blue-green when you need a clean cutover and fast traffic rollback
choose canary when real-user risk needs gradual exposure and strong guardrails

The strategy is only the visible part. The actual safety comes from compatibility, health checks, observability, and rollback discipline.