
How to implement blue-green deployment with Docker?

Blue-green deployment is a deployment strategy in which two identical environments coexist briefly during a release. The old environment (blue) serves traffic while the new one (green) warms up; once green is healthy, the load balancer flips, and blue becomes the rollback target. With Docker, this is straightforward to implement.

Theory

TL;DR

  • Two complete environments: blue (current) and green (new). Both fully running, only one serves traffic.
  • A reverse proxy / load balancer routes traffic to whichever is "live".
  • Cutover is atomic — flip the routing config and the entire system switches to the new version.
  • Rollback is atomic — flip the routing back if the new version misbehaves.
  • Cost: 2x resources during the deploy window.
  • Best for stateless apps. State (DBs) needs special care because both versions might briefly run against the same data.

Visual flow

Before deploy:

      Traffic
         |
         v
    +---------+
    | router  |
    +---------+
         |
         v
    +-------+
    | blue  |  ← traffic
    | v1.0  |
    +-------+

During deploy (both running):

      Traffic
         |
         v
    +---------+
    | router  |   (only blue receives until cutover)
    +---------+
         |
         v
    +-------+      +-------+
    | blue  |      | green |
    | v1.0  |      | v1.1  |
    +-------+      +-------+
    ← traffic      (warming up, healthcheck running)

After cutover:

      Traffic
         |
         v
    +---------+
    | router  |  ──→ green
    +---------+
         |
         v
    +-------+      +-------+
    | blue  |      | green |
    | v1.0  |      | v1.1  |
    +-------+      +-------+
                   ← traffic

After drain & cleanup:

      Traffic
         |
         v
    +---------+
    | router  |
    +---------+
         |
         v
    +-------+
    | green |  ← traffic
    | v1.1  |
    +-------+
    (blue removed)

Implementation with Docker + reverse proxy

Setup

bash
# A shared network for proxy and apps
docker network create proxy

Step 1: blue is running

bash
docker run -d --name api-blue \
  --network proxy \
  --restart unless-stopped \
  myorg/api:1.0

Reverse proxy (Traefik, nginx, Caddy) routes traffic to api-blue.

# nginx upstream
upstream api_backend {
    server api-blue:3000;
}

Step 2: bring up green

bash
docker run -d --name api-green \
  --network proxy \
  --restart unless-stopped \
  --health-cmd='curl -f http://localhost:3000/health' \
  --health-interval=5s \
  myorg/api:1.1

Green is up but no traffic. Wait for healthcheck:

bash
docker inspect api-green --format '{{.State.Health.Status}}'
# wait until: healthy

Step 3: smoke-test green out-of-band

Before flipping traffic, verify green works directly:

bash
docker run --rm --network proxy curlimages/curl \
  curl -f http://api-green:3000/health
docker run --rm --network proxy curlimages/curl \
  curl -f http://api-green:3000/api/v1/test

If green misbehaves, you have not yet impacted production. Fix or abandon green.

Step 4: cutover

Update the reverse proxy to route to green:

upstream api_backend {
    server api-green:3000;
}

Reload nginx (nginx -s reload) or trigger Traefik to update via labels. The cutover is near-instant; any in-flight requests on blue continue (graceful), new requests go to green.

Step 5: monitor

Watch metrics, error rates, app logs. If trouble:

upstream api_backend {
    server api-blue:3000;  # ← rollback
}

Reload. Traffic flows to blue again. Total rollback time: seconds.

Step 6: drain & cleanup

If green is good after monitoring window:

bash
docker stop api-blue && docker rm api-blue

Rename for next deploy:

bash
docker rename api-green api-blue

Or, more commonly, the next release becomes the new "green" and the cycle continues.
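
The color bookkeeping is easy to script. A minimal sketch of the toggle (purely illustrative; the `api-blue`/`api-green` names match the examples above):

```shell
#!/bin/sh
# Sketch: compute the target color for the next deploy from whichever
# color is currently live. In practice LIVE would be detected, e.g. via
# `docker ps`; it is hardcoded here for illustration.
next_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

LIVE=blue
NEXT=$(next_color "$LIVE")
echo "live: api-$LIVE, deploy target: api-$NEXT"
```

With this, each release simply deploys to `api-$NEXT` and flips, and the colors alternate release after release.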

With Traefik (auto-routing via labels)

yaml
# compose.yaml — initial state
services:
  traefik:
    image: traefik:v3
    command:
      - --providers.docker
      - --entrypoints.web.address=:80
    ports: ["80:80"]
    volumes: ["/var/run/docker.sock:/var/run/docker.sock:ro"]
  api-blue:
    image: myorg/api:1.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.services.api.loadbalancer.server.port=3000"

When deploying green:

bash
# Bring up green WITHOUT the traefik labels (no traffic yet)
docker run -d --name api-green myorg/api:1.1
# Or with labels but on a different host rule

# After green is healthy, swap labels:
#   - remove labels from blue
#   - add labels to green
# Traefik picks up the change in seconds

Label-driven Traefik makes the cutover declarative.

State management — the hard part

Blue-green is easy for stateless services. State adds complexity:

Database schema:

  • During the cutover window, both blue and green might query the DB.
  • The schema must be backward compatible with both versions.
  • Pattern: expand-then-contract.
    1. Deploy expand migration (new column, new table). Old code still works.
    2. Deploy green code that uses the new schema. Both versions coexist briefly during cutover.
    3. After cutover and confirmation, deploy contract migration (drop old column).
  • Never make breaking schema changes during a blue-green deploy.

Sessions:

  • If sessions are in-memory, blue's sessions disappear on cutover.
  • Use externalized sessions (Redis, JWTs, signed cookies). Then both versions can serve any session.

File uploads:

  • Same volume mounted into blue and green; both can read/write.
  • Or use object storage (S3) — both versions point at the same bucket.

Caches:

  • New version's deserializers must understand old version's cache entries (or namespace by version).
  • Or invalidate the cache as part of cutover (causing brief slowness, not breakage).
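
Namespacing cache entries by version is a one-line convention. A sketch (the key layout is an assumption for illustration, not a library convention):

```shell
#!/bin/sh
# Sketch: prefix cache keys with the app version so blue (1.0) and green
# (1.1) never deserialize each other's entries.
APP_VERSION="1.1"

cache_key() {
  echo "cache:v${APP_VERSION}:$1"
}

key=$(cache_key "user:42")
echo "$key"    # cache:v1.1:user:42
```

The cost is a cold cache for green on cutover; the benefit is that a format change in cached values can never break the other color.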

Traffic shifting variants

Blue-green is binary: 100% blue or 100% green. Variants:

  • Canary: route a small % (5%) to green; if metrics good, raise to 50%, 100%. More gradual; lets you catch slow regressions.
  • A/B testing: route by user attribute (cookie, header) instead of percentage. Useful for feature comparison.
  • Rolling: replace tasks one at a time (Swarm/K8s default). Less resource cost, less atomic.

Blue-green is the simplest and most atomic; canary catches more issues; rolling is cheapest. Most teams use a combination.
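
For the canary variant, nginx can split traffic with upstream weights; a sketch using the same upstream as the examples above:

```nginx
# ~5% of requests to green, the rest stay on blue
upstream api_backend {
    server api-blue:3000  weight=95;
    server api-green:3000 weight=5;
}
```

Raising the canary percentage is then just editing the weights and reloading.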

Common mistakes

Forgetting the rollback test

Weeks pass; nobody actually verified that flipping the routing back works. Test it: deploy a no-op green, flip, flip back, confirm.

Schema-breaking changes during cutover

sql
-- Migration that runs during cutover
ALTER TABLE users DROP COLUMN old_field;

Old blue still queries old_field until traffic flips. Result: errors during the brief overlap. Fix: expand-then-contract pattern.

State that does not survive cutover

In-memory caches, in-memory sessions, in-flight WebSockets: plan how each survives the flip. Externalize sessions; drain WebSockets gracefully (live connection migration is a hard problem).

Insufficient monitoring during cutover

If you flip and look away, you do not know whether green is healthy under real load. Watch error rates, latency, throughput in real time during the first 5-10 minutes.

No automated cutover

Manual reverse-proxy edits are error-prone. Use traefik labels, consul-template, or a CI script that does the routing change atomically.
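
A minimal version of such a script, shown here against a local copy of the config so the edit-validate-swap sequence is visible (in production `CONF` would be `/etc/nginx/conf.d/default.conf` and the last step would be `nginx -t && nginx -s reload`, as in the steps above):

```shell
#!/bin/sh
# Sketch: atomic routing change — rewrite into a temp file, then swap it
# into place with mv (atomic on the same filesystem), so nginx never sees
# a half-written config.
set -e
FROM=api-blue
TO=api-green
CONF=./default.conf

# Starting state: routing to blue
printf 'upstream api_backend { server %s:3000; }\n' "$FROM" > "$CONF"

# Rewrite into a temp file, then atomically replace
sed "s/${FROM}:3000/${TO}:3000/" "$CONF" > "${CONF}.new"
mv "${CONF}.new" "$CONF"
cat "$CONF"    # upstream api_backend { server api-green:3000; }
```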

Real-world usage

  • Stateless web/API services: ideal use case. Most teams that do blue-green do it here.
  • Single-host Compose deploys: swap via Traefik labels or nginx upstream.
  • Swarm-based deploys: combine blue-green with Swarm services labeled by color, plus Traefik or HAProxy routing.
  • Kubernetes: Service selector swap; or use specialized tooling (Argo Rollouts, Flagger) for blue-green and canary.

Follow-up questions

Q: How long should I wait between bringing up green and cutting over?


A: Long enough for healthcheck + smoke-test + warmup. Apps with cold caches may need minutes of synthetic traffic before they perform at production levels. "Healthy" is necessary but not sufficient.

Q: What about long-running connections (WebSockets, gRPC streams)?


A: They live on blue until reconnect. Newly-opened connections go to green. Most apps drain naturally over minutes. For critical persistent connections, plan a maintenance window or implement reconnect logic.

Q: Is blue-green better than canary?


A: Different goals. Blue-green: atomic cutover, easier to reason about, instant rollback. Canary: gradual exposure, catches slow-burn regressions, more nuanced. Most mature teams do canary for big releases, blue-green for fast iterations.

Q: How do I do blue-green with database schema changes?


A: Expand-then-contract. (1) Deploy schema migration that adds new structure without removing old (expand). (2) Deploy app code (green) that uses both old and new. (3) After green is stable, deploy schema migration that removes old (contract). Three releases for one logical change, but each is safe.

Q: (Senior) How do you handle a partial blue-green where the database is split across two deploys?


A: Use feature flags + careful schema management. The code path that uses the new schema is gated by a flag; both blue and green have the new code, but only green has the flag enabled. Deploy schema migration first (expand). Deploy both blue and green with the new code, flag off. Cutover to green. Enable flag (perhaps gradually via percentage). If issues, disable flag (rollback without re-deploy). Eventually, remove old schema and code paths. Decouples deploy from rollout.

Examples

Single-host with nginx

bash
# State: api-blue running, nginx routing to it

# Bring up green
docker run -d --name api-green \
  --network proxy \
  --restart unless-stopped \
  --health-cmd='wget -q --spider http://localhost:3000/health' \
  myorg/api:1.1

# Wait for healthy
while [[ "$(docker inspect api-green --format '{{.State.Health.Status}}')" != "healthy" ]]; do
  sleep 2
done

# Smoke test
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# Update nginx config
sed -i 's/api-blue:3000/api-green:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload

# Monitor for 10 minutes
sleep 600

# If error rate is normal, cleanup blue
docker stop api-blue && docker rm api-blue
docker rename api-green api-blue

Traefik label-swap pattern

yaml
# compose-blue.yaml — currently active
services:
  api:
    image: myorg/api:1.0
    container_name: api-blue
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
bash
# Bring up green WITHOUT traefik labels
docker run -d --name api-green --network proxy myorg/api:1.1
# (no labels yet, so Traefik does not route traffic to it)

# Smoke-test green
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# Cutover: labels on a running container cannot be changed, so with plain
# `docker run` you recreate green with the routing labels
docker rm -f api-green
docker run -d --name api-green --network proxy \
  --label 'traefik.enable=true' \
  --label 'traefik.http.routers.api.rule=Host(`api.example.com`)' \
  --label 'traefik.http.services.api.loadbalancer.server.port=3000' \
  myorg/api:1.1

# Then stop blue so Traefik routes only to green
docker stop api-blue
# Traefik picks up the change within seconds.

Note: labels on an existing container are immutable, so with plain docker run you must recreate the container to change its routing labels. On Swarm, docker service update --label-add changes service labels live without a restart; this is one reason many teams use Swarm services or Kubernetes for this pattern.

Rollback playbook

bash
#!/bin/bash
# rollback.sh — execute when post-cutover monitoring shows trouble
set -e

# Restore blue's labels (or revert nginx config)
sed -i 's/api-green:3000/api-blue:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload

# Optionally, stop green (or leave running for forensics)
# docker stop api-green

echo "Rolled back to blue. Investigate green container: docker logs api-green"

Fast rollback. Prepared in advance, runs in seconds. If you cannot run this from muscle memory, you do not really have blue-green.
