# How to implement blue-green deployment with Docker?

## Short answer

**Blue-green** runs two complete environments. Blue serves production traffic; green is the new version, idle. After health checks pass, flip the load balancer from blue → green. Old blue stays as an instant rollback target.

```bash
# Blue is running
docker run -d --name api-blue --network proxy myorg/api:1.0

# Bring up green
docker run -d --name api-green --network proxy myorg/api:1.1

# Verify green health, then update the reverse proxy to route to api-green
# Drain blue, then:
docker stop api-blue
```

**Key:** instant cutover, instant rollback (flip back to blue). Cost: 2x resources during the deploy. Best for stateless services. State (DB) needs separate handling — the schema must be backward-compatible during the swap window.

## Answer

**Blue-green deployment** is a deploy strategy where two identical environments coexist briefly during a release. The old (blue) serves traffic while the new (green) warms up; once green is healthy, the load balancer flips, and blue becomes the rollback target. With Docker, this is straightforward to implement.

## Theory

### TL;DR

- Two complete environments: **blue** (current) and **green** (new). Both fully running, only one serves traffic.
- A reverse proxy / load balancer routes traffic to whichever is "live".
- **Cutover is atomic** — flip the routing config and the entire system switches to the new version.
- **Rollback is atomic** — flip the routing back if the new version misbehaves.
- **Cost:** 2x resources during the deploy window.
- **Best for stateless apps.** State (DBs) needs special care because both versions might briefly run against the same data.

### Visual flow

```
Before deploy:

    Traffic
       |
       v
  +---------+
  |  blue   |  ← traffic
  |  v1.0   |
  +---------+

During deploy (both running):

    Traffic
       |
       v
  +---------+
  | router  |
  +---------+
       |
       v   (only blue receives until cutover)
  +-------+   +-------+
  | blue  |   | green |
  | v1.0  |   | v1.1  |   (warming up, healthcheck running)
  +-------+   +-------+

After cutover:

    Traffic
       |
       v
  +---------+
  | router  |  ──→ green
  +---------+
                  |
                  v
  +-------+   +-------+
  | blue  |   | green |  ← traffic
  | v1.0  |   | v1.1  |
  +-------+   +-------+

After drain & cleanup:

    Traffic
       |
       v
  +---------+
  |  green  |  ← traffic
  |  v1.1   |
  +---------+
  (blue removed)
```

### Implementation with Docker + reverse proxy

#### Setup

```bash
# A shared network for proxy and apps
docker network create proxy
```

#### Step 1: blue is running

```bash
docker run -d --name api-blue \
  --network proxy \
  --restart unless-stopped \
  myorg/api:1.0
```

The reverse proxy (Traefik, nginx, Caddy) routes traffic to `api-blue`:

```
# nginx upstream
upstream api_backend {
    server api-blue:3000;
}
```

#### Step 2: bring up green

```bash
docker run -d --name api-green \
  --network proxy \
  --restart unless-stopped \
  --health-cmd='curl -f http://localhost:3000/health' \
  --health-interval=5s \
  myorg/api:1.1
```

Green is up but receives no traffic.
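As an aside, the same two-container state can also be written down as a Compose file, which some teams prefer for repeatability. This is only a sketch: the service names, port, health endpoint, and the pre-created `proxy` network mirror the steps above and are otherwise assumptions.

```yaml
# Sketch: blue and green side by side; the reverse proxy still points only at api-blue
services:
  api-blue:
    image: myorg/api:1.0
    container_name: api-blue
    restart: unless-stopped
    networks: [proxy]

  api-green:
    image: myorg/api:1.1
    container_name: api-green
    restart: unless-stopped
    networks: [proxy]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]  # assumes curl exists in the image
      interval: 5s

networks:
  proxy:
    external: true   # created earlier with `docker network create proxy`
```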
Wait for the healthcheck to pass:

```bash
docker inspect api-green --format '{{.State.Health.Status}}'
# wait until: healthy
```

#### Step 3: smoke-test green out-of-band

Before flipping traffic, verify that green works directly:

```bash
docker run --rm --network proxy curlimages/curl \
  curl -f http://api-green:3000/health

docker run --rm --network proxy curlimages/curl \
  curl -f http://api-green:3000/api/v1/test
```

If green misbehaves, you have not yet impacted production. Fix or abandon green.

#### Step 4: cutover

Update the reverse proxy to route to green:

```
upstream api_backend {
    server api-green:3000;
}
```

Reload nginx (`nginx -s reload`) or trigger the update via Traefik labels. The cutover is near-instant; in-flight requests on blue finish gracefully, new requests go to green.

#### Step 5: monitor

Watch metrics, error rates, and app logs. If there is trouble:

```
upstream api_backend {
    server api-blue:3000;   # ← rollback
}
```

Reload. Traffic flows to blue again. Total rollback time: seconds.

#### Step 6: drain & cleanup

If green looks good after the monitoring window:

```bash
docker stop api-blue && docker rm api-blue
```

Rename for the next deploy:

```bash
docker rename api-green api-blue
```

(If you rename, also point the proxy back at `api-blue`; otherwise the `api-green` upstream name stops resolving on the next reload.) Or, more commonly, the next release simply becomes the new "green" and the cycle continues.

### With Traefik (auto-routing via labels)

```yaml
# compose.yaml — initial state
services:
  traefik:
    image: traefik:v3
    command:
      - --providers.docker
      - --providers.docker.exposedbydefault=false   # route only containers that opt in via labels
      - --entrypoints.web.address=:80
    ports: ["80:80"]
    volumes: ["/var/run/docker.sock:/var/run/docker.sock:ro"]

  api-blue:
    image: myorg/api:1.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
```

When deploying green:

```bash
# Bring up green WITHOUT the traefik labels (no traffic yet)
docker run -d --name api-green myorg/api:1.1
# (make sure it shares a network with Traefik so it is reachable once labeled)
# Or bring it up with labels but under a different host rule

# After green is healthy, swap labels:
#   - remove labels from blue
#   - add labels to green
# Traefik picks up the change in seconds
```

Label-driven Traefik makes the cutover declarative.

### State management — the hard part

Blue-green is easy for stateless services. State adds complexity:

**Database schema:**

- During the cutover window, both blue and green might query the DB.
- The schema must be **backward compatible** with both versions.
- Pattern: **expand-then-contract**.
  1. Deploy the expand migration (new column, new table). Old code still works.
  2. Deploy green code that uses the new schema. Both versions coexist briefly during cutover.
  3. After cutover and confirmation, deploy the contract migration (drop the old column).
- Never make a breaking schema change during a blue-green deploy.

**Sessions:**

- If sessions are in-memory, blue's sessions disappear on cutover.
- Use externalized sessions (Redis, JWTs, signed cookies). Then both versions can serve any session.

**File uploads:**

- Mount the same volume into blue and green; both can read and write.
- Or use object storage (S3) — both versions point at the same bucket.

**Caches:**

- The new version's deserializers must understand the old version's cache entries (or namespace entries by version).
- Or invalidate the cache as part of cutover (causing brief slowness, not breakage).

### Traffic shifting variants

Blue-green is binary: 100% blue or 100% green. Variants:

- **Canary:** route a small percentage (say 5%) to green; if metrics look good, raise to 50%, then 100%. More gradual; lets you catch slow regressions (a weighted nginx sketch follows below).
- **A/B testing:** route by user attribute (cookie, header) instead of percentage. Useful for feature comparison.
- **Rolling:** replace tasks one at a time (Swarm/K8s default). Less resource cost, less atomic.

Blue-green is the simplest and most atomic; canary catches more issues; rolling is cheapest. Most teams use a combination.
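For the canary variant with the same nginx setup used above, one way to split traffic is upstream weights. This is a sketch: the 90/10 split is illustrative and assumes both containers are healthy.

```
# Sketch: weighted split across blue and green (approximate canary)
upstream api_backend {
    server api-blue:3000  weight=9;   # ~90% of requests
    server api-green:3000 weight=1;   # ~10% canary
}
```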
### Common mistakes

**Forgetting the rollback test**

Weeks pass; nobody has actually verified that flipping the routing back works. Test it: deploy a no-op green, flip, flip back, confirm.

**Schema-breaking changes during cutover**

```sql
-- Migration that runs during cutover
ALTER TABLE users DROP COLUMN old_field;
```

Old blue still queries `old_field` until traffic flips. Result: errors during the brief overlap. Fix: the expand-then-contract pattern.

**State that does not survive cutover**

In-memory caches, in-memory sessions, in-flight WebSockets. Plan how each survives. Externalize sessions; drain WebSockets gracefully (connection migration is a hard problem).

**Insufficient monitoring during cutover**

If you flip and look away, you do not know whether green is healthy under real load. Watch error rates, latency, and throughput in real time during the first 5-10 minutes.

**No automated cutover**

Manual reverse-proxy edits are error-prone. Use Traefik labels, `consul-template`, or a CI script that applies the routing change atomically.

### Real-world usage

- **Stateless web/API services:** the ideal use case. Most teams that do blue-green do it here.
- **Single-host Compose deploys:** swap via Traefik labels or the nginx upstream.
- **Swarm-based deploys:** combine blue-green with Swarm services labeled by color, plus Traefik or HAProxy routing.
- **Kubernetes:** swap the Service `selector`, or use specialized tooling (Argo Rollouts, Flagger) for blue-green and canary.

### Follow-up questions

**Q:** How long should I wait between bringing up green and cutting over?

**A:** Long enough for healthcheck + smoke test + warmup. Apps with cold caches may need minutes of synthetic traffic before they perform at production levels. "Healthy" is necessary but not sufficient.

**Q:** What about long-running connections (WebSockets, gRPC streams)?

**A:** They live on blue until they reconnect. Newly opened connections go to green. Most apps drain naturally over minutes. For critical persistent connections, plan a maintenance window or implement reconnect logic.

**Q:** Is blue-green better than canary?

**A:** Different goals. Blue-green: atomic cutover, easier to reason about, instant rollback. Canary: gradual exposure, catches slow-burn regressions, more nuanced. Most mature teams do canary for big releases and blue-green for fast iterations.

**Q:** How do I do blue-green with database schema changes?

**A:** Expand-then-contract. (1) Deploy a schema migration that adds new structure without removing the old (expand). (2) Deploy app code (green) that works with both old and new. (3) After green is stable, deploy a schema migration that removes the old structure (contract). Three releases for one logical change, but each is safe.

**Q:** (Senior) How do you handle a partial blue-green where the database is split across two deploys?

**A:** Use feature flags + careful schema management. The code path that uses the new schema is gated by a flag; both blue and green have the new code, but only green has the flag enabled. Deploy the schema migration first (expand). Deploy both blue and green with the new code, flag off. Cut over to green. Enable the flag (perhaps gradually, by percentage). If there are issues, disable the flag (rollback without a re-deploy). Eventually, remove the old schema and code paths. This decouples deploy from rollout.
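To make the flag idea concrete: both colors can run the same new image while only green enables the gated path, for example via an environment variable. This is a sketch; `FEATURE_NEW_SCHEMA` and the `:1.2` tag are hypothetical, and an env var only works if the app reads it at startup (a dynamic flag service lets you toggle without restarting).

```bash
# Sketch: same new image on both colors; only green turns the gated code path on
# (FEATURE_NEW_SCHEMA is a hypothetical flag name; it must match what the app actually reads)
docker run -d --name api-blue  --network proxy -e FEATURE_NEW_SCHEMA=false myorg/api:1.2
docker run -d --name api-green --network proxy -e FEATURE_NEW_SCHEMA=true  myorg/api:1.2

# If green misbehaves, flipping routing back to blue (or turning the flag off)
# rolls the behaviour back without rebuilding or redeploying images.
```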
## Examples

### Single-host with nginx

```bash
# State: api-blue running, nginx routing to it

# Bring up green
docker run -d --name api-green \
  --network proxy \
  --restart unless-stopped \
  --health-cmd='wget -q --spider http://localhost:3000/health' \
  myorg/api:1.1

# Wait for healthy
while [[ "$(docker inspect api-green --format '{{.State.Health.Status}}')" != "healthy" ]]; do
  sleep 2
done

# Smoke test
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# Update nginx config
sed -i 's/api-blue:3000/api-green:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload

# Monitor for 10 minutes
sleep 600

# If the error rate is normal, clean up blue
docker stop api-blue && docker rm api-blue
docker rename api-green api-blue

# Point nginx back at the stable name; otherwise api-green no longer resolves after the rename
sed -i 's/api-green:3000/api-blue:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload
```

### Traefik label-swap pattern

```yaml
# compose-blue.yaml — currently active
services:
  api:
    image: myorg/api:1.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
    container_name: api-blue
```

```bash
# Bring up green WITHOUT traefik labels
docker run -d --name api-green --network proxy myorg/api:1.1
# (no labels yet, so Traefik does not route traffic to it)

# Smoke-test green
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# Cutover: remove blue's labels, add green's labels
# (shown as Swarm service updates; see the note below)
docker service update --label-add 'traefik.enable=false' api-blue
docker service update --label-add 'traefik.enable=true' api-green
docker service update --label-add 'traefik.http.routers.api.rule=Host(`api.example.com`)' api-green
docker service update --label-add 'traefik.http.services.api.loadbalancer.server.port=3000' api-green

# Traefik picks up the change within seconds.
```

Note: labels on a plain container cannot be changed after creation (`docker container update` has no label flags). The swap above therefore uses `docker service update --label-add`, which works on Swarm services; with plain `docker run`, recreate the container with the updated labels instead. Many teams use Swarm services or K8s for this reason.

### Rollback playbook

```bash
#!/bin/bash
# rollback.sh — execute when post-cutover monitoring shows trouble
set -e

# Restore blue's labels (or revert the nginx config)
sed -i 's/api-green:3000/api-blue:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload

# Optionally, stop green (or leave it running for forensics)
# docker stop api-green

echo "Rolled back to blue. Investigate green container: docker logs api-green"
```

Fast rollback. Prepared in advance, runs in seconds. If you cannot run this from muscle memory, you do not really have blue-green.
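### Rollback drill

A companion to the playbook, following the "forgetting the rollback test" mistake above: periodically prove that the flip works in both directions while nothing is broken. This is a sketch for the single-host nginx setup; the config path and image tag are carried over from the examples and are otherwise assumptions.

```bash
#!/bin/bash
# rollback-drill.sh: deploy a no-op green, flip, flip back, confirm
# Assumes the single-host nginx layout above (api-blue live on the proxy network).
set -euo pipefail

CONF=/etc/nginx/conf.d/default.conf   # assumed config path, as in the example above

# 1. No-op green: same image as blue, so behaviour should be identical
docker run -d --name api-green --network proxy --restart unless-stopped myorg/api:1.0

# 2. Flip to green and confirm it answers
sed -i 's/api-blue:3000/api-green:3000/' "$CONF"
nginx -t && nginx -s reload
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# 3. Flip back to blue and confirm the rollback path works too
sed -i 's/api-green:3000/api-blue:3000/' "$CONF"
nginx -t && nginx -s reload
docker run --rm --network proxy curlimages/curl curl -f http://api-blue:3000/health

# 4. Clean up the drill container
docker stop api-green && docker rm api-green
echo "Drill complete: cutover and rollback both verified"
```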