# How to implement blue-green deployment with Docker?

## Short answer

**Blue-green** runs two complete environments. Blue serves production traffic; green is the new version, idle. After health checks pass, flip the load balancer from blue → green. Old blue stays as an instant rollback target.

```bash
# Blue is running
docker run -d --name api-blue --network proxy myorg/api:1.0

# Bring up green
docker run -d --name api-green --network proxy myorg/api:1.1

# Verify green health, then update the reverse proxy to route to api-green
# Drain blue, then:
docker stop api-blue
```

**Key:** instant cutover, instant rollback (flip back to blue). Cost: 2x resources during the deploy. Best for stateless services. State (DB) needs separate handling — the schema must be backward-compatible during the swap window.

## Answer

**Blue-green deployment** is a deploy strategy where two identical environments coexist briefly during a release. The old (blue) serves traffic while the new (green) warms up; once green is healthy, the load balancer flips, and blue becomes the rollback target. With Docker, this is straightforward to implement.

## Theory

### TL;DR

- Two complete environments: **blue** (current) and **green** (new). Both fully running, only one serves traffic.
- A reverse proxy / load balancer routes traffic to whichever is "live".
- **Cutover is atomic** — flip the routing config and the entire system switches to the new version.
- **Rollback is atomic** — flip the routing back if the new version misbehaves.
- **Cost:** 2x resources during the deploy window.
- **Best for stateless apps.** State (DBs) needs special care because both versions might briefly run against the same data.

### Visual flow

```
Before deploy:

    Traffic
       |
       v
  +---------+
  |  blue   |  ← traffic
  |  v1.0   |
  +---------+

During deploy (both running):

    Traffic
       |
       v
  +---------+
  | router  |
  +---------+
       |
       v   (only blue receives until cutover)
  +-------+   +-------+
  | blue  |   | green |
  | v1.0  |   | v1.1  |   (warming up, healthcheck running)
  +-------+   +-------+

After cutover:

    Traffic
       |
       v
  +---------+
  | router  |  ──→ green
  +---------+
                  |
                  v
  +-------+   +-------+
  | blue  |   | green |  ← traffic
  | v1.0  |   | v1.1  |
  +-------+   +-------+

After drain & cleanup:

    Traffic
       |
       v
  +---------+
  |  green  |  ← traffic
  |  v1.1   |
  +---------+
  (blue removed)
```

### Implementation with Docker + reverse proxy

#### Setup

```bash
# A shared network for proxy and apps
docker network create proxy
```

#### Step 1: blue is running

```bash
docker run -d --name api-blue \
  --network proxy \
  --restart unless-stopped \
  myorg/api:1.0
```

The reverse proxy (Traefik, nginx, Caddy) routes traffic to `api-blue`:

```
# nginx upstream
upstream api_backend {
    server api-blue:3000;
}
```

#### Step 2: bring up green

```bash
docker run -d --name api-green \
  --network proxy \
  --restart unless-stopped \
  --health-cmd='curl -f http://localhost:3000/health' \
  --health-interval=5s \
  myorg/api:1.1
```

Green is up but receives no traffic.
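As an aside, the same two-container state can also be written down as a Compose file, which some teams prefer for repeatability. This is only a sketch: the service names, port, health endpoint, and the pre-created `proxy` network mirror the steps above and are otherwise assumptions.

```yaml
# Sketch: blue and green side by side; the reverse proxy still points only at api-blue
services:
  api-blue:
    image: myorg/api:1.0
    container_name: api-blue
    restart: unless-stopped
    networks: [proxy]

  api-green:
    image: myorg/api:1.1
    container_name: api-green
    restart: unless-stopped
    networks: [proxy]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]  # assumes curl exists in the image
      interval: 5s

networks:
  proxy:
    external: true   # created earlier with `docker network create proxy`
```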
Wait for the healthcheck to pass:

```bash
docker inspect api-green --format '{{.State.Health.Status}}'
# wait until: healthy
```

#### Step 3: smoke-test green out-of-band

Before flipping traffic, verify that green works directly:

```bash
docker run --rm --network proxy curlimages/curl \
  curl -f http://api-green:3000/health

docker run --rm --network proxy curlimages/curl \
  curl -f http://api-green:3000/api/v1/test
```

If green misbehaves, you have not yet impacted production. Fix or abandon green.

#### Step 4: cutover

Update the reverse proxy to route to green:

```
upstream api_backend {
    server api-green:3000;
}
```

Reload nginx (`nginx -s reload`) or trigger the update via Traefik labels. The cutover is near-instant; in-flight requests on blue finish gracefully, new requests go to green.

#### Step 5: monitor

Watch metrics, error rates, and app logs. If there is trouble:

```
upstream api_backend {
    server api-blue:3000;   # ← rollback
}
```

Reload. Traffic flows to blue again. Total rollback time: seconds.

#### Step 6: drain & cleanup

If green looks good after the monitoring window:

```bash
docker stop api-blue && docker rm api-blue
```

Rename for the next deploy:

```bash
docker rename api-green api-blue
```

(If you rename, also point the proxy back at `api-blue`; otherwise the `api-green` upstream name stops resolving on the next reload.) Or, more commonly, the next release simply becomes the new "green" and the cycle continues.

### With Traefik (auto-routing via labels)

```yaml
# compose.yaml — initial state
services:
  traefik:
    image: traefik:v3
    command:
      - --providers.docker
      - --providers.docker.exposedbydefault=false   # route only containers that opt in via labels
      - --entrypoints.web.address=:80
    ports: ["80:80"]
    volumes: ["/var/run/docker.sock:/var/run/docker.sock:ro"]

  api-blue:
    image: myorg/api:1.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
```

When deploying green:

```bash
# Bring up green WITHOUT the traefik labels (no traffic yet)
docker run -d --name api-green myorg/api:1.1
# (make sure it shares a network with Traefik so it is reachable once labeled)
# Or bring it up with labels but under a different host rule

# After green is healthy, swap labels:
#   - remove labels from blue
#   - add labels to green
# Traefik picks up the change in seconds
```

Label-driven Traefik makes the cutover declarative.

### State management — the hard part

Blue-green is easy for stateless services. State adds complexity:

**Database schema:**

- During the cutover window, both blue and green might query the DB.
- The schema must be **backward compatible** with both versions.
- Pattern: **expand-then-contract**.
  1. Deploy the expand migration (new column, new table). Old code still works.
  2. Deploy green code that uses the new schema. Both versions coexist briefly during cutover.
  3. After cutover and confirmation, deploy the contract migration (drop the old column).
- Never make a breaking schema change during a blue-green deploy.

**Sessions:**

- If sessions are in-memory, blue's sessions disappear on cutover.
- Use externalized sessions (Redis, JWTs, signed cookies). Then both versions can serve any session.

**File uploads:**

- Mount the same volume into blue and green; both can read and write.
- Or use object storage (S3) — both versions point at the same bucket.

**Caches:**

- The new version's deserializers must understand the old version's cache entries (or namespace entries by version).
- Or invalidate the cache as part of cutover (causing brief slowness, not breakage).

### Traffic shifting variants

Blue-green is binary: 100% blue or 100% green. Variants:

- **Canary:** route a small percentage (say 5%) to green; if metrics look good, raise to 50%, then 100%. More gradual; lets you catch slow regressions (a weighted nginx sketch follows below).
- **A/B testing:** route by user attribute (cookie, header) instead of percentage. Useful for feature comparison.
- **Rolling:** replace tasks one at a time (Swarm/K8s default). Less resource cost, less atomic.

Blue-green is the simplest and most atomic; canary catches more issues; rolling is cheapest. Most teams use a combination.
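For the canary variant with the same nginx setup used above, one way to split traffic is upstream weights. This is a sketch: the 90/10 split is illustrative and assumes both containers are healthy.

```
# Sketch: weighted split across blue and green (approximate canary)
upstream api_backend {
    server api-blue:3000  weight=9;   # ~90% of requests
    server api-green:3000 weight=1;   # ~10% canary
}
```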
### Common mistakes

**Forgetting the rollback test**

Weeks pass; nobody has actually verified that flipping the routing back works. Test it: deploy a no-op green, flip, flip back, confirm.

**Schema-breaking changes during cutover**

```sql
-- Migration that runs during cutover
ALTER TABLE users DROP COLUMN old_field;
```

Old blue still queries `old_field` until traffic flips. Result: errors during the brief overlap. Fix: the expand-then-contract pattern.

**State that does not survive cutover**

In-memory caches, in-memory sessions, in-flight WebSockets. Plan how each survives. Externalize sessions; drain WebSockets gracefully (connection migration is a hard problem).

**Insufficient monitoring during cutover**

If you flip and look away, you do not know whether green is healthy under real load. Watch error rates, latency, and throughput in real time during the first 5-10 minutes.

**No automated cutover**

Manual reverse-proxy edits are error-prone. Use Traefik labels, `consul-template`, or a CI script that applies the routing change atomically.

### Real-world usage

- **Stateless web/API services:** the ideal use case. Most teams that do blue-green do it here.
- **Single-host Compose deploys:** swap via Traefik labels or the nginx upstream.
- **Swarm-based deploys:** combine blue-green with Swarm services labeled by color, plus Traefik or HAProxy routing.
- **Kubernetes:** swap the Service `selector`, or use specialized tooling (Argo Rollouts, Flagger) for blue-green and canary.

### Follow-up questions

**Q:** How long should I wait between bringing up green and cutting over?

**A:** Long enough for healthcheck + smoke test + warmup. Apps with cold caches may need minutes of synthetic traffic before they perform at production levels. "Healthy" is necessary but not sufficient.

**Q:** What about long-running connections (WebSockets, gRPC streams)?

**A:** They live on blue until they reconnect. Newly opened connections go to green. Most apps drain naturally over minutes. For critical persistent connections, plan a maintenance window or implement reconnect logic.

**Q:** Is blue-green better than canary?

**A:** Different goals. Blue-green: atomic cutover, easier to reason about, instant rollback. Canary: gradual exposure, catches slow-burn regressions, more nuanced. Most mature teams do canary for big releases and blue-green for fast iterations.

**Q:** How do I do blue-green with database schema changes?

**A:** Expand-then-contract. (1) Deploy a schema migration that adds new structure without removing the old (expand). (2) Deploy app code (green) that works with both old and new. (3) After green is stable, deploy a schema migration that removes the old structure (contract). Three releases for one logical change, but each is safe.

**Q:** (Senior) How do you handle a partial blue-green where the database is split across two deploys?

**A:** Use feature flags + careful schema management. The code path that uses the new schema is gated by a flag; both blue and green have the new code, but only green has the flag enabled. Deploy the schema migration first (expand). Deploy both blue and green with the new code, flag off. Cut over to green. Enable the flag (perhaps gradually, by percentage). If there are issues, disable the flag (rollback without a re-deploy). Eventually, remove the old schema and code paths. This decouples deploy from rollout.
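To make the flag idea concrete: both colors can run the same new image while only green enables the gated path, for example via an environment variable. This is a sketch; `FEATURE_NEW_SCHEMA` and the `:1.2` tag are hypothetical, and an env var only works if the app reads it at startup (a dynamic flag service lets you toggle without restarting).

```bash
# Sketch: same new image on both colors; only green turns the gated code path on
# (FEATURE_NEW_SCHEMA is a hypothetical flag name; it must match what the app actually reads)
docker run -d --name api-blue  --network proxy -e FEATURE_NEW_SCHEMA=false myorg/api:1.2
docker run -d --name api-green --network proxy -e FEATURE_NEW_SCHEMA=true  myorg/api:1.2

# If green misbehaves, flipping routing back to blue (or turning the flag off)
# rolls the behaviour back without rebuilding or redeploying images.
```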
## Examples

### Single-host with nginx

```bash
# State: api-blue running, nginx routing to it

# Bring up green
docker run -d --name api-green \
  --network proxy \
  --restart unless-stopped \
  --health-cmd='wget -q --spider http://localhost:3000/health' \
  myorg/api:1.1

# Wait for healthy
while [[ "$(docker inspect api-green --format '{{.State.Health.Status}}')" != "healthy" ]]; do
  sleep 2
done

# Smoke test
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# Update nginx config
sed -i 's/api-blue:3000/api-green:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload

# Monitor for 10 minutes
sleep 600

# If the error rate is normal, clean up blue
docker stop api-blue && docker rm api-blue
docker rename api-green api-blue

# Point nginx back at the stable name; otherwise api-green no longer resolves after the rename
sed -i 's/api-green:3000/api-blue:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload
```

### Traefik label-swap pattern

```yaml
# compose-blue.yaml — currently active
services:
  api:
    image: myorg/api:1.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`api.example.com`)"
      - "traefik.http.services.api.loadbalancer.server.port=3000"
    container_name: api-blue
```

```bash
# Bring up green WITHOUT traefik labels
docker run -d --name api-green --network proxy myorg/api:1.1
# (no labels yet, so Traefik does not route traffic to it)

# Smoke-test green
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# Cutover: remove blue's labels, add green's labels
# (shown as Swarm service updates; see the note below)
docker service update --label-add 'traefik.enable=false' api-blue
docker service update --label-add 'traefik.enable=true' api-green
docker service update --label-add 'traefik.http.routers.api.rule=Host(`api.example.com`)' api-green
docker service update --label-add 'traefik.http.services.api.loadbalancer.server.port=3000' api-green

# Traefik picks up the change within seconds.
```

Note: labels on a plain container cannot be changed after creation (`docker container update` has no label flags). The swap above therefore uses `docker service update --label-add`, which works on Swarm services; with plain `docker run`, recreate the container with the updated labels instead. Many teams use Swarm services or K8s for this reason.

### Rollback playbook

```bash
#!/bin/bash
# rollback.sh — execute when post-cutover monitoring shows trouble
set -e

# Restore blue's labels (or revert the nginx config)
sed -i 's/api-green:3000/api-blue:3000/' /etc/nginx/conf.d/default.conf
nginx -t && nginx -s reload

# Optionally, stop green (or leave it running for forensics)
# docker stop api-green

echo "Rolled back to blue. Investigate green container: docker logs api-green"
```

Fast rollback. Prepared in advance, runs in seconds. If you cannot run this from muscle memory, you do not really have blue-green.
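### Rollback drill

A companion to the playbook, following the "forgetting the rollback test" mistake above: periodically prove that the flip works in both directions while nothing is broken. This is a sketch for the single-host nginx setup; the config path and image tag are carried over from the examples and are otherwise assumptions.

```bash
#!/bin/bash
# rollback-drill.sh: deploy a no-op green, flip, flip back, confirm
# Assumes the single-host nginx layout above (api-blue live on the proxy network).
set -euo pipefail

CONF=/etc/nginx/conf.d/default.conf   # assumed config path, as in the example above

# 1. No-op green: same image as blue, so behaviour should be identical
docker run -d --name api-green --network proxy --restart unless-stopped myorg/api:1.0

# 2. Flip to green and confirm it answers
sed -i 's/api-blue:3000/api-green:3000/' "$CONF"
nginx -t && nginx -s reload
docker run --rm --network proxy curlimages/curl curl -f http://api-green:3000/health

# 3. Flip back to blue and confirm the rollback path works too
sed -i 's/api-green:3000/api-blue:3000/' "$CONF"
nginx -t && nginx -s reload
docker run --rm --network proxy curlimages/curl curl -f http://api-blue:3000/health

# 4. Clean up the drill container
docker stop api-green && docker rm api-green
echo "Drill complete: cutover and rollback both verified"
```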