# How to perform a rolling update in Docker Swarm?

## Short answer

**`docker service update --image`** triggers a rolling update. Configure parallelism, delay, and failure action via `--update-*` flags or `update_config` in the stack file.

```bash
docker service update \
  --image myorg/api:1.1 \
  --update-parallelism 1 \
  --update-delay 10s \
  --update-failure-action rollback \
  --update-monitor 30s \
  api
```

**Key:** Swarm replaces tasks one (or N) at a time. Health checks gate progression; a failure can trigger automatic rollback. Set `--update-order start-first` to spin up the new task before tearing down the old one (true zero downtime if your app supports it).

## Answer

**Rolling updates in Docker Swarm** replace running tasks one batch at a time, waiting for health between batches. The built-in Swarm machinery is genuinely good at this, with auto-rollback on failure as a first-class feature.

## Theory

### TL;DR

- `docker service update` with `--image` is the trigger; Swarm replaces tasks per the `update_config` policy.
- **Key parameters:** parallelism (how many tasks at a time), delay (between batches), monitor (how long to watch each batch), failure-action (continue/pause/rollback).
- **Order:** `stop-first` (default, brief gap per task) or `start-first` (zero downtime if the app tolerates old and new running concurrently).
- **Rollback** is a single command (`docker service rollback`) or automatic on failure.
- A healthcheck on the service is what makes "failure" detectable. Without one, Swarm assumes started = healthy.
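These parameters compose into a predictable rollout duration. A rough back-of-the-envelope sketch (hypothetical numbers; assumes `stop-first` order, every batch passing its monitor window, and ignores task startup time):

```shell
#!/bin/sh
# Rough rolling-update duration estimate (hypothetical replica counts;
# ignores task startup time, assumes every batch passes its monitor window).
REPLICAS=6
PARALLELISM=2
DELAY=30     # seconds between batches
MONITOR=30   # seconds each batch is watched

# ceil(REPLICAS / PARALLELISM) batches in total
BATCHES=$(( (REPLICAS + PARALLELISM - 1) / PARALLELISM ))
# each batch is monitored; delays fall between batches, not after the last one
TOTAL=$(( BATCHES * MONITOR + (BATCHES - 1) * DELAY ))
echo "batches=$BATCHES total=${TOTAL}s"
```

With these numbers: 3 batches and roughly 150 seconds minimum, before any warmup or healthcheck time is added.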
### The update flow

```
Service: 6 replicas of api:1.0
--update-parallelism=2 --update-delay=30s

t=0:  [v1.0 v1.0 v1.0 v1.0 v1.0 v1.0]  issue update
t=0:  [STOP STOP v1.0 v1.0 v1.0 v1.0]  stop 2 (or start-first: extra v1.1 tasks spawned)
t=5:  [v1.1 v1.1 v1.0 v1.0 v1.0 v1.0]  2 new tasks healthy
t=35: [v1.1 v1.1 STOP STOP v1.0 v1.0]  30s delay elapsed, next batch
t=40: [v1.1 v1.1 v1.1 v1.1 v1.0 v1.0]
t=70: [v1.1 v1.1 v1.1 v1.1 v1.1 v1.1]  done
```

During the update, traffic continues to flow to whichever replicas are healthy.

### Imperative form (CLI)

```bash
docker service update \
  --image myorg/api:1.1 \
  --update-parallelism 1 \
  --update-delay 30s \
  --update-monitor 30s \
  --update-failure-action rollback \
  --update-max-failure-ratio 0.2 \
  --update-order start-first \
  api
```

What each flag does:

- `--update-parallelism N` — replace N tasks at a time (default 1).
- `--update-delay 30s` — wait this long between batches.
- `--update-monitor 30s` — watch each replaced task for failures for this long.
- `--update-failure-action <continue|pause|rollback>` — what to do on failure.
- `--update-max-failure-ratio 0.2` — allow at most 20% of tasks to fail before triggering the failure action.
- `--update-order <stop-first|start-first>` — replace by stopping the old task first, or by starting the new one first.

### Declarative form (stack file)

```yaml
version: '3.9'
services:
  api:
    image: myorg/api:1.0
    deploy:
      replicas: 6
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first
        failure_action: rollback
        monitor: 30s
        max_failure_ratio: 0.2
      rollback_config:
        parallelism: 2
        delay: 5s
        failure_action: pause
```

```bash
docker stack deploy -c stack.yaml mystack
# Edit the image to 1.1 and redeploy → triggers a rolling update with the config above.
```

The stack file is the canonical place for this configuration — version-controlled and reviewable.

### Health-driven gating

Swarm decides "is this batch healthy?" by checking that:

1. The container started successfully (no exit during the monitor period).
2. If a `healthcheck` is defined, the container reaches the `healthy` state.
3. No more than `max_failure_ratio` of the updated tasks have failed.
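This gating ultimately reduces to exit codes: the healthcheck command exiting 0 means healthy, anything else counts as a failure. A minimal simulation of that contract, where the `check` function is a stand-in for the real `curl -f` probe:

```shell
#!/bin/sh
# Swarm only sees the healthcheck command's exit status:
# 0 = healthy, non-zero = unhealthy (and counts toward max_failure_ratio).
check() {
  # stand-in for: curl -f http://localhost:3000/health
  [ "$1" = "ok" ]
}

check ok     && echo "exit 0 -> task becomes healthy, batch can proceed"
check broken || echo "non-zero -> task unhealthy, may trigger failure_action"
```

Whatever probe you use (curl, a CLI ping, a custom script), only its exit status reaches Swarm, so make sure it fails loudly when the app is not actually ready.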
Without a healthcheck, Swarm only knows that "the process started". An app that starts but immediately misbehaves still counts as "healthy" to Swarm. Healthchecks are essential for safe rolling updates.

### Rollback

```bash
# Manual rollback at any time
docker service rollback api
# Reverts the service to its previous spec (including the image tag)
```

Or, with `failure_action: rollback`, Swarm rolls back automatically when the failure ratio is exceeded. Combined with `monitor`, you get "if 1 of 5 tasks in the new batch is unhealthy after 30 seconds, roll back the whole service" semantics.

### `start-first` vs `stop-first`

```yaml
order: stop-first   # default — slight gap per task
order: start-first  # spin up the new task alongside the old, then drain the old
```

`start-first` is the path to true zero downtime, but it requires the app to tolerate a brief overlap (two versions running at once). For stateless web/API services this is usually fine; workers with strict singleton semantics may need code changes.

### Common mistakes

**Updating without a healthcheck**

```yaml
services:
  api:
    image: myorg/api
    # NO healthcheck → Swarm cannot detect bad versions
```

Without a healthcheck, a broken new image rolls out to all replicas before the failure becomes visible. Add a `healthcheck:` so that Swarm gates progression on actual app readiness.

**Setting parallelism too high**

```yaml
update_config:
  parallelism: 5  # 5 of the 6 replicas at once
```

During the brief replacement window you have very few healthy tasks; a spike in load means a pile-up. Lower parallelism is safer.

**Forgetting `rollback_config`**

Rollback uses its own configuration block. If you only set `update_config`, rollback runs with the defaults (often slower than you want). Define `rollback_config` explicitly.

**Image tag still `latest` when rolling back**

```bash
docker service rollback api
# No previous image to roll back to: same tag
```

If both the new and the old deployment were tagged `latest`, Swarm cannot distinguish them. Always tag with a version (or commit SHA) so rollback works.
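One way to guarantee distinguishable tags is to derive them from the commit SHA. A dry-run sketch of that deploy step (commands are echoed rather than executed; `myorg/api` and the SHA value are placeholders):

```shell
#!/bin/sh
# Tag every build with an immutable identifier so `docker service rollback`
# always has a distinct previous image to return to.
# Dry run: commands are echoed; drop the `echo` to run them for real.
SHA=3f9c2ab                     # stand-in for: git rev-parse --short HEAD
IMAGE="myorg/api:$SHA"
echo docker build -t "$IMAGE" .
echo docker push "$IMAGE"
echo docker service update --image "$IMAGE" api
```

With immutable tags, the "same tag" rollback failure above cannot occur, and `docker service ps` output tells you exactly which commit each task is running.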
### Real-world usage

- **Production deploys on Swarm clusters** — every new image triggers a `service update`; Swarm handles the parallelism and monitoring.
- **Staged canary** — first deploy 1 of 10 replicas with `parallelism: 1` and a long monitor; if it stabilizes, raise parallelism for the rest.
- **Hotfix rollouts** — `service update --image hotfix:1.0` with high parallelism (faster) and aggressive monitoring (to catch failures fast).
- **Database migrations** — never via a rolling update directly. Run a one-off migrator service first, then update the app replicas.

### Follow-up questions

**Q:** What happens to in-flight requests during a task replacement?
**A:** Tasks scheduled for replacement get SIGTERM and the configured grace period (`stop_grace_period`). Apps should drain in-flight requests before exiting. Combined with the routing mesh, traffic is steered away from stopping tasks before the SIGTERM arrives.

**Q:** Can I update multiple services together?
**A:** Edit the stack file with the new image for each, then `docker stack deploy -c stack.yaml mystack`. Each service updates independently per its own config; you do not get cross-service ordering.

**Q:** How is a Swarm rolling update different from a K8s rolling update?
**A:** Conceptually identical. A K8s Deployment's `maxSurge` and `maxUnavailable` ≈ Swarm's `parallelism` and `order`; K8s readiness probes ≈ Swarm healthchecks. Same model, different syntax.

**Q:** What is the difference between `update_config` and `rollback_config`?
**A:** `update_config` controls forward updates (1.0 → 1.1); `rollback_config` controls reverse updates (1.1 → 1.0). Often you want a slower, safer rollback than the forward update.

**Q (senior):** How would you design rolling-update parameters for a service that takes 90 seconds to warm up?
**A:** Set the healthcheck `start_period` to 120s (give warmup time before failures count) and `update-monitor` to 180s (wait long enough for real failures to emerge).
Set `parallelism` to 1 (slow rollout: 90s warmup × replicas ≈ total update time) and `failure_action` to rollback. The pattern: monitor period > start period > observation time needed for stability. Faster rollouts hide warmup-related failures; this conservative config catches them.

## Examples

### Production-quality rollout

```yaml
version: '3.9'
services:
  api:
    image: myorg/api:1.0
    deploy:
      replicas: 6
      update_config:
        parallelism: 2
        delay: 30s
        order: start-first
        failure_action: rollback
        monitor: 60s
        max_failure_ratio: 0.2
      rollback_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: any
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s
```

Deploy with the new image:

```bash
sed -i 's/myorg\/api:1.0/myorg\/api:1.1/' stack.yaml
docker stack deploy -c stack.yaml mystack
docker service ps mystack_api
# Watch tasks replace 2 at a time, with a 30s gap, each batch monitored for 60s.
```

### Manual rollout with imperative flags

```bash
docker service update \
  --image myorg/api:1.1 \
  --update-parallelism 1 \
  --update-delay 60s \
  --update-monitor 120s \
  --update-failure-action rollback \
  --update-max-failure-ratio 0.0 \
  --update-order start-first \
  api
# Strict: any failure triggers rollback.
```

Useful for one-off, tightly controlled rollouts.

### Watching a rollout

```bash
watch -n 2 'docker service ps mystack_api --format "table {{.Name}}\t{{.Image}}\t{{.CurrentState}}"'
# Live view of which tasks run which version, and in which state.
```

Great for verifying that the rollout is progressing as expected.
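To reduce that live view to a one-line summary, the same `--format` output can be piped through `awk` to count running tasks per image version. A self-contained sketch using canned sample lines (in practice, replace the here-doc with the real `docker service ps api --format '{{.Image}} {{.CurrentState}}'` pipeline):

```shell
#!/bin/sh
# Count Running tasks per image version during a rollout.
# The sample lines mimic the output of:
#   docker service ps api --format '{{.Image}} {{.CurrentState}}'
cat <<'EOF' > /tmp/tasks.txt
myorg/api:1.1 Running 2 minutes ago
myorg/api:1.1 Running 2 minutes ago
myorg/api:1.0 Running 3 hours ago
myorg/api:1.0 Shutdown 2 minutes ago
EOF

# Keep only Running tasks, tally by image tag.
awk '$2 == "Running" { n[$1]++ } END { for (i in n) print i, n[i] }' /tmp/tasks.txt | sort
```

When the new version's count reaches the replica count and the old version's count hits zero, the rollout is complete.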