# How to perform a rolling update in Docker Swarm?

## Short answer

**`docker service update --image`** triggers a rolling update. Configure parallelism, delay, and failure action via `--update-*` flags or `update_config` in the stack file.

```bash
docker service update \
  --image myorg/api:1.1 \
  --update-parallelism 1 \
  --update-delay 10s \
  --update-failure-action rollback \
  --update-monitor 30s \
  api
```

**Key:** Swarm replaces tasks one (or N) at a time. Health checks gate progression; a failure can trigger automatic rollback. Set `--update-order start-first` to spin up the new task before tearing down the old one (true zero downtime if your app supports it).

## Answer

**Rolling updates in Docker Swarm** replace running tasks one batch at a time, waiting for health between batches. The built-in Swarm machinery is genuinely good at this, with auto-rollback on failure as a first-class feature.

## Theory

### TL;DR

- `docker service update` with `--image` is the trigger; Swarm replaces tasks per the `update_config` policy.
- **Key parameters:** parallelism (how many tasks at a time), delay (between batches), monitor (how long to watch each batch), failure-action (continue/pause/rollback).
- **Order:** `stop-first` (default, brief gap per task) or `start-first` (zero downtime if the app tolerates old and new running concurrently).
- **Rollback** is a single command (`docker service rollback`) or automatic on failure.
- A healthcheck on the service is what makes "failure" detectable. Without one, Swarm assumes started = healthy.
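These parameters compose into a predictable rollout duration. A rough back-of-the-envelope sketch (hypothetical numbers; assumes `stop-first` order, every batch passing its monitor window, and ignores task startup time):

```shell
#!/bin/sh
# Rough rolling-update duration estimate (hypothetical replica counts;
# ignores task startup time, assumes every batch passes its monitor window).
REPLICAS=6
PARALLELISM=2
DELAY=30     # seconds between batches
MONITOR=30   # seconds each batch is watched

# ceil(REPLICAS / PARALLELISM) batches in total
BATCHES=$(( (REPLICAS + PARALLELISM - 1) / PARALLELISM ))
# each batch is monitored; delays fall between batches, not after the last one
TOTAL=$(( BATCHES * MONITOR + (BATCHES - 1) * DELAY ))
echo "batches=$BATCHES total=${TOTAL}s"
```

With these numbers: 3 batches and roughly 150 seconds minimum, before any warmup or healthcheck time is added.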
### The update flow

```
Service: 6 replicas of api:1.0
--update-parallelism=2 --update-delay=30s

t=0:  [v1.0 v1.0 v1.0 v1.0 v1.0 v1.0]  issue update
t=0:  [STOP STOP v1.0 v1.0 v1.0 v1.0]  stop 2 (or start-first: extra v1.1 tasks spawned)
t=5:  [v1.1 v1.1 v1.0 v1.0 v1.0 v1.0]  2 new tasks healthy
t=35: [v1.1 v1.1 STOP STOP v1.0 v1.0]  30s delay elapsed, next batch
t=40: [v1.1 v1.1 v1.1 v1.1 v1.0 v1.0]
t=70: [v1.1 v1.1 v1.1 v1.1 v1.1 v1.1]  done
```

During the update, traffic continues to flow to whichever replicas are healthy.

### Imperative form (CLI)

```bash
docker service update \
  --image myorg/api:1.1 \
  --update-parallelism 1 \
  --update-delay 30s \
  --update-monitor 30s \
  --update-failure-action rollback \
  --update-max-failure-ratio 0.2 \
  --update-order start-first \
  api
```

What each flag does:

- `--update-parallelism N` — replace N tasks at a time (default 1).
- `--update-delay 30s` — wait this long between batches.
- `--update-monitor 30s` — watch each replaced task for failures for this long.
- `--update-failure-action <continue|pause|rollback>` — what to do on failure.
- `--update-max-failure-ratio 0.2` — allow at most 20% of tasks to fail before triggering the failure action.
- `--update-order <stop-first|start-first>` — replace by stopping the old task first, or by starting the new one first.

### Declarative form (stack file)

```yaml
version: '3.9'
services:
  api:
    image: myorg/api:1.0
    deploy:
      replicas: 6
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first
        failure_action: rollback
        monitor: 30s
        max_failure_ratio: 0.2
      rollback_config:
        parallelism: 2
        delay: 5s
        failure_action: pause
```

```bash
docker stack deploy -c stack.yaml mystack
# Edit the image to 1.1 and redeploy → triggers a rolling update with the config above.
```

The stack file is the canonical place for this configuration — version-controlled and reviewable.

### Health-driven gating

Swarm decides "is this batch healthy?" by checking that:

1. The container started successfully (no exit during the monitor period).
2. If a `healthcheck` is defined, the container reaches the `healthy` state.
3. No more than `max_failure_ratio` of the updated tasks have failed.
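This gating ultimately reduces to exit codes: the healthcheck command exiting 0 means healthy, anything else counts as a failure. A minimal simulation of that contract, where the `check` function is a stand-in for the real `curl -f` probe:

```shell
#!/bin/sh
# Swarm only sees the healthcheck command's exit status:
# 0 = healthy, non-zero = unhealthy (and counts toward max_failure_ratio).
check() {
  # stand-in for: curl -f http://localhost:3000/health
  [ "$1" = "ok" ]
}

check ok     && echo "exit 0 -> task becomes healthy, batch can proceed"
check broken || echo "non-zero -> task unhealthy, may trigger failure_action"
```

Whatever probe you use (curl, a CLI ping, a custom script), only its exit status reaches Swarm, so make sure it fails loudly when the app is not actually ready.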
Without a healthcheck, Swarm only knows that "the process started". An app that starts but immediately misbehaves still counts as "healthy" to Swarm. Healthchecks are essential for safe rolling updates.

### Rollback

```bash
# Manual rollback at any time
docker service rollback api
# Reverts the service to its previous spec (including the image tag)
```

Or, with `failure_action: rollback`, Swarm rolls back automatically when the failure ratio is exceeded. Combined with `monitor`, you get "if 1 of 5 tasks in the new batch is unhealthy after 30 seconds, roll back the whole service" semantics.

### `start-first` vs `stop-first`

```yaml
order: stop-first   # default — slight gap per task
order: start-first  # spin up the new task alongside the old, then drain the old
```

`start-first` is the path to true zero downtime, but it requires the app to tolerate a brief overlap (two versions running at once). For stateless web/API services this is usually fine; workers with strict singleton semantics may need code changes.

### Common mistakes

**Updating without a healthcheck**

```yaml
services:
  api:
    image: myorg/api
    # NO healthcheck → Swarm cannot detect bad versions
```

Without a healthcheck, a broken new image rolls out to all replicas before the failure becomes visible. Add a `healthcheck:` so that Swarm gates progression on actual app readiness.

**Setting parallelism too high**

```yaml
update_config:
  parallelism: 5  # 5 of the 6 replicas at once
```

During the brief replacement window you have very few healthy tasks; a spike in load means a pile-up. Lower parallelism is safer.

**Forgetting `rollback_config`**

Rollback uses its own configuration block. If you only set `update_config`, rollback runs with the defaults (often slower than you want). Define `rollback_config` explicitly.

**Image tag still `latest` when rolling back**

```bash
docker service rollback api
# No previous image to roll back to: same tag
```

If both the new and the old deployment were tagged `latest`, Swarm cannot distinguish them. Always tag with a version (or commit SHA) so rollback works.
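One way to guarantee distinguishable tags is to derive them from the commit SHA. A dry-run sketch of that deploy step (commands are echoed rather than executed; `myorg/api` and the SHA value are placeholders):

```shell
#!/bin/sh
# Tag every build with an immutable identifier so `docker service rollback`
# always has a distinct previous image to return to.
# Dry run: commands are echoed; drop the `echo` to run them for real.
SHA=3f9c2ab                     # stand-in for: git rev-parse --short HEAD
IMAGE="myorg/api:$SHA"
echo docker build -t "$IMAGE" .
echo docker push "$IMAGE"
echo docker service update --image "$IMAGE" api
```

With immutable tags, the "same tag" rollback failure above cannot occur, and `docker service ps` output tells you exactly which commit each task is running.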
### Real-world usage

- **Production deploys on Swarm clusters** — every new image triggers a `service update`; Swarm handles the parallelism and monitoring.
- **Staged canary** — first deploy 1 of 10 replicas with `parallelism: 1` and a long monitor; if it stabilizes, raise parallelism for the rest.
- **Hotfix rollouts** — `service update --image hotfix:1.0` with high parallelism (faster) and aggressive monitoring (to catch failures fast).
- **Database migrations** — never via a rolling update directly. Run a one-off migrator service first, then update the app replicas.

### Follow-up questions

**Q:** What happens to in-flight requests during a task replacement?
**A:** Tasks scheduled for replacement get SIGTERM and the configured grace period (`stop_grace_period`). Apps should drain in-flight requests before exiting. Combined with the routing mesh, traffic is steered away from stopping tasks before the SIGTERM arrives.

**Q:** Can I update multiple services together?
**A:** Edit the stack file with the new image for each, then `docker stack deploy -c stack.yaml mystack`. Each service updates independently per its own config; you do not get cross-service ordering.

**Q:** How is a Swarm rolling update different from a K8s rolling update?
**A:** Conceptually identical. A K8s Deployment's `maxSurge` and `maxUnavailable` ≈ Swarm's `parallelism` and `order`; K8s readiness probes ≈ Swarm healthchecks. Same model, different syntax.

**Q:** What is the difference between `update_config` and `rollback_config`?
**A:** `update_config` controls forward updates (1.0 → 1.1); `rollback_config` controls reverse updates (1.1 → 1.0). Often you want a slower, safer rollback than the forward update.

**Q (senior):** How would you design rolling-update parameters for a service that takes 90 seconds to warm up?
**A:** Set the healthcheck `start_period` to 120s (give warmup time before failures count) and `update-monitor` to 180s (wait long enough for real failures to emerge).
Set `parallelism` to 1 (slow rollout: 90s warmup × replicas ≈ total update time) and `failure_action` to rollback. The pattern: monitor period > start period > observation time needed for stability. Faster rollouts hide warmup-related failures; this conservative config catches them.

## Examples

### Production-quality rollout

```yaml
version: '3.9'
services:
  api:
    image: myorg/api:1.0
    deploy:
      replicas: 6
      update_config:
        parallelism: 2
        delay: 30s
        order: start-first
        failure_action: rollback
        monitor: 60s
        max_failure_ratio: 0.2
      rollback_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: any
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s
```

Deploy with the new image:

```bash
sed -i 's/myorg\/api:1.0/myorg\/api:1.1/' stack.yaml
docker stack deploy -c stack.yaml mystack
docker service ps mystack_api
# Watch tasks replace 2 at a time, with a 30s gap, each batch monitored for 60s.
```

### Manual rollout with imperative flags

```bash
docker service update \
  --image myorg/api:1.1 \
  --update-parallelism 1 \
  --update-delay 60s \
  --update-monitor 120s \
  --update-failure-action rollback \
  --update-max-failure-ratio 0.0 \
  --update-order start-first \
  api
# Strict: any failure triggers rollback.
```

Useful for one-off, tightly controlled rollouts.

### Watching a rollout

```bash
watch -n 2 'docker service ps mystack_api --format "table {{.Name}}\t{{.Image}}\t{{.CurrentState}}"'
# Live view of which tasks run which version, and in which state.
```

Great for verifying that the rollout is progressing as expected.
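To reduce that live view to a one-line summary, the same `--format` output can be piped through `awk` to count running tasks per image version. A self-contained sketch using canned sample lines (in practice, replace the here-doc with the real `docker service ps api --format '{{.Image}} {{.CurrentState}}'` pipeline):

```shell
#!/bin/sh
# Count Running tasks per image version during a rollout.
# The sample lines mimic the output of:
#   docker service ps api --format '{{.Image}} {{.CurrentState}}'
cat <<'EOF' > /tmp/tasks.txt
myorg/api:1.1 Running 2 minutes ago
myorg/api:1.1 Running 2 minutes ago
myorg/api:1.0 Running 3 hours ago
myorg/api:1.0 Shutdown 2 minutes ago
EOF

# Keep only Running tasks, tally by image tag.
awk '$2 == "Running" { n[$1]++ } END { for (i in n) print i, n[i] }' /tmp/tasks.txt | sort
```

When the new version's count reaches the replica count and the old version's count hits zero, the rollout is complete.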