# How to set up a health check for a Docker container?

## Short answer

**A health check** is a periodic command Docker runs inside the container; if it returns 0, the container is `healthy`. Set it in the Dockerfile (`HEALTHCHECK`), at run time (`--health-cmd`), or in Compose (`healthcheck:`).

```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
```

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 3s
  retries: 3
  start_period: 10s
```

**Key:** the state shows up in `docker ps` (`healthy`, `unhealthy`, `starting`). Compose `depends_on: condition: service_healthy` waits for it. Orchestrators use it to decide when to route traffic and when to restart.

## Answer

**Container health checks** are how Docker (and Compose, Swarm, K8s) tell the difference between "the process is up" and "the app is actually working". Without a healthcheck, the only signal is "is PID 1 alive?", which misses every interesting failure mode.

## Theory

### TL;DR

- A healthcheck is a command Docker runs inside the container periodically. Exit 0 = healthy; non-zero = unhealthy.
- Three states: **starting** (still in `start_period`), **healthy**, **unhealthy** (failed `retries` times in a row).
- Set via `HEALTHCHECK` in the Dockerfile, `--health-cmd` on `docker run`, or `healthcheck:` in Compose.
- Used by `docker ps`, by Compose `depends_on: service_healthy`, and by Swarm to decide replica replacement.
- Common command: `curl -f http://localhost:<port>/health`. The picky detail: the command must exist inside the container.

### Quick example

```dockerfile
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "server.js"]
```

```bash
$ docker run -d --name api myapp
$ docker ps
CONTAINER ID   IMAGE   STATUS                    NAMES
a3f9d2b8c1e4   myapp   Up 30 seconds (healthy)   api
# STATUS now includes (healthy) / (unhealthy) / (starting)
```

After ~30 seconds, the first check runs. If it succeeds, the status flips to `(healthy)`.

### The four flags that matter

```
--interval=DURATION      # how often to run the check (default 30s)
--timeout=DURATION       # max time the check has to return (default 30s)
--retries=N              # how many failures before unhealthy (default 3)
--start-period=DURATION  # grace period at startup; failures here do not count (default 0s)
```

For a typical web service, `--interval=30s --timeout=3s --retries=3 --start-period=10s` works well. Apps that take 30+ seconds to warm up (JVMs, big Python ML services) need a longer `--start-period` (60-120s).

### Compose syntax

```yaml
services:
  api:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s
  web:
    image: nginx
    depends_on:
      api:
        condition: service_healthy  # wait until api is healthy before starting
```

`depends_on: condition: service_healthy` is the killer feature — it waits for the dependency's healthcheck to pass before starting dependents. Far more robust than the simple list form.

### Three forms of the test command

```yaml
# Form 1: CMD (preferred — no shell)
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]

# Form 2: CMD-SHELL (with a shell; lets you use &&, ||, and env var expansion)
test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]

# Form 3: NONE (disable an inherited healthcheck from the base image)
test: ["NONE"]
```

The `CMD` form is faster (no shell process). `CMD-SHELL` is needed for env var expansion or shell logic.
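### Setting the healthcheck at run time

The same four knobs are also exposed as `docker run` flags, which is handy when you cannot rebuild the image. A minimal sketch — the image name `myapp` and the port are illustrative, and it assumes `curl` exists inside the image:

```shell
# Define (or override) the healthcheck entirely at run time.
# Run-time --health-* flags take precedence over a HEALTHCHECK baked into
# the image; --no-healthcheck disables any inherited check altogether.
docker run -d --name api \
  --health-cmd "curl -f http://localhost:8080/health || exit 1" \
  --health-interval 30s \
  --health-timeout 3s \
  --health-retries 3 \
  --health-start-period 10s \
  myapp
```

This requires a running Docker daemon, so treat it as a CLI recipe rather than something to paste blindly; the equivalent Compose keys remain the more maintainable option for anything long-lived.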
### Common failure: command not found in the container

The most common health-check bug:

```dockerfile
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
```

but the image is `alpine` without `curl` installed, so the healthcheck always fails. Solutions:

- `RUN apk add --no-cache curl` (on Alpine, `wget` is often already there: `wget -q --spider URL`)
- For Node apps: `node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode===200?0:1))"` (no extra package needed)
- Distroless images often need a binary built in: include a `/healthcheck` binary in your Dockerfile that exits 0/1

### Inspecting health

```bash
# Current status in ps
docker ps --format 'table {{.Names}}\t{{.Status}}'

# Full health details
docker inspect api --format '{{json .State.Health}}' | jq
# {
#   "Status": "healthy",
#   "FailingStreak": 0,
#   "Log": [
#     { "Start": "...", "End": "...", "ExitCode": 0, "Output": "..." },
#     ...
#   ]
# }

# Watch live
watch 'docker ps --format "table {{.Names}}\t{{.Status}}"'
```

The `Log` array keeps the last 5 healthcheck results — invaluable for debugging "why is this unhealthy?".

### Common mistakes

**No `start_period`, so healthchecks fail during slow startup**

```yaml
# WRONG: the app needs 60s to warm up; the first 3 failures make it unhealthy
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 5s
  retries: 3
# After 15s the container is unhealthy and might get restarted

# RIGHT: start_period gives a grace window
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  start_period: 60s  # failures during this window do not count
```

**Healthcheck that depends on a dependency**

```yaml
# WRONG: the api healthcheck pings the db; if the db is down briefly, the api goes unhealthy
test: ["CMD", "sh", "-c", "curl -f http://localhost:3000/health && pg_isready -h db"]
```

Mix dependencies into your *liveness* check and a transient db blip restarts your api.
Better: the healthcheck should only verify the container's own health; expose a separate `/readiness` endpoint if your app needs to gate traffic on dependency health.

**Hitting an external URL in the healthcheck**

```dockerfile
HEALTHCHECK CMD curl -f https://api.example.com/health || exit 1
```

Now your container's health depends on someone else's uptime. Don't.

**Disabling a healthcheck without realizing it**

```dockerfile
FROM postgres:16
# inherits postgres' healthcheck. If your app does not exit 0 there, you get unhealthy.
# Either set your own:
HEALTHCHECK CMD pg_isready -U postgres
# Or disable the inherited one:
HEALTHCHECK NONE
```

### Real-world usage

- **Compose with `depends_on: service_healthy`:** wait for the db to be ready before starting the api. The biggest practical use.
- **Swarm orchestration:** unhealthy replicas are killed and replaced. The healthcheck is the signal.
- **Reverse proxy integration:** Traefik and nginx-proxy can inspect Docker healthcheck state to route only to healthy containers.
- **Monitoring dashboards:** scrape `docker inspect` output for health status and alert on `unhealthy`.

### Follow-up questions

**Q:** What is the difference between a Docker healthcheck and Kubernetes liveness/readiness probes?

**A:** Same idea, different scope. K8s splits liveness ("am I alive?") from readiness ("am I ready for traffic?"), with separate behaviors: a failing liveness probe restarts the pod, a failing readiness probe removes it from the Service. Docker has just one combined healthcheck. K8s does not use Docker's healthcheck — it has its own probes.

**Q:** What signal does an unhealthy container get?

**A:** None — turning unhealthy does not restart the container by itself. `--restart=on-failure` does not help either: the container has not exited, so there is no exit code to act on. With Swarm (or Compose on top of an orchestrator), the orchestrator decides to replace the unhealthy task. With plain `docker run`, you (or your monitoring) act.

**Q:** Can I have multiple healthchecks?

**A:** Only one per container. Combine the logic inside one `CMD-SHELL` command if needed.
**Q:** How do I disable the inherited healthcheck from a base image?

**A:** `HEALTHCHECK NONE` in your Dockerfile, or `test: ["NONE"]` in Compose.

**Q:** (Senior) When should the healthcheck do more than `curl /health`?

**A:** Add a smarter `/health` endpoint inside the app that checks downstream readiness (DB connection pool not exhausted, queue not backed up beyond a threshold). Keep the healthcheck command itself simple — the intelligence belongs in the endpoint, not in the shell. For services with a long warmup, expose `/livez` (returns OK as soon as the process is up) and `/readyz` (true app readiness) separately, mirroring the K8s liveness/readiness split with two endpoints, and point the healthcheck at whichever one fits your restart policy.

## Examples

### Compose stack with healthcheck-gated startup

```yaml
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5
      start_period: 5s
```

```bash
$ docker compose up -d
[+] Running 2/2
 ✔ Container db   Healthy   1.4s
 ✔ Container api  Healthy   12.3s
```

Compose waits for the db to be healthy before starting the api. No race conditions, no "connection refused" on the first run.

### Node app with a built-in healthcheck (no extra packages)

```dockerfile
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1))"
CMD ["node", "server.js"]
```

No curl needed — this uses the Node runtime that is already there. Smaller image, no extra package.
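### Combining several checks in one script

One way to work with the "only one healthcheck per container" limitation from the follow-up questions: fold everything into a single script and point `HEALTHCHECK CMD` at it. A hedged sketch — the port, paths, and disk threshold are illustrative assumptions, not part of the original answer:

```shell
#!/bin/sh
# /healthcheck.sh - one script, several checks; the container counts as
# healthy only if every check passes.
# Dockerfile wiring:  HEALTHCHECK CMD /healthcheck.sh run

http_ok() {
  # the app answers on its own health endpoint (port is an illustrative default)
  wget -q --spider "http://localhost:${PORT:-3000}/health"
}

disk_ok() {
  # refuse to report healthy when the filesystem is nearly full
  used=$(df -P "${DATA_DIR:-/}" | awk 'NR==2 {gsub(/%/, ""); print $5}')
  [ "$used" -lt "${DISK_LIMIT:-95}" ]
}

# Run the checks only when invoked with "run", so the file can also be
# sourced to exercise each check individually.
if [ "${1:-}" = "run" ]; then
  http_ok && disk_ok
fi
```

Keep every check local to the container (no pinging the db from here), in line with the liveness-vs-readiness warning above.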
### Watching health logs

```bash
$ docker inspect api --format '{{range .State.Health.Log}}{{.End}}: exit={{.ExitCode}} out={{.Output}}\n{{end}}'
2026-04-30T10:00:00Z: exit=0 out=ok
2026-04-30T10:00:30Z: exit=0 out=ok
2026-04-30T10:01:00Z: exit=1 out=connection refused
2026-04-30T10:01:30Z: exit=1 out=connection refused
2026-04-30T10:02:00Z: exit=0 out=ok
```

The last 5 results with exit codes and stdout. This often answers "why is/was this unhealthy?" without further investigation.
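### Acting on health status from a script

To close the loop on the monitoring idea above: a small sketch of the decision logic a monitor might apply to the status string that `docker inspect <name> --format '{{.State.Health.Status}}'` prints. The mapping from status to action is an illustrative assumption, not Docker behavior:

```shell
# classify_health: map a Docker health status string to a monitoring action.
# Input is one of: healthy, unhealthy, starting (or anything else when the
# container has no healthcheck or is not running).
classify_health() {
  case "$1" in
    healthy)   echo "ok"      ;;  # nothing to do
    starting)  echo "wait"    ;;  # inside start_period, check again later
    unhealthy) echo "alert"   ;;  # page someone, or restart the container
    *)         echo "unknown" ;;  # no healthcheck configured, or not running
  esac
}

classify_health healthy    # -> ok
classify_health unhealthy  # -> alert
```

In a cron job or monitoring agent you would feed it live output, e.g. `classify_health "$(docker inspect api --format '{{.State.Health.Status}}')"`.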