
How to set up a health check for a Docker container?

Container health checks are how Docker (and Compose, Swarm, K8s) tell the difference between "the process is up" and "the app is actually working". Without a healthcheck, the only signal is "is PID 1 alive?", which misses every interesting failure mode.

Theory

TL;DR

  • A healthcheck is a command Docker runs inside the container periodically. Exit 0 = healthy; non-zero = unhealthy.
  • Three states: starting (still in start_period), healthy, unhealthy (failed retries times in a row).
  • Set via HEALTHCHECK in Dockerfile, --health-cmd on docker run, or healthcheck: in Compose.
  • Used by docker ps, by Compose depends_on: service_healthy, by Swarm to decide replica replacement.
  • Common command: curl -f http://localhost:<port>/health. One picky detail: the command must exist inside the container.
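The `docker run` route mentioned above looks like this; `myapp` is a placeholder image name, and the app is assumed to serve `/health` on port 3000:

```bash
# Same healthcheck as a Dockerfile HEALTHCHECK, but set at run time
docker run -d --name api \
  --health-cmd='wget -q --spider http://localhost:3000/health || exit 1' \
  --health-interval=30s \
  --health-timeout=3s \
  --health-retries=3 \
  --health-start-period=10s \
  myapp
```

Run-time flags override any HEALTHCHECK baked into the image, which is handy for tuning intervals per environment without rebuilding.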

Quick example

```dockerfile
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "server.js"]
```
```bash
$ docker run -d --name api myapp
$ docker ps
CONTAINER ID   IMAGE   STATUS                    NAMES
a3f9d2b8c1e4   myapp   Up 30 seconds (healthy)   api
# STATUS now includes (healthy) / (unhealthy) / (starting)
```

After ~30 seconds, the first check runs. If it succeeds, status flips to (healthy).

The four flags that matter

```
--interval=DURATION       # how often to run the check (default 30s)
--timeout=DURATION        # max time the check has to return (default 30s)
--retries=N               # how many failures before unhealthy (default 3)
--start-period=DURATION   # grace period at startup; failures here do not count (default 0s)
```

For a typical web service: --interval=30s --timeout=3s --retries=3 --start-period=10s works well. Apps that take 30+ seconds to warm up (JVMs, big Python ML services) need a longer --start-period (60-120s).
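The retries behavior is worth internalizing: only consecutive failures count, and any success resets the streak. A toy shell sketch of that bookkeeping (this mimics the documented behavior; it is not Docker's actual code):

```bash
# simulate_health RETRIES EXIT_CODE...  -> prints the final health state.
# A container turns unhealthy only after RETRIES consecutive failures;
# a single success resets the failing streak.
simulate_health() {
  retries=$1; shift
  streak=0
  state=healthy
  for code in "$@"; do            # each argument is one check's exit code
    if [ "$code" -eq 0 ]; then
      streak=0
      state=healthy
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$retries" ]; then
        state=unhealthy
      fi
    fi
  done
  echo "$state"
}

simulate_health 3 0 1 1 0    # fail twice, then recover -> healthy
simulate_health 3 0 1 1 1    # three failures in a row  -> unhealthy
```

So with `--interval=5s --retries=3`, a flaky check can take your container down in 15 seconds; size both knobs against how noisy your endpoint is.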

Compose syntax

```yaml
services:
  api:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s
  web:
    image: nginx
    depends_on:
      api:
        condition: service_healthy   # wait until api is healthy before starting
```

depends_on: condition: service_healthy is the killer feature — it waits for the dep's healthcheck to pass before starting dependents. Far more robust than the simple list form.

Three forms of the test command

```yaml
# Form 1: CMD (preferred; no shell)
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]

# Form 2: CMD-SHELL (runs via shell; lets you use &&, ||, env var expansion)
test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]

# Form 3: NONE (disable an inherited healthcheck from the base image)
test: ["NONE"]
```

CMD form is faster (no shell process). CMD-SHELL is needed for env var expansion or shell logic.
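The expansion difference is easy to see with plain `sh` standing in for Docker's check runner. `$PORT` here is a made-up variable for illustration:

```bash
# Analogue of the CMD (exec) form: argv is used verbatim, no shell,
# so "$PORT" is never expanded.
exec_form() {
  printf '%s\n' 'http://localhost:$PORT/health'
}

# Analogue of CMD-SHELL: the string goes through `sh -c`,
# which expands the variable first.
shell_form() {
  PORT=3000 sh -c 'echo "http://localhost:$PORT/health"'
}

exec_form     # prints the literal $PORT
shell_form    # prints the expanded port 3000
```

This is why a healthcheck like `["CMD", "curl", "-f", "http://localhost:$PORT/health"]` silently probes a nonsense URL: switch to CMD-SHELL when you need the variable.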

Common failure: command not found in container

Most-common health-check bug:

```dockerfile
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
```

but the image is Alpine-based and curl is not installed, so the healthcheck always fails.

Solutions:

  • Install it: RUN apk add --no-cache curl (on Alpine, BusyBox wget is usually already there: wget -q --spider URL)
  • For Node apps: node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode===200?0:1))" (no extra package needed)
  • Distroless images often need a binary built in: include a /healthcheck binary in your Dockerfile that exits 0/1
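A quick way to catch this before shipping is to run the check command by hand. The image name, container name, and endpoint below are placeholders:

```bash
# Does the binary exist in the image at all?
docker run --rm myapp sh -c 'command -v curl || echo "curl missing"'

# Run the exact healthcheck command inside the running container:
docker exec api sh -c 'curl -f http://localhost:8080/health; echo "exit=$?"'
```

If the manual run prints a non-zero exit code, you have reproduced exactly what Docker sees every interval.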

Inspecting health

```bash
# Current status in ps
docker ps --format 'table {{.Names}}\t{{.Status}}'

# Full health details
docker inspect api --format '{{json .State.Health}}' | jq
# {
#   "Status": "healthy",
#   "FailingStreak": 0,
#   "Log": [
#     { "Start": "...", "End": "...", "ExitCode": 0, "Output": "..." },
#     ...
#   ]
# }

# Watch live
watch 'docker ps --format "table {{.Names}}\t{{.Status}}"'
```

The Log array keeps the last 5 healthcheck results — invaluable for debugging "why is this unhealthy?".

Common mistakes

No start_period, healthchecks fail during slow startup

```yaml
# WRONG: app needs 60s to warm up; the first 3 failures make it unhealthy
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 5s
  retries: 3
# After 15s the container is unhealthy and might get restarted

# RIGHT: start_period gives a grace window
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  start_period: 60s   # failures during this window do not count
```

Healthcheck that depends on a dependency

```yaml
# WRONG: api healthcheck pings db; if db is down briefly, api goes unhealthy
test: ["CMD", "sh", "-c", "curl -f http://localhost:3000/health && pg_isready -h db"]
```

Mix dependencies into your liveness check and a transient db blip restarts your api. Better: healthcheck only checks the container's own readiness; have a separate /readiness endpoint if your app needs to gate traffic on dependency health.
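One way to apply that split in Compose, assuming the app exposes a self-only /livez endpoint (the endpoint name is an app-side assumption, not a Docker convention):

```yaml
healthcheck:
  # /livez checks only this container's own process; no downstream deps,
  # so a db blip cannot flip this container to unhealthy
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/livez"]
  interval: 30s
```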

Hitting external URL in healthcheck

```dockerfile
HEALTHCHECK CMD curl -f https://api.example.com/health || exit 1
```

Now your container's health depends on someone else's uptime. Don't.

Inheriting a base image's healthcheck without realizing it

```dockerfile
FROM postgres:16
# If the base image defines a HEALTHCHECK, your image inherits it.
# If that inherited check does not exit 0 for your setup, you get unhealthy.

# Either set your own:
HEALTHCHECK CMD pg_isready -U postgres

# Or disable the inherited one:
HEALTHCHECK NONE
```

Real-world usage

  • Compose with depends_on: service_healthy: wait for db to be ready before starting api. The biggest practical use.
  • Swarm orchestration: unhealthy replicas are killed and replaced. Healthcheck is the signal.
  • Reverse proxy integration: Traefik and nginx-proxy can inspect Docker healthcheck state to route only to healthy containers.
  • Monitoring dashboards: scrape docker inspect output for health status, alert on unhealthy.
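For the monitoring case, `docker ps` can filter on health state directly, which makes a cron-able check a one-liner:

```bash
# List only unhealthy containers; non-empty output means something is wrong
docker ps --filter health=unhealthy --format '{{.Names}}'
```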

Follow-up questions

Q: What is the difference between Docker healthcheck and Kubernetes liveness/readiness probes?


A: Same idea, different scope. K8s splits liveness (am I alive?) and readiness (am I ready for traffic?), with separate behaviors (liveness restarts; readiness removes from service). Docker has just one combined healthcheck. K8s does not use Docker's healthcheck — it has its own.
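For comparison, a K8s pod spec declares the two probes separately. A minimal sketch, where /livez, /readyz, and port 3000 are assumed app details:

```yaml
livenessProbe:        # failing this restarts the container
  httpGet:
    path: /livez
    port: 3000
readinessProbe:       # failing this removes the pod from the Service
  httpGet:
    path: /readyz
    port: 3000
```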

Q: What signal does an unhealthy container get?


A: None. Unhealthy status does not trigger a restart by itself, and --restart=on-failure does not help either: the container keeps running, so there is no exit code to react to. Under Swarm, the orchestrator replaces unhealthy tasks. With plain docker run, you (or your monitoring) have to act.
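If plain `docker run` containers should still be acted on, one common pattern is to watch Docker's health_status events and react yourself. A sketch, not production-grade supervision; verify the event format your Docker version emits before relying on the string match:

```bash
# Restart any container whose health flips to unhealthy.
docker events --filter event=health_status \
  --format '{{.Actor.Attributes.name}} {{.Status}}' |
while read -r name status; do
  case "$status" in
    *unhealthy*) docker restart "$name" ;;
  esac
done
```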

Q: Can I have multiple healthchecks?


A: Only one per container. Combine logic inside one CMD-SHELL command if needed.
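To approximate multiple checks, fold them into one shell command. In this sketch, the web endpoint and the worker heartbeat file are both assumed details of your app:

```yaml
# Both conditions must pass for the container to count as healthy
test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/health && test -f /tmp/worker-heartbeat"]
```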

Q: How do I disable the inherited healthcheck from a base image?


A: HEALTHCHECK NONE in your Dockerfile, or test: ["NONE"] in Compose.

Q: (Senior) When should the healthcheck do more than curl /health?


A: Add a smarter /health endpoint inside the app that checks downstream readiness (DB connection pool not exhausted, queue not backed up beyond a threshold). Keep the healthcheck command itself simple; the intelligence belongs in the endpoint, not in shell. For services with long warmup, expose /livez (returns OK as soon as the process is up) and /readyz (true app readiness) separately, and point the Docker healthcheck at whichever one matches what you want it to gate.

Examples

Compose stack with healthcheck-gated startup

```yaml
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5
      start_period: 5s
```
```bash
$ docker compose up -d
[+] Running 2/2
 ✔ Container db   Healthy   1.4s
 ✔ Container api  Healthy   12.3s
```

Compose waits for db to be healthy before starting api. No race conditions, no "connection refused" on first run.

Node app with built-in healthcheck (no extra packages)

```dockerfile
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1))"
CMD ["node", "server.js"]
```

No curl needed — uses the Node runtime that is already there. Smaller image, no extra package.

Watching health logs

```bash
$ docker inspect api --format \
  '{{range .State.Health.Log}}{{.End}}: exit={{.ExitCode}} out={{.Output}}\n{{end}}'
2026-04-30T10:00:00Z: exit=0 out=ok
2026-04-30T10:00:30Z: exit=0 out=ok
2026-04-30T10:01:00Z: exit=1 out=connection refused
2026-04-30T10:01:30Z: exit=1 out=connection refused
2026-04-30T10:02:00Z: exit=0 out=ok
```

Last 5 results with exit codes and stdout. Often answers "why is/was this unhealthy?" without further investigation.
