How to set up a health check for a Docker container?
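Container health checks are how Docker, Compose, and Swarm tell the difference between "the process is up" and "the app is actually working" (Kubernetes does the same job with its own probes). Without a healthcheck, the only signal is "is PID 1 alive?", which misses every interesting failure mode: hung processes, deadlocks, and apps that are up but answering every request with errors.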
Theory
TL;DR
- A healthcheck is a command Docker runs inside the container periodically. Exit 0 = healthy; any non-zero exit counts as a failure (Docker documents 1 as "unhealthy" and reserves 2, hence the common || exit 1).
- Three states: starting (no check has passed yet; failures within start_period do not count), healthy, and unhealthy (the check failed retries times in a row).
- Set via HEALTHCHECK in a Dockerfile, --health-cmd on docker run, or healthcheck: in Compose.
- Used by docker ps, by Compose's depends_on with condition: service_healthy, and by Swarm to decide when to replace a replica.
- Common command: curl -f http://localhost:<port>/health. The catch: the command must exist inside the container.
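To make the three states concrete, here is a small Python model of the transition rules described above. It is a simplified sketch, not Docker's implementation: it assumes each probe result is already known as a (seconds-since-start, passed) pair, and it follows Docker's documented rule that start-period failures are ignored only until the first success.

```python
from typing import Iterable, Iterator, Tuple


def health_states(
    probes: Iterable[Tuple[float, bool]], start_period: float, retries: int
) -> Iterator[Tuple[float, str]]:
    """Yield (time, state) after each probe; a simplified model of Docker's rules."""
    state = "starting"
    failing_streak = 0
    for t, passed in probes:
        if passed:
            # Any success makes the container healthy and resets the streak.
            state = "healthy"
            failing_streak = 0
        elif state == "starting" and t < start_period:
            # Grace window: failures before the first success are not counted.
            pass
        else:
            failing_streak += 1
            if failing_streak >= retries:
                state = "unhealthy"
        yield t, state
```

For example, probes failing at 5, 10 and 15 seconds with retries=3 and no start period end in unhealthy at 15 seconds; with start_period=60 the same three failures leave the state at starting.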
Quick example
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
EXPOSE 3000
HEALTHCHECK \
CMD wget --quiet --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "server.js"]$ docker run -d --name api myapp
$ docker ps
CONTAINER ID IMAGE STATUS NAMES
a3f9d2b8c1e4 myapp Up 30 seconds (healthy) api
# Status now includes (healthy) / (unhealthy) / (starting)
After ~30 seconds (the default interval), the first check runs. If it succeeds, the status flips from (starting) to (healthy).
The four flags that matter
--interval=DURATION # how often to run the check (default 30s)
--timeout=DURATION # max time a single check may take before it counts as failed (default 30s)
--retries=N # consecutive failures before unhealthy (default 3)
--start-period=DURATION # startup grace period; failures here do not count (default 0s)
These are the Dockerfile HEALTHCHECK options; on docker run the same settings are --health-interval, --health-timeout, --health-retries, and --health-start-period. For a typical web service, --interval=30s --timeout=3s --retries=3 --start-period=10s works well. Apps that take 30+ seconds to warm up (JVMs, large Python ML services) need a longer start period (60-120s).
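The interval, timeout and retries flags combine into a detection delay. A rough worst-case estimate, sketched under assumptions: the start period is over, the app breaks right after a probe finished, each failing probe takes probe_seconds (the full timeout if the app hangs, close to zero if the connection is refused), and the next probe starts one interval after the previous one finishes.

```python
def seconds_until_unhealthy(interval: float, retries: int, probe_seconds: float) -> float:
    """Rough worst-case delay from 'app breaks' to 'unhealthy'.

    Assumptions (not guarantees): the start period is over, the app breaks just
    after a probe finished, every failing probe takes `probe_seconds`, and the
    next probe starts `interval` seconds after the previous one finished.
    """
    # Each counted failure costs one wait of `interval` plus the probe itself.
    return retries * (interval + probe_seconds)


if __name__ == "__main__":
    print(seconds_until_unhealthy(30, 3, probe_seconds=3))   # suggested values, app hangs: 99
    print(seconds_until_unhealthy(30, 3, probe_seconds=30))  # all defaults, app hangs: 180
```

With all defaults, a hung app can go about three minutes before Docker flags it, which is one reason to set a short --timeout explicitly.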
Compose syntax
services:
  api:
    image: myapp
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s
  web:
    image: nginx
    depends_on:
      api:
        condition: service_healthy # wait until api is healthy before starting
depends_on with condition: service_healthy is the killer feature: Compose waits for the dependency's healthcheck to pass before starting the dependent service. That is far more robust than the plain list form (depends_on: [api]), which only waits for the dependency's container to start.
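Outside Compose (a CI script, for instance), the same gating can be sketched in Python: poll the health status until it is healthy, and give up early on unhealthy. This is not how Compose is implemented; docker_health and wait_until_healthy are hypothetical helper names, and docker_health assumes the docker CLI is on PATH.

```python
import subprocess
import time
from typing import Callable

# Go template: print the health status, or "none" if the container has no healthcheck.
_STATUS_FORMAT = "{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}"


def docker_health(name: str) -> str:
    """Return starting/healthy/unhealthy/none, or 'missing' if inspect fails."""
    proc = subprocess.run(
        ["docker", "inspect", "--format", _STATUS_FORMAT, name],
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else "missing"


def wait_until_healthy(
    get_status: Callable[[], str], timeout: float = 60.0, poll: float = 1.0
) -> bool:
    """Poll until healthy (True); give up on unhealthy, no healthcheck, or timeout (False)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "healthy":
            return True
        if status in ("unhealthy", "none", "missing"):
            return False  # waiting longer cannot help, so fail fast
        time.sleep(poll)
    return False
```

Typical use: wait_until_healthy(lambda: docker_health("db"), timeout=120) before running migrations or tests against the db container.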
Three forms of the test command
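# Form 1: CMD (preferred; exec form, no shell)
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
# Form 2: CMD-SHELL (runs via the container's shell, /bin/sh -c on Linux; allows && || and $VAR expansion)
test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
# Form 3: NONE (disable a healthcheck inherited from the base image)
test: ["NONE"]
The CMD form skips the extra shell process and also works in images with no shell at all. CMD-SHELL is needed for environment-variable expansion or shell logic, and it requires a shell in the image. In Compose, a plain string (test: curl -f ...) is treated as CMD-SHELL.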
Common failure: command not found in container
Most common health-check bug:
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
but the image is Alpine-based and curl is not installed, so the healthcheck always fails.
Solutions:
- Install it: RUN apk add --no-cache curl. On Alpine, wget is usually already present via BusyBox: wget -q --spider URL.
- For Node apps, use the runtime you already have (no extra package needed): node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
- Distroless images have no shell and no curl: build a small /healthcheck binary into the image that exits 0/1, and call it with the exec form (HEALTHCHECK CMD ["/healthcheck"]).
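The same idea works for Python images that ship without curl: a stdlib-only probe script. This is a sketch under assumptions: the file name healthcheck.py and passing the URL as a command-line argument are illustrative choices, not a convention. It exits 0 only on HTTP 200 and ignores proxy environment variables so the probe always hits the local app.

```python
import http.client
import sys
import urllib.request

# Ignore any HTTP(S)_PROXY settings: the probe must reach the app in this container.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 2.0) -> int:
    """Return 0 if GET url answers 200, else 1 (Docker's healthy/unhealthy codes)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts are all OSError.
        return 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Run as: python healthcheck.py http://localhost:8000/health
    sys.exit(check(sys.argv[1]))
```

Wired up with the exec form, for example HEALTHCHECK CMD ["python", "/app/healthcheck.py", "http://localhost:8000/health"] (path and port hypothetical).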
Inspecting health
# Current status in ps
docker ps --format 'table {{.Names}}\t{{.Status}}'
# Full health details
docker inspect api --format '{{json .State.Health}}' | jq
# {
# "Status": "healthy",
# "FailingStreak": 0,
# "Log": [
# { "Start": "...", "End": "...", "ExitCode": 0, "Output": "..." },
# ...
# ]
# }
# Watch live
watch 'docker ps --format "table {{.Names}}\t{{.Status}}"'
The Log array keeps the last 5 healthcheck results, which makes it invaluable for debugging "why is this unhealthy?". To list only failing containers, use docker ps --filter health=unhealthy.
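When the JSON from docker inspect needs to end up in a script or a bug report, a small summariser helps. A sketch assuming the State.Health fields shown above (Status, FailingStreak, Log entries with End, ExitCode, Output); inspect_container and explain_health are hypothetical names, and inspect_container assumes the docker CLI is on PATH.

```python
import json
import subprocess
from typing import Any, Dict, List


def inspect_container(name: str) -> Dict[str, Any]:
    """`docker inspect` prints a JSON array; return its first element."""
    out = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)[0]


def explain_health(container: Dict[str, Any]) -> List[str]:
    """Summarise State.Health: status, failing streak, and output of failed probes."""
    health = container.get("State", {}).get("Health")
    if not health:
        return ["no healthcheck configured"]
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:
        if entry["ExitCode"] != 0:
            lines.append(f"{entry['End']}: exit={entry['ExitCode']} {entry['Output'].strip()}")
    return lines
```

Usage: print("\n".join(explain_health(inspect_container("api")))).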
Common mistakes
No start_period, healthchecks fail during slow startup
# WRONG: app needs 60s to warm up; the first 3 failures make it unhealthy
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 5s
  retries: 3
# After ~15s the container is unhealthy (and, under Swarm, gets replaced)
# RIGHT: start_period gives a grace window
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  start_period: 60s # failures during this window do not count
Healthcheck that depends on a dependency
# WRONG: api healthcheck pings db; if db is down briefly, api goes unhealthy
test: ["CMD", "sh", "-c", "curl -f http://localhost:3000/health && pg_isready -h db"]Mix dependencies into your liveness check and a transient db blip restarts your api. Better: healthcheck only checks the container's own readiness; have a separate /readiness endpoint if your app needs to gate traffic on dependency health.
Hitting external URL in healthcheck
HEALTHCHECK CMD curl -f https://api.example.com/health || exit 1
Now your container's health depends on someone else's uptime (and on the network path to it). Don't.
Inheriting a healthcheck without realizing it
FROM postgres:16
# A HEALTHCHECK defined in the base image is inherited by your image. If it probes
# something your image no longer serves, the container is reported unhealthy.
# Either set your own:
HEALTHCHECK CMD pg_isready -U postgres
# Or disable the inherited one:
HEALTHCHECK NONE
Real-world usage
- Compose with depends_on: condition: service_healthy: wait for db to be ready before starting api. The biggest practical use.
- Swarm orchestration: unhealthy replicas are killed and replaced. The healthcheck is the signal.
- Reverse proxy integration: Traefik and nginx-proxy can inspect Docker healthcheck state to route only to healthy containers.
- Monitoring dashboards: scrape docker inspect output for health status and alert on unhealthy.
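For the monitoring case, the scraping step can be sketched as a pure function over the JSON array that docker inspect returns, plus a helper that inspects every running container. Function names are hypothetical; the helper assumes the docker CLI is on PATH, and how you deliver the alert is left open.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def unhealthy_containers(inspected: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (name, last probe output) for every container reported unhealthy."""
    found = []
    for c in inspected:
        health = c.get("State", {}).get("Health") or {}
        if health.get("Status") != "unhealthy":
            continue
        log = health.get("Log") or []
        last_output = log[-1]["Output"].strip() if log else ""
        # docker inspect reports names with a leading slash, e.g. "/api".
        found.append((c["Name"].lstrip("/"), last_output))
    return found


def inspect_running() -> List[Dict[str, Any]]:
    """Inspect every running container."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return []  # docker inspect with no arguments would fail
    out = subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)
```

A cron job or exporter would call unhealthy_containers(inspect_running()) and alert when the list is non-empty.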
Follow-up questions
Q: What is the difference between Docker healthcheck and Kubernetes liveness/readiness probes?
A: Same idea, different scope. K8s splits liveness (am I alive?) and readiness (am I ready for traffic?), with separate behaviors (liveness restarts; readiness removes from service). Docker has just one combined healthcheck. K8s does not use Docker's healthcheck — it has its own.
Q: What signal does an unhealthy container get?
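A: None. Being unhealthy does not trigger a restart by itself, and --restart=on-failure does not help because the process has not exited (there is no exit code to react to). Docker does emit a health_status event, visible in docker events. Under Swarm (including a Compose file deployed with docker stack deploy), the orchestrator replaces the unhealthy task; with plain docker run or plain Compose, you (or your monitoring) have to act.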
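Acting on it yourself can be as small as a periodic job that restarts whatever Docker reports as unhealthy. A minimal sketch, assuming the docker CLI is on PATH; restart_unhealthy is a hypothetical helper, and the injectable run parameter exists only so the logic can be tested without Docker. A blind restart loop will not fix an app that is unhealthy for reasons a restart cannot cure, so pair it with alerting.

```python
import subprocess
from typing import Any, Callable, List

Runner = Callable[..., Any]  # same call shape as subprocess.run


def unhealthy_names(run: Runner = subprocess.run) -> List[str]:
    """Names of running containers that Docker currently reports as unhealthy."""
    out = run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def restart_unhealthy(run: Runner = subprocess.run) -> List[str]:
    """Restart each unhealthy container; return the names it acted on."""
    names = unhealthy_names(run)
    for name in names:
        run(["docker", "restart", name], check=True)
    return names
```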
Q: Can I have multiple healthchecks?
A: Only one per container. Combine logic inside one CMD-SHELL command if needed. If a Dockerfile contains several HEALTHCHECK instructions, only the last one takes effect.
Q: How do I disable the inherited healthcheck from a base image?
A: HEALTHCHECK NONE in your Dockerfile, or test: ["NONE"] in Compose.
Q: (Senior) When should the healthcheck do more than curl /health?
A: Add a smarter /health endpoint inside the app that checks downstream readiness (DB connection pool not exhausted, queue not backed up beyond a threshold). Keep the healthcheck command itself simple: the intelligence belongs in the endpoint, not in shell. For services with long warmup, expose /livez (returns OK as soon as the process is up) and /readyz (true app readiness) separately. Kubernetes can probe each with its own probe; Docker and Swarm run a single healthcheck, so point it at whichever endpoint matches what you want the orchestrator to act on.
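The threshold checks mentioned above can live in one pure function that the /health handler calls, which keeps them unit-testable. A sketch only: Snapshot, its fields, and the 1000-item queue limit are hypothetical stand-ins for metrics your app already tracks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snapshot:
    """Hypothetical metrics the app already tracks."""
    pool_in_use: int
    pool_size: int
    queue_depth: int


def health_response(s: Snapshot, max_queue_depth: int = 1000) -> Tuple[int, List[str]]:
    """Return (HTTP status, problems) for a /health handler to send."""
    problems = []
    if s.pool_in_use >= s.pool_size:
        problems.append("db connection pool exhausted")
    if s.queue_depth > max_queue_depth:
        problems.append(f"queue depth {s.queue_depth} > {max_queue_depth}")
    # A 503 makes curl -f / wget --spider exit non-zero, so Docker counts a failure.
    return (503 if problems else 200), problems
```

Returning the list of problems alongside the status code means the probe output stored in the health Log says why the check failed.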
Examples
Compose stack with healthcheck-gated startup
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5
      start_period: 5s
$ docker compose up -d --wait
[+] Running 2/2
✔ Container db Healthy 1.4s
✔ Container api Healthy 12.3s
Compose waits for db to be healthy before starting api: no race conditions, no "connection refused" on first run.
Node app with built-in healthcheck (no extra packages)
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev
EXPOSE 3000
HEALTHCHECK \
CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
CMD ["node", "server.js"]No curl needed — uses the Node runtime that is already there. Smaller image, no extra package.
Watching health logs
$ docker inspect api --format '{{range .State.Health.Log}}{{.End}}: exit={{.ExitCode}} out={{.Output}}{{println}}{{end}}'
2026-04-30T10:00:00Z: exit=0 out=ok
2026-04-30T10:00:30Z: exit=0 out=ok
2026-04-30T10:01:00Z: exit=1 out=connection refused
2026-04-30T10:01:30Z: exit=1 out=connection refused
2026-04-30T10:02:00Z: exit=0 out=ok
These are the last 5 results, with exit codes and captured output. They often answer "why is/was this unhealthy?" without further investigation.