
How to debug problems in a Docker container?

Debugging Docker containers is a flowchart: start with the most general signals (logs, exit code), narrow to the specific (exec inside, entrypoint override), and reach for specialized tools only when nothing else works. The right order saves you hours.

Theory

TL;DR

The debugging flowchart, in order:

  1. docker ps -a — what is the container's state?
  2. docker logs --tail 200 <name> — what did it print?
  3. docker inspect <name> — exit code, OOM flag, error message, mounts, networks.
  4. docker exec -it <name> sh — if running, poke around live.
  5. docker run -it --entrypoint sh <image> — if it crashes too fast to attach.
  6. Override the broken thing: --entrypoint, --user root, --cap-add SYS_PTRACE for strace.
  7. Specialized tools: dive for layers, tcpdump for network, docker stats for resources.

Step 1: state

bash
$ docker ps -a --filter name=api
CONTAINER ID   IMAGE   STATUS                     NAMES
a3f9d2b8c1e4   myapp   Exited (1) 3 seconds ago   api

Three facts in one line: it crashed, exit code 1, 3 seconds ago. Most failure paths reveal themselves at this stage.
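The exit code itself carries information: values above 128 mean the process was killed by a signal (code - 128). A quick sketch to decode one in plain shell:

```shell
# Decode a container exit status: codes above 128 mean "killed by signal N"
code=137    # e.g. taken from docker inspect .State.ExitCode
if [ "$code" -gt 128 ]; then
  echo "killed by SIG$(kill -l $((code - 128)))"
else
  echo "application exit code $code"
fi
```

So 137 decodes to SIGKILL (often the OOM killer or a `docker stop` timeout) and 139 to SIGSEGV.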

Step 2: logs

bash
# Recent output
docker logs --tail 200 api

# Live tail (for ongoing issues)
docker logs -f --tail 50 api

# Time-bounded
docker logs --since 10m api

# With timestamps
docker logs -t --tail 50 api

Important: docker logs reads PID 1's stdout/stderr. If your app logs to a file inside the container, this will be empty. The fix is to make the app log to stdout (12-factor norm).
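If you cannot reconfigure the app, the classic build-time workaround (used by the official nginx image) is to symlink the log file to the container's stdout. A sketch, with /var/log/myapp.log as a placeholder path:

```dockerfile
# In the Dockerfile: writes to the app's log file land on PID 1's stdout,
# where `docker logs` can see them. Paths below are placeholders.
RUN ln -sf /dev/stdout /var/log/myapp.log \
 && ln -sf /dev/stderr /var/log/myapp.err
```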

For verbose output capture: docker logs api > app.log 2>&1.

Step 3: inspect

The full JSON of the container's state. With --format you extract specific fields:

bash
# Status quartet — usually answers "what happened?"
docker inspect api --format \
  '{{.State.Status}} (exit={{.State.ExitCode}}) OOM={{.State.OOMKilled}} Err={{.State.Error}}'
# exited (137) OOM=true  Err=   ← OOM killer
# exited (1)   OOM=false Err=   ← app error code 1
# exited (139) OOM=false Err=   ← segfault

bash
# What is mounted?
docker inspect api --format \
  '{{range .Mounts}}{{.Type}}: {{.Source}} -> {{.Destination}}\n{{end}}'

# What network?
docker inspect api --format \
  '{{range $k, $v := .NetworkSettings.Networks}}{{$k}}: {{$v.IPAddress}}\n{{end}}'

# Health check log (last 5 attempts)
docker inspect api --format '{{json .State.Health}}'

Step 4: exec inside (if running)

bash
# Drop into a shell
docker exec -it api sh
docker exec -it api bash    # if bash is available

# Check files exist
docker exec api ls -la /app

# Check env
docker exec api env | grep DATABASE

# Check connectivity to a sibling
docker exec api wget -O- http://db:5432

# Check what the process is doing
docker exec api ps aux

Step 5: entrypoint override (if crashing too fast)

If the container exits before you can exec in:

bash
# Skip the actual app, drop into a shell
docker run -it --rm --entrypoint sh myimage

# Same but with the original env vars + volumes
docker run -it --rm \
  --entrypoint sh \
  -e DATABASE_URL=... \
  -v ./data:/data \
  myimage

# Or, run the original command but pause first to attach
docker run -it --rm --entrypoint /bin/sh myimage -c "sleep 3600 & node server.js"

--entrypoint swaps out the image's entrypoint with whatever you provide. Combined with -it, you get an interactive prompt instead of the dying app.

Step 6: tactical overrides

bash
# Run as root for diagnosis (default user might lack permissions)
docker exec -it -u root api sh

# Allow ptrace so strace can attach (install strace in the image first)
docker run -it --cap-add=SYS_PTRACE myimage strace -p 1

# Disable the healthcheck while debugging
docker run --no-healthcheck ...

# Run with restart=no to keep failed state visible
docker run --restart=no ...

# Point DNS at a known-good resolver
docker run --dns=8.8.8.8 ...

Distroless-specific debugging

Distroless images have no shell. To debug:

bash
# Most projects publish a :debug variant with busybox
docker run -it --entrypoint sh gcr.io/distroless/base:debug

# If sh is not on PATH, the busybox applets live under /busybox
docker run -it --entrypoint /busybox/sh gcr.io/distroless/base:debug

For production distroless images: build a separate :debug tag with the same content + a busybox layer, deploy debug only when needed.
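A sketch of such a :debug build, assuming a distroless app image (the image names are placeholders) and copying a static busybox in from its official image:

```dockerfile
# Dockerfile.debug — same app image plus a static busybox shell (sketch)
FROM busybox:uclibc AS shell
FROM myregistry/myapp:latest
COPY --from=shell /bin/busybox /busybox/busybox
# enter with: docker run -it --entrypoint /busybox/busybox myapp:debug sh
```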

Common mistakes

Looking inside the writable layer for log files

bash
$ docker exec web cat /var/log/myapp.log
# (often empty or stale; the app should log to stdout)

Apps that log to files inside the container are not visible to docker logs. Either reconfigure the app to use stdout, or mount a volume for logs and read from the host.

Running without -it and getting blank output

bash
$ docker exec api sh
# (sometimes hangs or exits immediately)

You need -i to keep stdin open and -t to allocate a TTY; without them the shell sees EOF and exits. Always use -it for interactive shells.

Forgetting that docker stop may have killed the app mid-cleanup

If an app dies during docker stop (exit 143 for SIGTERM, or 137 for SIGKILL after the grace period), its logs may be truncated. Increase the grace period with docker stop --time, or trap SIGTERM in your app so it shuts down cleanly.
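A minimal sketch of trapping SIGTERM in a shell entrypoint so the final log lines get written before exit; here the script signals itself as a stand-in for `docker stop`:

```shell
#!/bin/sh
# Sketch: a PID-1 entrypoint that handles SIGTERM instead of dying mid-write.
term() {
  echo "SIGTERM received: flushing logs, closing connections"
  STOPPED=1        # a real entrypoint would clean up and exit 143 here
}
trap term TERM
echo "app running as PID $$"
kill -TERM $$ &    # stand-in for `docker stop` signalling PID 1
wait               # interrupted by the signal; the trap runs
```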

Mistaking docker-proxy errors for app errors

Error starting userland proxy: listen tcp 0.0.0.0:8080: bind: address already in use

That is from Docker, not your app — port 8080 is already used on the host. Check lsof -i :8080 or pick a different host port.

Specialized debugging tools

bash
# Image layer analysis
dive myimage                        # interactive layer-by-layer view
docker history --no-trunc myimage   # who added what

# Network debugging
docker exec api tcpdump -i any -nn -c 20
docker exec api ss -tnlp            # listening ports
docker exec api ip route

# Process tracing
docker run --cap-add=SYS_PTRACE myimage strace -p 1

# Resource debugging
docker stats --no-stream
docker inspect --format '{{.HostConfig.Memory}}' container

# Live filesystem changes
docker diff <container>             # what changed in the writable layer

Real-world usage

  • "My app exits immediately": check exit code → check logs → exec in with --entrypoint sh to verify the binary exists and is executable.
  • "It works locally but not in CI": compare env vars, mount paths, network. docker inspect is the rosetta stone.
  • "Container is unhealthy but I do not know why": docker inspect --format '{{json .State.Health}}' shows the last 5 healthcheck attempts with their output.
  • "Out of memory crashes": docker inspect shows OOMKilled: true. Is the memory limit sized right? Did usage grow after a recent change? Watch docker stats over time to spot a leak.
  • "Can't reach another container": exec in, ping by service name, check /etc/resolv.conf, verify both containers are on the same network.

Follow-up questions

Q: How do I get logs of a container that has been removed?


A: You cannot — docker rm deletes the container and its json-file logs along with it. If you anticipate the need, configure a log driver that ships logs offsite (syslog, journald, fluentd), or capture them first: docker logs api > backup.log 2>&1 before rm.
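One way to set this daemon-wide, as a sketch in /etc/docker/daemon.json (the fluentd address is a placeholder):

```json
{
  "log-driver": "fluentd",
  "log-opts": { "fluentd-address": "localhost:24224" }
}
```

The same options work per-container via docker run --log-driver and --log-opt.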

Q: What does exit code 125 mean?


A: Docker daemon error before the container started. Usually a Docker config issue (bad image, bad mount, port conflict). Look at the daemon log (journalctl -u docker).

Q: What does exit code 126 vs 127 mean?


A: 126 = command found but not executable (permissions or wrong arch). 127 = command not found at all. Often a typo or missing binary in the image.
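Both codes are standard shell conventions, so they are easy to reproduce outside Docker; a quick sketch:

```shell
# 126: the file exists but cannot be executed
f=$(mktemp)                  # a plain file with no execute bit
echo 'echo hi' > "$f"
"$f" 2>/dev/null; c1=$?      # found but not executable
# 127: no such command anywhere on PATH
definitely-not-a-command 2>/dev/null; c2=$?
echo "not executable -> $c1, not found -> $c2"
rm -f "$f"
```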

Q: How do I debug a Compose service?


A: All the same commands work via docker compose: docker compose logs api, docker compose exec api sh, docker compose run --rm api sh for a one-off. The Compose wrappers know your project context.

Q: (Senior) How do you debug a container that crashes on the kubelet but works locally?


A: Reproduce the kubelet's exact run config: same image digest, same env, same entrypoint, same UID, same network policy. Pod spec → docker run flags is a known transformation. Often the difference is: K8s runs as a non-root UID by default, your local does not. Or K8s injects sidecars that block traffic. Or K8s mounts secrets/configmaps your local does not have. Use kubectl debug for an interactive container in the same pod context.

Examples

A complete debugging session

bash
$ docker ps -a --filter name=api
STATUS                     NAMES
Exited (137) 5 seconds ago api

$ docker logs --tail 50 api
... lots of output ...
ERROR: connect ECONNREFUSED 172.18.0.5:5432

$ docker inspect api --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.HostConfig.Memory}}'
false 137 536870912
# OOM=false, exit 137 → not memory; SIGKILL came from somewhere

$ docker logs --tail 50 db
2026-04-30 ... database system is shut down
# db exited; api could not connect → SIGTERM cascaded.
# Fix: add depends_on with healthcheck in compose, ensure db survives

Four commands triangulated the issue: api was killed because db went down first.

Debugging an image that crashes immediately

bash
$ docker run myimg
# (exits in 0.1 seconds)

$ docker run --rm myimg --version
# (also exits immediately, no output)

# Override entrypoint to investigate
$ docker run -it --rm --entrypoint sh myimg
/ # which myapp
/usr/local/bin/myapp
/ # ls -la /usr/local/bin/myapp
-rwxr-xr-x 1 root root 12M ...
/ # /usr/local/bin/myapp
Segmentation fault
# → wrong arch! Likely an x86 binary in an ARM image.

$ docker inspect --format '{{.Architecture}}' myimg
amd64
$ uname -m
aarch64
# Confirmed: image arch != host arch.

Without --entrypoint sh, the binary's segfault was invisible; docker logs had nothing because the process crashed before writing any output.

Healthcheck debugging

bash
$ docker ps --format '{{.Names}}: {{.Status}}'
api: Up 5 minutes (unhealthy)

$ docker inspect api --format '{{range .State.Health.Log}}{{.End}}: exit={{.ExitCode}} out={{.Output}}\n{{end}}'
2026-04-30T10:00:00Z: exit=0 out=ok
2026-04-30T10:01:00Z: exit=0 out=ok
2026-04-30T10:02:00Z: exit=7 out=connect: connection refused
2026-04-30T10:03:00Z: exit=7 out=connect: connection refused
2026-04-30T10:04:00Z: exit=7 out=connect: connection refused
# → health endpoint stopped responding 3 minutes ago. App probably hung.

$ docker exec api ps aux
# Look for the main process; is it running but not responsive?
# If yes, app is stuck (deadlock, infinite loop). Capture a stack trace.

The healthcheck log is gold — five attempts with exit codes and output.
