How to debug problems in a Docker container?
Debugging Docker containers follows a flowchart: start with the most general signals (state, exit code, logs), narrow to the specific (exec inside, entrypoint override), and reach for specialized tools only when nothing else works. The right order saves you hours.
Theory
TL;DR
The debugging flowchart, in order:
- docker ps -a — what is the container's state?
- docker logs --tail 200 <name> — what did it print?
- docker inspect <name> — exit code, OOM flag, error message, mounts, networks.
- docker exec -it <name> sh — if running, poke around live.
- docker run -it --entrypoint sh <image> — if it crashes too fast to attach.
- Override the broken thing: --entrypoint, --user root, --cap-add SYS_PTRACE for strace.
- Specialized tools: dive for layers, tcpdump for network, docker stats for resources.
Step 1: state
$ docker ps -a --filter name=api
CONTAINER ID IMAGE STATUS NAMES
a3f9d2b8c1e4 myapp Exited (1) 3 seconds ago api

Three facts in one line: it crashed, with exit code 1, 3 seconds ago. Most failure paths reveal themselves at this stage.
Step 2: logs
# Recent output
docker logs --tail 200 api
# Live tail (for ongoing issues)
docker logs -f --tail 50 api
# Time-bounded
docker logs --since 10m api
# With timestamps
docker logs -t --tail 50 api

Important: docker logs reads PID 1's stdout/stderr. If your app logs to a file inside the container, docker logs will be empty. The fix is to make the app log to stdout (the 12-factor norm).
For verbose output capture: docker logs api > app.log 2>&1.
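If you cannot change the app's log configuration, a common build-time workaround (the official nginx image uses it) is to symlink the log files to the container's stdio; the /var/log/myapp paths here are hypothetical:

# In your Dockerfile: redirect file logs to the container's stdout/stderr
RUN ln -sf /dev/stdout /var/log/myapp/app.log \
 && ln -sf /dev/stderr /var/log/myapp/error.log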
Step 3: inspect
The full JSON of the container's state. With --format you extract specific fields:
# Status quartet — usually answers "what happened?"
docker inspect api --format \
'{{.State.Status}} (exit={{.State.ExitCode}}) OOM={{.State.OOMKilled}} Err={{.State.Error}}'
# exited (137) OOM=true Err= ← OOM killer
# exited (1) OOM=false Err= ← app error code 1
# exited (139) OOM=false Err= ← segfault

# What is mounted?
docker inspect api --format '{{range .Mounts}}{{.Type}}: {{.Source}} -> {{.Destination}}\n{{end}}'
# What network?
docker inspect api --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}}: {{$v.IPAddress}}\n{{end}}'
# Health check log (last 5 attempts)
docker inspect api --format '{{json .State.Health}}'

Step 4: exec inside (if running)
# Drop into a shell
docker exec -it api sh
# or bash if available
docker exec -it api bash
# Check files exist
docker exec api ls -la /app
# Check env
docker exec api env | grep DATABASE
# Check connectivity to a sibling
docker exec api wget -O- http://db:5432
# Check what the process is doing
docker exec api ps aux

Step 5: entrypoint override (if crashing too fast)
If the container exits before you can exec in:
# Skip the actual app, drop into a shell
docker run -it --rm --entrypoint sh myimage
# Same but with the original env vars + volumes
docker run -it --rm \
--entrypoint sh \
-e DATABASE_URL=... \
-v ./data:/data \
myimage
# Or pause before the app starts, so you can docker exec in first
docker run -it --rm --entrypoint sh myimage -c "sleep 30 && node server.js"

--entrypoint swaps out the image's entrypoint for whatever you provide. Combined with -it, you get an interactive prompt instead of the dying app.
Step 6: tactical overrides
# Run as root for diagnosis (default user might lack permissions)
docker exec -it -u root api sh
# Trace a running container's PID 1 from a sidecar that shares its PID namespace
docker run -it --rm --pid container:api --cap-add=SYS_PTRACE alpine \
    sh -c "apk add strace && strace -p 1"
# Disable the healthcheck while diagnosing (orchestrators restart unhealthy containers)
docker run --no-healthcheck ...
# Run with restart=no to keep failed state visible
docker run --restart=no ...
# Point the container at a specific DNS server for DNS issues
docker run --dns=8.8.8.8 ...

Distroless-specific debugging
Distroless images have no shell. To debug:
# Most projects publish a :debug variant with busybox
docker run -it --entrypoint sh gcr.io/distroless/base:debug
# The :debug variants ship busybox tools under /busybox
docker run -it --entrypoint /busybox/sh gcr.io/distroless/base:debug

For production distroless images: build a separate :debug tag with the same content plus a busybox layer, and deploy the debug variant only when needed.
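A minimal sketch of that pattern, assuming a static binary; the stage names and the myapp path are hypothetical:

# Dockerfile: one build, two tags from the same content
FROM gcr.io/distroless/static AS release
COPY myapp /myapp
ENTRYPOINT ["/myapp"]

# Identical content on the :debug base, which adds busybox
FROM gcr.io/distroless/static:debug AS debug
COPY myapp /myapp
ENTRYPOINT ["/myapp"]

Build with docker build --target release -t myapp:latest . for production and --target debug -t myapp:debug . for the shell-equipped variant.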
Common mistakes
Looking inside the writable layer for log files
$ docker exec web cat /var/log/myapp.log
# (often empty or stale; the app should log to stdout)

Apps that log to files inside the container are invisible to docker logs. Either reconfigure the app to use stdout, or mount a volume for logs and read them from the host, as sketched below.
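A sketch of the volume approach; /var/log/myapp is an assumption about where the app writes:

# Mount a host directory over the app's log path at start
docker run -d --name web -v "$PWD/logs:/var/log/myapp" myimage
# Read the files from the host, no exec needed
tail -f logs/myapp.log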
Running without -t and getting blank output
$ docker exec api sh
# (sometimes hangs or exits immediately)

You need -t to allocate a TTY. Always use -it for interactive shells.
Forgetting that docker stop may have killed the app mid-cleanup
If an app dies during docker stop (exit 143 for SIGTERM, or 137 for SIGKILL after the grace period), the last log lines may be truncated. Increase the grace period with docker stop --time, or trap SIGTERM in your app so it can flush before exiting; both are sketched below.
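Both mitigations, sketched; the entrypoint script assumes a single foreground app process named myapp:

# Raise the grace period (default is 10 seconds)
docker stop --time 30 api

# entrypoint.sh: forward SIGTERM to the app and wait for it to flush
#!/bin/sh
trap 'kill -TERM "$child" 2>/dev/null' TERM
myapp &
child=$!
wait "$child"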
Mistaking docker-proxy errors for app errors
Error starting userland proxy: listen tcp 0.0.0.0:8080: bind: address already in use
That is from Docker, not your app — port 8080 is already used on the host. Check lsof -i :8080 or pick a different host port.
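Quick checks on the host, then sidestep the conflict:

# Who owns the port?
lsof -i :8080
ss -ltnp | grep ':8080'
# Map a different host port to the same container port
docker run -p 8081:8080 myimage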
Specialized debugging tools
# Image layer analysis
dive myimage # interactive layer-by-layer view
docker history --no-trunc myimage # who added what
# Network debugging
docker exec api tcpdump -i any -nn -c 20
docker exec api ss -tnlp # listening ports
docker exec api ip route
# Process tracing (share the target container's PID namespace)
docker run -it --rm --pid container:api --cap-add=SYS_PTRACE alpine \
    sh -c "apk add strace && strace -p 1"
# Resource debugging
docker stats --no-stream
docker inspect --format '{{.HostConfig.Memory}}' container
# Live filesystem changes
docker diff <container> # what changed in the writable layer

Real-world usage
- "My app exits immediately": check exit code → check logs → exec in with
--entrypoint shto verify the binary exists and is executable. - "It works locally but not in CI": compare env vars, mount paths, network.
docker inspectis the rosetta stone. - "Container is unhealthy but I do not know why":
docker inspect --format '{{json .State.Health}}'shows the last 5 healthcheck attempts with their output. - "Out of memory crashes":
docker inspectforOOMKilled: true. Sized? Bumped recently? Memory leak?docker statsover time. - "Can't reach another container": exec in, ping by service name, check
/etc/resolv.conf, verify both containers are on the same network.
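The check sequence for that last bullet, assuming a network named mynet and a service named db; getent availability depends on the image:

# Can api resolve the sibling by name?
docker exec api getent hosts db
# Are both containers attached to the same network?
docker network inspect mynet --format '{{range .Containers}}{{.Name}} {{end}}'
# Attach a missing container to the network
docker network connect mynet api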
Follow-up questions
Q: How do I get logs of a container that has been removed?
A: You cannot: docker rm deletes the container and its json-file logs along with it. If you anticipate the need, configure a log driver that ships logs off the host (syslog, journald, fluentd), or always capture docker logs api > backup.log before rm.
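A hedged example, assuming a fluentd collector is listening on the host; note that with non-default drivers, docker logs only works where dual logging is available (Docker 20.10+):

docker run -d --log-driver=fluentd \
    --log-opt fluentd-address=localhost:24224 \
    --log-opt tag=api \
    myimage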
Q: What does exit code 125 mean?
A: Docker daemon error before the container started. Usually a Docker config issue (bad image, bad mount, port conflict). Look at the daemon log (journalctl -u docker).
Q: What does exit code 126 vs 127 mean?
A: 126 = command found but not executable (permissions or wrong arch). 127 = command not found at all. Often a typo or missing binary in the image.
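You can reproduce all three in a throwaway container:

$ docker run --rm alpine /etc/hostname; echo $?
126   # file exists but is not executable
$ docker run --rm alpine no-such-binary; echo $?
127   # command not found
$ docker run --rm --bogus-flag alpine true; echo $?
125   # docker rejected the flags before any container started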
Q: How do I debug a Compose service?
A: All the same commands work via docker compose: docker compose logs api, docker compose exec api sh, docker compose run --rm api sh for a one-off. The Compose wrappers know your project context.
Q: (Senior) How do you debug a container that crashes on the kubelet but works locally?
A: Reproduce the kubelet's exact run config: same image digest, same env, same entrypoint, same UID, same network policy. Translating a pod spec into docker run flags is a known transformation. Often the difference is that K8s runs the container as a non-root UID by default while your local run does not, or K8s injects sidecars that block traffic, or K8s mounts secrets/configmaps your local run lacks. Use kubectl debug for an interactive container in the same pod context, as below.
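For the interactive route, something like this attaches an ephemeral container sharing the target's namespaces; the pod and container names are placeholders:

kubectl debug -it pod/api-7d4b9 --image=busybox:1.36 --target=api -- sh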
Examples
A complete debugging session
$ docker ps -a --filter name=api
STATUS NAMES
Exited (137) 5 seconds ago api
$ docker logs --tail 50 api
... lots of output ...
ERROR: connect ECONNREFUSED 172.18.0.5:5432
$ docker inspect api --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.HostConfig.Memory}}'
false 137 536870912
# OOM=false, exit 137 → not memory; SIGKILL came from somewhere
$ docker logs --tail 50 db
2026-04-30 ... database system is shut down
# db exited; api lost its connection, the stack's shutdown sent SIGTERM,
# then SIGKILL after the grace period (hence 137).
# Fix: add depends_on with a healthcheck condition in compose; keep db up.

Four commands triangulated the issue: api was killed because db went down first.
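The fix from that last comment, sketched in compose; the pg_isready healthcheck assumes db is Postgres:

services:
  api:
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5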
Debugging an image that crashes immediately
$ docker run myimg
# (exits in 0.1 seconds)
$ docker run --rm myimg --version
# (also exits immediately, no output)
# Override entrypoint to investigate
$ docker run -it --rm --entrypoint sh myimg
/ # which myapp
/usr/local/bin/myapp
/ # ls -la /usr/local/bin/myapp
-rwxr-xr-x 1 root root 12M ...
/ # /usr/local/bin/myapp
Segmentation fault
# → wrong arch! Likely an x86 binary in an ARM image.
$ docker inspect --format '{{.Architecture}}' myimg
amd64
$ uname -m
aarch64
# Confirmed: image arch != host arch.

Without --entrypoint sh, the binary's segfault was invisible; docker logs had nothing because the process died before writing anything to stdio.
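Two ways out, assuming buildx and qemu/binfmt emulation are available:

# Rebuild the image for the host architecture
docker buildx build --platform linux/arm64 -t myimg .
# Or run the amd64 image under emulation (slow, but works)
docker run --rm --platform linux/amd64 myimg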
Healthcheck debugging
$ docker ps --format '{{.Names}}: {{.Status}}'
api: Up 5 minutes (unhealthy)
$ docker inspect api --format '{{range .State.Health.Log}}{{.End}}: exit={{.ExitCode}} out={{.Output}}\n{{end}}'
2026-04-30T10:00:00Z: exit=0 out=ok
2026-04-30T10:01:00Z: exit=0 out=ok
2026-04-30T10:02:00Z: exit=7 out=connect: connection refused
2026-04-30T10:03:00Z: exit=7 out=connect: connection refused
2026-04-30T10:04:00Z: exit=7 out=connect: connection refused
# → health endpoint stopped responding 3 minutes ago. App probably hung.
$ docker exec api ps aux
# Look for the main process; is it running but not responsive?
# If yes, the app is stuck (deadlock, infinite loop). Capture a stack trace; see below.

The healthcheck log is gold: five attempts with exit codes and output.
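For runtimes that dump stacks on SIGQUIT (Go, the JVM), you can trigger a dump into the logs; note that a Go process exits after dumping:

# Send SIGQUIT to the container's main process
docker kill --signal=QUIT api
# The stack dump lands on stderr, i.e. in the logs
docker logs --tail 100 api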