How to fix 'no space left on device' on a Docker host?
"No space left on device" is the most common Docker pain in production. Images accumulate, build cache piles up, container logs grow without bound, and volumes outlive the containers that created them. The fix is a mix of immediate cleanup and long-term hygiene.
Theory
TL;DR
- Diagnose with `docker system df` before pruning. Know where space is going.
- `docker system prune -af --volumes` reclaims everything not currently in use. Safe in dev, careful in prod (it drops detached volumes).
- Logs are the silent killer: a chatty container can fill `/var/lib/docker/containers/<id>/<id>-json.log` to gigabytes.
- Build cache can grow to tens of GB on busy CI hosts. Prune regularly.
- Long-term: put `/var/lib/docker` on its own partition; enable log rotation in `daemon.json`; cron a prune.
Where space goes
Docker stores everything under `/var/lib/docker` (or wherever `data-root` points):
/var/lib/docker/
├── overlay2/ # image layers + container writable layers
├── containers/<id>/ # container metadata + logs (json-file)
├── volumes/ # named volumes
├── image/ # image manifest metadata
├── buildkit/ # build cache (with BuildKit)
└── tmp/ # transient files

On a busy host, the breakdown is typically:
- 30-50% — image layers
- 20-40% — anonymous/orphan volumes
- 10-20% — container logs
- 5-20% — build cache
docker system df shows this in human-readable form.
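If you want to script this check (say, for a monitoring job), the RECLAIMABLE column can be parsed out of `docker system df`. A minimal sketch: the sample output below is embedded in a variable so the parsing logic runs without a live daemon, and the 10 GB threshold is an arbitrary choice.

```shell
# Sample `docker system df` output, embedded so the parsing can run without a daemon.
# In a real job, replace this with: sample=$(docker system df)
sample='TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          67        12        31.4GB    24.1GB (76%)
Containers      34        8         3.2GB     2.7GB
Volumes         28        5         45.0GB    38.0GB (84%)
Build Cache     2104      0         8.0GB     8.0GB'

# Print categories with more than 10 GB reclaimable. The reclaimable size is the
# last field, unless a "(NN%)" suffix follows it, in which case it is second to last.
big=$(echo "$sample" | awk 'NR > 1 {
  r = ($NF ~ /^\(/) ? $(NF-1) : $NF
  sub(/GB$/, "", r)
  if (r + 0 > 10) print $1, r "GB"
}')
echo "$big"
```

With the sample above this prints the Images and Volumes rows, the two categories worth drilling into first.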
Categories of waste
| Category | What it is | How to clean |
|---|---|---|
| Stopped containers | exited containers retained for docker logs/docker start | docker container prune -f |
| Dangling images | images with no tag (replaced by newer build) | docker image prune -f |
| Unused images | images not referenced by any container | docker image prune -af (note -a) |
| Anonymous volumes | volumes auto-created by VOLUME Dockerfile directive, never cleaned | docker volume prune -f (named volumes too if -a) |
| Build cache | BuildKit cache layers from past builds | docker builder prune -af |
| Container logs | json-file logs that grew unbounded | log rotation config |
| Networks | unused custom networks (small) | docker network prune -f |
Examples
Diagnostic flow
# Step 1: how big is /var/lib/docker?
sudo du -sh /var/lib/docker
# 87G
# Step 2: breakdown by Docker category
docker system df
# TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
# Images          67        12        31.4GB    24.1GB (76%)
# Containers      34        8         3.2GB    2.7GB
# Volumes         28        5         45.0GB   38.0GB (84%)
# Build Cache     2104      0         8.0GB    8.0GB
# Step 3: drill into the worst offender
docker system df -v # verbose: per-image, per-container, per-volume

The RECLAIMABLE column is your first target.
Quick win: prune everything safely
# Stopped containers, dangling images, unused networks, build cache
docker system prune -f
# Reclaimed: 12.5GB

Without the `--volumes` flag, volumes are kept. That is the safe default.
Aggressive: reclaim everything not in use
docker system prune -af --volumes
# Includes:
# - all images not used by a container (not just dangling)
# - all volumes not referenced by any container

Dangerous in prod: any volume not currently referenced by a container is deleted, including one whose only container is briefly down while being recreated, and the volumes of stopped containers that the same prune run just removed. Use this in dev/CI, not prod.
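Before committing to `--volumes`, you can preview the candidates: `docker volume ls` accepts a `dangling=true` filter that lists volumes not referenced by any container (note that on recent Docker versions, `docker volume prune` additionally skips named volumes unless `-a` is given). A sketch, guarded so it degrades gracefully on a host without a reachable daemon:

```shell
# List volumes not referenced by any container: the pool that volume pruning draws from.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  preview=$(docker volume ls --filter dangling=true)
else
  preview="docker daemon not reachable; skipping preview"
fi
echo "$preview"
```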
Targeted commands
# Images
docker image prune -f # dangling only
docker image prune -af # all images not used by a container
# Containers
docker container prune -f # stopped containers
# Volumes
docker volume prune -f # volumes not referenced by any container (Docker 23+ skips named volumes unless -a is added)
# Build cache
docker builder prune -f # dangling build cache
docker builder prune -af # all build cache
docker builder prune --filter until=168h # cache older than 7 days
# Networks
docker network prune -f

Container logs
Logs default to the json-file driver with no size limit, so a chatty app can write gigabytes.
# See per-container log file sizes
for c in $(docker ps -q --no-trunc); do # --no-trunc: the log path uses the full container ID
  name=$(docker inspect -f '{{.Name}}' "$c" | sed 's|^/||')
  size=$(sudo du -sh "/var/lib/docker/containers/$c/$c-json.log" 2>/dev/null | cut -f1)
  echo "$size $name"
done

Truncate a runaway log without a restart:
sudo truncate -s 0 /var/lib/docker/containers/<id>/<id>-json.log

Fix it permanently by setting log limits in /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}

Then `sudo systemctl restart docker`. Existing containers keep the old config until recreated; new containers inherit the limits.
For production, use a log-shipper (Fluentd, Loki, syslog) so logs leave the host entirely.
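A minimal daemon.json sketch for the syslog route (the address below is a placeholder for your own collector):

```json
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "udp://logs.example.internal:514"
  }
}
```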
Move /var/lib/docker to a bigger partition
If the OS partition is small (e.g., a DigitalOcean droplet with 25 GB):
# Stop the daemon
sudo systemctl stop docker
# Mount a new disk at /mnt/docker
sudo rsync -a /var/lib/docker/ /mnt/docker/
sudo mv /var/lib/docker /var/lib/docker.bak
sudo ln -s /mnt/docker /var/lib/docker
# (or update daemon.json with "data-root": "/mnt/docker")
sudo systemctl start docker
docker info | grep 'Docker Root Dir'

Verify, then remove the backup: `rm -rf /var/lib/docker.bak`.
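Setting `data-root` in /etc/docker/daemon.json, as mentioned in the comment above, is cleaner long-term than the symlink. A minimal fragment, using the same /mnt/docker mount point as the example:

```json
{
  "data-root": "/mnt/docker"
}
```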
Periodic cleanup via cron
# /etc/cron.daily/docker-prune
#!/bin/sh
docker container prune -f
docker image prune -f
docker builder prune -f --filter until=72h
# Don't include --volumes; volume cleanup needs manual review

Make it executable: `chmod +x /etc/cron.daily/docker-prune`.
"No space left" during a build
During a docker build, the error often comes from BuildKit's intermediate layers, not the final image:
docker builder prune -af
# Frees the cache. Try the build again.

Or from /tmp (used for temporary downloads):
df -h /tmp
# If /tmp is small, set TMPDIR=/var/tmp before docker build.

When prune does not help
Sometimes docker system df reports lots of reclaimable space but docker system prune reclaims very little. Causes:
- Inodes exhausted (not blocks). Run `df -i` to check.
- Open file handles holding deleted files. Restart the daemon to release them.
- Volume contents are huge but the volume itself is in use. An "active" volume (used by a running container) is skipped by prune. Inspect its contents via `docker run --rm -v <vol>:/data alpine du -sh /data`.
- Snapshot/COW chains. With the legacy devicemapper driver you might need to recreate the storage pool; migrate to overlay2.
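The inode case in the first bullet is easy to script. A sketch that reads the IUse% column from `df -i` (assumes GNU coreutils; some filesystems, e.g. btrfs, report `-` here), falling back to the root filesystem when /var/lib/docker does not exist:

```shell
# Percentage of inodes used on the partition holding Docker's data root.
inode_line=$(df -i /var/lib/docker 2>/dev/null || df -i /)
inode_pct=$(echo "$inode_line" | tail -1 | awk '{ print $(NF-1) }')
echo "inode usage: ${inode_pct}"
```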
Real-world usage
- CI hosts: prune builder cache hourly; prune images daily; logs go to syslog.
- Production app servers: log rotation in daemon.json; weekly prune cron; alerting on disk usage > 70%.
- Single-host hobby: monthly `docker system prune -af`. Done.
- Disk emergency: `docker system prune -af --volumes` if you can confirm there are no detached-but-needed volumes; otherwise prune images and build cache first.
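The disk-usage alerting mentioned above can start as a cron-able script. A sketch, assuming GNU coreutils `df`; the threshold and the notification action (here just an echo) are placeholders for real alerting:

```shell
#!/bin/sh
# Warn when the partition holding Docker's data root crosses a usage threshold.
# Falls back to / when /var/lib/docker does not exist on this host.
THRESHOLD=70
pcent=$(df --output=pcent /var/lib/docker 2>/dev/null || df --output=pcent /)
usage=$(echo "$pcent" | tail -1 | tr -dc '0-9')
if [ "$usage" -gt "$THRESHOLD" ]; then
  # Replace with a real notification (mail, webhook, pager)
  echo "WARN: Docker partition at ${usage}% (threshold ${THRESHOLD}%)"
fi
```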
Common mistakes
Running prune --volumes in prod blindly
If a service is being recreated and its volume is briefly detached, prune deletes it. Always confirm volume usage:
docker volume ls
# Manually inspect any unfamiliar volume before pruning

Forgetting -a on docker image prune
Without -a, only dangling images (no tag) are removed. Tagged but unused images stay.
Ignoring container logs
A single chatty service can fill 50 GB in <id>-json.log while you wonder why disk is full and docker system df shows nothing unusual.
Putting /var/lib/docker on the OS partition with no monitoring
When the disk fills, the daemon can become unstable, and a restart can fail because logs cannot be flushed. A separate partition plus alerting prevents this.
Follow-up questions
Q: What is the difference between docker prune and docker rm?
A: docker rm <name> removes a specific container; docker container prune removes all stopped containers in one shot. Same idea for images and volumes.
Q: Will pruning kill running services?
A: No. Prune commands skip resources that are active (running containers, mounted volumes, used images). They only touch genuinely unused things.
Q: How do I see what is in a volume before deleting it?
A: docker run --rm -v <volname>:/data alpine ls -la /data. If important data, back it up before pruning.
Q: (Senior) How do you build a long-term retention policy for build cache?
A: BuildKit supports cache backends (--cache-to=type=registry,ref=...). Push cache to a dedicated registry image; locally prune anything older than N days; rely on remote cache for shared CI. This bounds local disk while preserving cross-build deduplication.
Q: (Senior) Why does my disk fill faster than docker system df says?
A: docker system df does not account for the daemon's own metadata, BuildKit's internal state, or external mounts. Compare with `du -sh /var/lib/docker/*`. Discrepancies usually mean orphaned overlay2 directories from a daemon crash, very large container log files, or a host bind mount to a directory you forgot about.