How to fix 'no space left on device' on a Docker host?
"No space left on device" is the most common Docker pain in production. Images accumulate, build cache piles up, container logs grow without bound, and volumes outlive the containers that created them. The fix is a mix of immediate cleanup and long-term hygiene.
Theory
TL;DR
- Diagnose with `docker system df` before pruning. Know where space is going.
- `docker system prune -af --volumes` reclaims everything not currently in use. Safe in dev, careful in prod (it drops detached volumes).
- Logs are the silent killer: a chatty container can fill `/var/lib/docker/containers/<id>/<id>-json.log` to gigabytes.
- Build cache can grow to tens of GB on busy CI hosts. Prune regularly.
- Long-term: put `/var/lib/docker` on its own partition; enable log rotation in `daemon.json`; cron a prune.
Where space goes
Docker stores everything under `/var/lib/docker` (or wherever `data-root` points):
/var/lib/docker/
├── overlay2/ # image layers + container writable layers
├── containers/<id>/ # container metadata + logs (json-file)
├── volumes/ # named volumes
├── image/ # image manifest metadata
├── buildkit/ # build cache (with BuildKit)
└── tmp/ # transient files

On a busy host, the breakdown is typically:
- 30-50% — image layers
- 20-40% — anonymous/orphan volumes
- 10-20% — container logs
- 5-20% — build cache
docker system df shows this in human-readable form.
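If you want to script this check (say, for a monitoring job), the RECLAIMABLE column can be parsed out of `docker system df`. A minimal sketch: the sample output below is embedded in a variable so the parsing logic runs without a live daemon, and the 10 GB threshold is an arbitrary choice.

```shell
# Sample `docker system df` output, embedded so the parsing can run without a daemon.
# In a real job, replace this with: sample=$(docker system df)
sample='TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          67        12        31.4GB    24.1GB (76%)
Containers      34        8         3.2GB     2.7GB
Volumes         28        5         45.0GB    38.0GB (84%)
Build Cache     2104      0         8.0GB     8.0GB'

# Print categories with more than 10 GB reclaimable. The reclaimable size is the
# last field, unless a "(NN%)" suffix follows it, in which case it is second to last.
big=$(echo "$sample" | awk 'NR > 1 {
  r = ($NF ~ /^\(/) ? $(NF-1) : $NF
  sub(/GB$/, "", r)
  if (r + 0 > 10) print $1, r "GB"
}')
echo "$big"
```

With the sample above this prints the Images and Volumes rows, the two categories worth drilling into first.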
Categories of waste
| Category | What it is | How to clean |
|---|---|---|
| Stopped containers | exited containers retained for docker logs/docker start | docker container prune -f |
| Dangling images | images with no tag (replaced by newer build) | docker image prune -f |
| Unused images | images not referenced by any container | docker image prune -af (note -a) |
| Anonymous volumes | volumes auto-created by VOLUME Dockerfile directive, never cleaned | docker volume prune -f (named volumes too if -a) |
| Build cache | BuildKit cache layers from past builds | docker builder prune -af |
| Container logs | json-file logs that grew unbounded | log rotation config |
| Networks | unused custom networks (small) | docker network prune -f |
Examples
Diagnostic flow
# Step 1: how big is /var/lib/docker?
sudo du -sh /var/lib/docker
# 87G
# Step 2: breakdown by Docker category
docker system df
# TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
# Images          67        12        31.4GB    24.1GB (76%)
# Containers      34        8         3.2GB    2.7GB
# Volumes         28        5         45.0GB   38.0GB (84%)
# Build Cache     2104      0         8.0GB    8.0GB
# Step 3: drill into the worst offender
docker system df -v # verbose: per-image, per-container, per-volume

The RECLAIMABLE column is your first target.
Quick win: prune everything safely
# Stopped containers, dangling images, unused networks, build cache
docker system prune -f
# Reclaimed: 12.5GB

Without the `--volumes` flag, volumes are kept. That is the safe default.
Aggressive: reclaim everything not in use
docker system prune -af --volumes
# Includes:
# - all images not used by a container (not just dangling)
# - all volumes not referenced by any container

Dangerous in prod: any volume not currently referenced by a container is deleted, including one whose only container is briefly down while being recreated, and the volumes of stopped containers that the same prune run just removed. Use this in dev/CI, not prod.
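Before committing to `--volumes`, you can preview the candidates: `docker volume ls` accepts a `dangling=true` filter that lists volumes not referenced by any container (note that on recent Docker versions, `docker volume prune` additionally skips named volumes unless `-a` is given). A sketch, guarded so it degrades gracefully on a host without a reachable daemon:

```shell
# List volumes not referenced by any container: the pool that volume pruning draws from.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  preview=$(docker volume ls --filter dangling=true)
else
  preview="docker daemon not reachable; skipping preview"
fi
echo "$preview"
```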
Targeted commands
# Images
docker image prune -f # dangling only
docker image prune -af # all images not used by a container
# Containers
docker container prune -f # stopped containers
# Volumes
docker volume prune -f # volumes not referenced by any container (Docker 23+ skips named volumes unless -a is added)
# Build cache
docker builder prune -f # dangling build cache
docker builder prune -af # all build cache
docker builder prune --filter until=168h # cache older than 7 days
# Networks
docker network prune -f

Container logs
Logs default to the json-file driver with no size limit, so a chatty app can write gigabytes.
# See per-container log file sizes
for c in $(docker ps -q --no-trunc); do # --no-trunc: the log path uses the full container ID
  name=$(docker inspect -f '{{.Name}}' "$c" | sed 's|^/||')
  size=$(sudo du -sh "/var/lib/docker/containers/$c/$c-json.log" 2>/dev/null | cut -f1)
  echo "$size $name"
done

Truncate a runaway log without a restart:
sudo truncate -s 0 /var/lib/docker/containers/<id>/<id>-json.log

Fix it permanently by setting log limits in /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}

Then `sudo systemctl restart docker`. Existing containers keep the old config until recreated; new containers inherit the limits.
For production, use a log-shipper (Fluentd, Loki, syslog) so logs leave the host entirely.
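A minimal daemon.json sketch for the syslog route (the address below is a placeholder for your own collector):

```json
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "udp://logs.example.internal:514"
  }
}
```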
Move /var/lib/docker to a bigger partition
If the OS partition is small (e.g., a DigitalOcean droplet with 25 GB):
# Stop the daemon
sudo systemctl stop docker
# Mount a new disk at /mnt/docker
sudo rsync -a /var/lib/docker/ /mnt/docker/
sudo mv /var/lib/docker /var/lib/docker.bak
sudo ln -s /mnt/docker /var/lib/docker
# (or update daemon.json with "data-root": "/mnt/docker")
sudo systemctl start docker
docker info | grep 'Docker Root Dir'

Verify, then remove the backup: `rm -rf /var/lib/docker.bak`.
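Setting `data-root` in /etc/docker/daemon.json, as mentioned in the comment above, is cleaner long-term than the symlink. A minimal fragment, using the same /mnt/docker mount point as the example:

```json
{
  "data-root": "/mnt/docker"
}
```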
Periodic cleanup via cron
# /etc/cron.daily/docker-prune
#!/bin/sh
docker container prune -f
docker image prune -f
docker builder prune -f --filter until=72h
# Don't include --volumes; volume cleanup needs manual review

Make it executable: `chmod +x /etc/cron.daily/docker-prune`.
"No space left" during a build
During a docker build, the error often comes from BuildKit's intermediate layers, not the final image:
docker builder prune -af
# Frees the cache. Try the build again.

Or from /tmp (used for temporary downloads):
df -h /tmp
# If /tmp is small, set TMPDIR=/var/tmp before docker build.

When prune does not help
Sometimes docker system df reports lots of reclaimable space but docker system prune reclaims very little. Causes:
- Inodes exhausted (not blocks). Run `df -i` to check.
- Open file handles holding deleted files. Restart the daemon to release them.
- Volume contents are huge but the volume itself is in use. An "active" volume (used by a running container) is skipped by prune. Inspect its contents via `docker run --rm -v <vol>:/data alpine du -sh /data`.
- Snapshot/COW chains. With the legacy devicemapper driver you might need to recreate the storage pool; migrate to overlay2.
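The inode case in the first bullet is easy to script. A sketch that reads the IUse% column from `df -i` (assumes GNU coreutils; some filesystems, e.g. btrfs, report `-` here), falling back to the root filesystem when /var/lib/docker does not exist:

```shell
# Percentage of inodes used on the partition holding Docker's data root.
inode_line=$(df -i /var/lib/docker 2>/dev/null || df -i /)
inode_pct=$(echo "$inode_line" | tail -1 | awk '{ print $(NF-1) }')
echo "inode usage: ${inode_pct}"
```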
Real-world usage
- CI hosts: prune builder cache hourly; prune images daily; logs go to syslog.
- Production app servers: log rotation in daemon.json; weekly prune cron; alerting on disk usage > 70%.
- Single-host hobby: monthly `docker system prune -af`. Done.
- Disk emergency: `docker system prune -af --volumes` if you can confirm there are no detached-but-needed volumes; otherwise prune images and build cache first.
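The disk-usage alerting mentioned above can start as a cron-able script. A sketch, assuming GNU coreutils `df`; the threshold and the notification action (here just an echo) are placeholders for real alerting:

```shell
#!/bin/sh
# Warn when the partition holding Docker's data root crosses a usage threshold.
# Falls back to / when /var/lib/docker does not exist on this host.
THRESHOLD=70
pcent=$(df --output=pcent /var/lib/docker 2>/dev/null || df --output=pcent /)
usage=$(echo "$pcent" | tail -1 | tr -dc '0-9')
if [ "$usage" -gt "$THRESHOLD" ]; then
  # Replace with a real notification (mail, webhook, pager)
  echo "WARN: Docker partition at ${usage}% (threshold ${THRESHOLD}%)"
fi
```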
Common mistakes
Running prune --volumes in prod blindly
If a service is being recreated and its volume is briefly detached, prune deletes it. Always confirm volume usage:
docker volume ls
# Manually inspect any unfamiliar volume before pruning

Forgetting -a on docker image prune
Without -a, only dangling images (no tag) are removed. Tagged but unused images stay.
Ignoring container logs
A single chatty service can fill 50 GB in <id>-json.log while you wonder why disk is full and docker system df shows nothing unusual.
Putting /var/lib/docker on the OS partition with no monitoring
When the disk fills, the daemon can become unstable, and a restart can fail because logs cannot be flushed. A separate partition plus alerting prevents this.
Follow-up questions
Q: What is the difference between docker prune and docker rm?
A: docker rm <name> removes a specific container; docker container prune removes all stopped containers in one shot. Same idea for images and volumes.
Q: Will pruning kill running services?
A: No. Prune commands skip resources that are active (running containers, mounted volumes, used images). They only touch genuinely unused things.
Q: How do I see what is in a volume before deleting it?
A: docker run --rm -v <volname>:/data alpine ls -la /data. If important data, back it up before pruning.
Q: (Senior) How do you build a long-term retention policy for build cache?
A: BuildKit supports cache backends (--cache-to=type=registry,ref=...). Push cache to a dedicated registry image; locally prune anything older than N days; rely on remote cache for shared CI. This bounds local disk while preserving cross-build deduplication.
Q: (Senior) Why does my disk fill faster than docker system df says?
A: docker system df does not account for the daemon's own metadata, BuildKit's internal state, or external mounts. Compare with `du -sh /var/lib/docker/*`. Discrepancies usually mean orphaned overlay2 directories from a daemon crash, very large container log files, or a host bind mount to a directory you forgot about.