How does the Docker build cache work, and how do you manage it?


Docker build cache is the difference between a 60-second rebuild and a 2-second one. Knowing how the cache key is computed and how to keep it valid is the single biggest skill for fast Dockerfiles.

Theory

TL;DR

After each instruction, Docker stores the resulting layer in a cache.
On rebuild, Docker computes a cache key for each instruction. Match → reuse the layer; mismatch → re-execute and invalidate everything below.
Cache key components:
- Previous layer's digest (the chain matters)
- The instruction text itself
- For COPY and ADD: the digest of every file being copied
- For RUN: the command string, plus the values of any build args (ARG) in scope. Docker does NOT inspect what the command does.
Order matters: put stable, expensive steps high; volatile, frequently-changing steps low.
BuildKit cache mounts (RUN --mount=type=cache,target=/path) persist a cache across builds without becoming part of any layer.
--no-cache rebuilds everything from scratch.

How cache invalidation works

FROM alpine:3.21              ← cached if alpine:3.21 unchanged
WORKDIR /app                  ← cached if FROM unchanged
COPY package.json ./          ← cached if package.json bytes unchanged
RUN npm ci                    ← cached if previous step cache hit
COPY src/ ./src/              ← invalidates if any file in src/ changed
CMD ["node", "server.js"]     ← cached if previous step cache hit

The key insight: Docker hashes file contents for COPY/ADD but not for RUN command outputs. RUN apt-get install curl cache-hits even if upstream apt has a new curl version.
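
To make the chaining concrete, here is a toy model of the cache key in shell. It is not BuildKit's actual algorithm: the layer_key function and the sample package.json are made up for illustration, and it hashes only the parent key, the instruction text, and the contents of copied files.

```shell
# Toy model of a chained build-cache key (illustrative only, not BuildKit's code).
set -eu

# layer_key PARENT_KEY INSTRUCTION [FILE...]
# Hashes the parent key, the instruction text, and the content digest of any
# files the instruction copies. RUN steps pass no files, so only text counts.
layer_key() (
  parent=$1
  instruction=$2
  shift 2
  {
    printf '%s\n%s\n' "$parent" "$instruction"
    if [ "$#" -gt 0 ]; then
      sha256sum "$@" | cut -d' ' -f1
    fi
  } | sha256sum | cut -d' ' -f1
)

workdir=$(mktemp -d)
echo '{"name":"demo"}' > "$workdir/package.json"

base=$(layer_key "" "FROM alpine:3.21")
copy_v1=$(layer_key "$base" "COPY package.json ./" "$workdir/package.json")
run_v1=$(layer_key "$copy_v1" "RUN npm ci")

# Rewriting the file with identical bytes leaves the COPY key unchanged.
echo '{"name":"demo"}' > "$workdir/package.json"
copy_same=$(layer_key "$base" "COPY package.json ./" "$workdir/package.json")

# Changing the bytes changes the COPY key, and with it the key of the RUN below.
echo '{"name":"demo","version":"2"}' > "$workdir/package.json"
copy_v2=$(layer_key "$base" "COPY package.json ./" "$workdir/package.json")
run_v2=$(layer_key "$copy_v2" "RUN npm ci")

rm -rf "$workdir"
```

In this model the RUN key changes only because its parent changed; the RUN text itself is identical in both runs, which is why a stale RUN result can be reused as long as everything above it still matches.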

Optimizing instruction order

dockerfile

# WRONG: source copied before deps installed
FROM node:22-alpine
WORKDIR /app
# Any file change invalidates everything below
COPY . .
# Re-runs on every code change
RUN npm ci --omit=dev
CMD ["node", "server.js"]

# RIGHT: deps first, source last
FROM node:22-alpine
WORKDIR /app
# Changes only when deps change
COPY package*.json ./
# Cached unless package*.json changed
RUN npm ci --omit=dev
# Changes when source changes; only this step re-runs
COPY . .
CMD ["node", "server.js"]

For a typical app with stable deps, this reordering can cut a code-only rebuild from about 60 seconds (the wrong way) to about 2 seconds (the right way).

BuildKit cache mounts

With BuildKit (the default builder in Docker Desktop, and in Docker Engine since 23.0), you can mount a cache directory that persists across builds without being part of the image:

dockerfile

# syntax=docker/dockerfile:1.7
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

The pip cache lives in the mount, outside the layer. When requirements.txt changes and the layer is rebuilt, pip reuses the wheels it already downloaded instead of fetching them again. Layer stays clean; wheels stay cached.

Common cache-mount targets:

pip: /root/.cache/pip
npm: /root/.npm
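apt: /var/cache/apt and /var/lib/apt/lists, each with sharing=locked (Debian and Ubuntu base images ship /etc/apt/apt.conf.d/docker-clean, which deletes downloaded packages, so it typically has to be removed for the mount to help)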
Go modules: /go/pkg/mod
Cargo: /usr/local/cargo/registry

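With BuildKit and docker buildx, you can export the cache to a registry and import it from there, so CI builds reuse cache across runners. The default docker driver may not support registry cache export; a builder created with docker buildx create --use (docker-container driver) does: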

bash

# First build: write cache to registry
docker buildx build \
    --cache-to type=registry,ref=myreg/myapp:cache,mode=max \
    --cache-from type=registry,ref=myreg/myapp:cache \
    -t myreg/myapp:1.0 \
    --push .

# Subsequent builds (different runner) read from the same cache
docker buildx build \
    --cache-from type=registry,ref=myreg/myapp:cache \
    -t myreg/myapp:1.1 \
    --push .

A cold runner now starts as warm as the last successful build. Massive CI speedup for projects with heavy build steps.
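
One common refinement is a cache tag per branch with a fallback to the main branch's cache. The sketch below only prints the flags; the cache-<branch> tag scheme and the sanitize_tag and cache_flags helpers are assumptions made for illustration, not something buildx prescribes. It relies on --cache-from being repeatable.

```shell
# Sketch: derive registry cache flags for a branch (tag scheme is an assumption).
set -eu

# Turn a Git branch name into a valid image tag component:
# anything outside [A-Za-z0-9_.-] becomes "-", and the result is length-capped.
sanitize_tag() {
  printf '%s' "$1" | sed 's/[^A-Za-z0-9_.-]/-/g' | cut -c1-120
}

# cache_flags IMAGE BRANCH
# Prints one flag per line: read the branch cache, fall back to the main
# cache, and write the result back to the branch cache.
cache_flags() (
  image=$1
  branch=$2
  tag="cache-$(sanitize_tag "$branch")"
  printf '%s\n' \
    "--cache-from=type=registry,ref=$image:$tag" \
    "--cache-from=type=registry,ref=$image:cache-main" \
    "--cache-to=type=registry,ref=$image:$tag,mode=max"
)

cache_flags myreg/myapp feature/login
```

The output can be spliced into the build command, for example docker buildx build $(cache_flags myreg/myapp "$BRANCH") -t myreg/myapp:1.1 --push . where BRANCH is whatever variable your CI exposes for the branch name.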

Bypassing the cache

bash

# Rebuild everything from scratch
docker build --no-cache -t myapp .

# Refresh just the FROM (re-pull the base image)
docker build --pull -t myapp .

# Both
docker build --pull --no-cache -t myapp .

# Invalidate from a specific point onwards:
# declare ARG BUILD_REV at that point in the Dockerfile, then pass a changing value
docker build --build-arg BUILD_REV=$(date +%s) -t myapp .

Common mistakes

COPY . . before RUN install

Covered above. The single most common cache-killer.

Putting apt-get update in a separate RUN from apt-get install

dockerfile

# WRONG: update can cache hit while install pulls a stale package list
RUN apt-get update
RUN apt-get install -y --no-install-recommends curl

# RIGHT: keep them in one RUN so they always run together
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

If apt-get update is cached while apt-get install re-runs, the install works from a stale package list: the versions it names may no longer exist on the mirror, so the install can fail or miss packages.

Mounting source code that triggers cache invalidation on every save

dockerfile

# Invalidated by an editor save in any file
COPY . .

For dev environments, use bind mounts at run time instead. For CI builds, accept that source changes invalidate later layers and design around it (deps first).

Forgetting that RUN does not look inside the command

dockerfile

RUN curl https://example.com/installer.sh | sh
# Same RUN string forever; never refreshes even if installer.sh changes.

Docker's cache key for RUN is the literal command plus the build args in scope. To force re-execution, change the string or a build arg the step uses:

dockerfile

ARG INSTALLER_SHA="abc123..."
RUN curl https://example.com/installer.sh -o /tmp/i.sh && \
    echo "$INSTALLER_SHA  /tmp/i.sh" | sha256sum -c && \
    sh /tmp/i.sh
# Now changing INSTALLER_SHA invalidates this layer.
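
The value for INSTALLER_SHA has to come from somewhere. The sketch below shows one way to compute it and what the sha256sum -c check does when the script changes. It uses a local stand-in file rather than a real download, and the verify helper is hypothetical.

```shell
# Sketch: compute a pin for a script and check it, using a local stand-in file.
set -eu

tmp=$(mktemp -d)
printf 'echo "installing v1"\n' > "$tmp/i.sh"

# The value you would pass as --build-arg INSTALLER_SHA=...
installer_sha=$(sha256sum "$tmp/i.sh" | cut -d' ' -f1)

# Same check as the RUN step: non-zero exit if the file does not match the pin.
verify() {
  echo "$1  $2" | sha256sum -c - >/dev/null 2>&1
}

if verify "$installer_sha" "$tmp/i.sh"; then pinned_ok=yes; else pinned_ok=no; fi

# Simulate the upstream script changing without the pin being updated.
printf 'echo "installing v2"\n' > "$tmp/i.sh"
if verify "$installer_sha" "$tmp/i.sh"; then changed_ok=yes; else changed_ok=no; fi

echo "pinned: $pinned_ok, after upstream change: $changed_ok"
rm -rf "$tmp"
```

Besides changing the cache key, the pin means a rebuilt layer fails loudly when the upstream script differs from what was reviewed, instead of running it silently.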

Inspecting and managing cache

bash

# See cache usage
docker system df               # high-level
docker buildx du               # build cache details

# Prune build cache
docker builder prune            # interactive
docker builder prune -af        # all, unconditional
docker builder prune --filter 'until=72h'  # older than 3 days

# Show what BuildKit considered cached
DOCKER_BUILDKIT=1 docker build --progress=plain -t myapp .
# Output marks cache hits with CACHED; misses show the step's output and a DONE line
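
A saved plain-progress log can be summarised with grep. The sketch below runs against a hand-written sample that imitates the log layout of a single-stage build; the exact format is an assumption and may differ between BuildKit versions, and multi-stage builds include a stage name in the bracket that this pattern does not handle.

```shell
# Sketch: count cache hits in a --progress=plain log (sample log is hand-written).
set -eu

log=$(mktemp)
cat > "$log" <<'EOF'
#1 [internal] load build definition from Dockerfile
#1 DONE 0.0s
#5 [1/4] FROM docker.io/library/node:22-alpine
#5 CACHED
#6 [2/4] WORKDIR /app
#6 CACHED
#7 [3/4] COPY package*.json ./
#7 CACHED
#8 [4/4] COPY . .
#8 DONE 0.1s
EOF

# Dockerfile steps look like "#N [i/n] ..."; cache hits are "#N CACHED".
# grep -c exits non-zero on a zero count, hence the "|| true".
total=$(grep -Ec '^#[0-9]+ \[[0-9]+/[0-9]+\] ' "$log" || true)
cached=$(grep -Ec '^#[0-9]+ CACHED$' "$log" || true)

echo "cached $cached of $total steps"
rm -f "$log"
```

To capture a real log, redirect the build's stderr as well, for example docker build --progress=plain -t myapp . 2>&1 | tee build.log.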

Real-world usage

Local dev: dep-install layer cached → 2-second rebuilds for code changes. Productivity multiplier.
CI: --cache-from registry brings the last build's cache to a fresh runner. This can cut a 10-minute build to around 90 seconds.
Cache mounts for package managers: pip/npm/apt caches persist across builds without bloating image.
Build farms (Bazel-style): the cache is shipped as a registry artifact; many builders share one cache.

Follow-up questions

Q: Why does my CI build never hit cache, even when nothing changed?

A: Each CI runner starts clean, with no local cache. Export the cache with --cache-to and read it back with --cache-from, using a registry (or another backend) that survives across runs.

Q: What is the difference between BuildKit cache mounts and image layers?

A: Layers are part of the image. Cache mounts are not — they live in a separate cache, attached at build time. Mounts are how you keep build-time caches (npm packages, pip wheels) without bloating your final image with files you only needed to compile.

Q: How do I invalidate just the latter half of a Dockerfile?

A: Add an ARG CACHEBUST=1 line at the right point and pass --build-arg CACHEBUST=$(date +%s). The next build sees a different value, so the first RUN after the ARG misses the cache and everything below it is rebuilt.
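
A minimal sketch of where the ARG goes. The script writes a throwaway Dockerfile and defines, but does not call, a hypothetical bust_build helper, since running it needs a Docker daemon; the cachebust-demo tag and the echo steps are placeholders.

```shell
# Sketch: place ARG CACHEBUST between the stable and the volatile steps.
set -eu

ctx=$(mktemp -d)

cat > "$ctx/Dockerfile" <<'EOF'
FROM alpine:3.21
# Stays cached: this step comes before the ARG, so CACHEBUST is not in scope.
RUN echo "stable step"
ARG CACHEBUST=1
# Re-runs whenever CACHEBUST changes; later steps depend on it.
RUN echo "volatile step (bust=${CACHEBUST})"
EOF

# Not invoked here because it needs a Docker daemon; call bust_build to try it.
bust_build() {
  docker build --build-arg CACHEBUST="$(date +%s)" -t cachebust-demo "$ctx"
}

# Show where the ARG sits relative to the steps around it.
grep -n 'CACHEBUST' "$ctx/Dockerfile"
```

Using a coarser value such as $(date +%Y%m%d) instead of $(date +%s) would bust the cache at most once a day rather than on every build.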

Q: Does --pull invalidate everything?

A: Only if the base image actually has a new digest. --pull re-checks FROM, but if node:22-alpine resolves to the same digest as last time, the FROM stays cached and so does everything after.

Q: (Senior) How would you set up cache-from in a GitHub Actions matrix build?

A: Use docker/build-push-action@v5 with cache-from: type=gha and cache-to: type=gha,mode=max. GitHub Actions provides a built-in cache backend per repo. For more aggressive cross-job sharing, use type=registry,ref=ghcr.io/myorg/myapp:cache. Avoid type=local in CI — runners are ephemeral.

Examples

Optimal Node Dockerfile

dockerfile

# syntax=docker/dockerfile:1.7
FROM node:22-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci

FROM deps AS build
COPY . .
RUN npm run build

FROM node:22-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=deps /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]

Stage deps only invalidates when package*.json changes.
npm cache mount survives between builds.
Source changes only re-run the build stage.

CI-shared cache via registry

yaml

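# .github/workflows/build.yml (steps fragment; assumes an earlier registry login step)
# Registry cache export needs a BuildKit builder, which setup-buildx-action provides
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    push: true
    tags: myorg/myapp:${{ github.sha }}
    cache-from: type=registry,ref=myorg/myapp:cache
    cache-to: type=registry,ref=myorg/myapp:cache,mode=max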

First run populates myorg/myapp:cache. Every subsequent run on any runner reuses it. Build times drop dramatically.
