How to reduce the size of a Docker image?

Reducing Docker image size is part hygiene, part architecture. The right techniques applied to the right problem can shrink an image from 1 GB to 30 MB without losing functionality. Smaller images mean faster pulls, faster deploys, smaller attack surface.

Theory

TL;DR

Five techniques, in approximate order of impact:

  1. Multi-stage build with a slim final base (alpine, distroless, scratch). The single biggest win.
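  2. Smaller base image: Debian slim → Alpine → distroless → scratch. The big drops come early (roughly 120 MB → 75 MB → 8 MB); the last steps save only a few MB but also remove the shell and package manager.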
  3. Single RUN for install + cleanup so cache files do not get baked into a layer.
  4. .dockerignore to keep build context small (no node_modules, .git, etc.).
  5. Strip dev dependencies, recommended packages, and unused locales in the runtime stage.

Measure with docker images and docker history. For deep analysis, use dive.

Quick example: before and after

Before (single stage, naive):

dockerfile
FROM node:22
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["npm", "start"]

Final: ~1.2 GB.

After (multi-stage, alpine, prune):

dockerfile
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev

FROM node:22-alpine
WORKDIR /app
COPY --from=build /app/dist /app/dist
COPY --from=build /app/node_modules /app/node_modules
USER node
CMD ["node", "dist/server.js"]

Final: ~180 MB. Same functionality.

For static sites (no Node runtime needed):

dockerfile
# Stage 2:
FROM nginx:1.27-alpine
COPY --from=build /app/dist /usr/share/nginx/html

Final: ~30 MB.

Technique 1: multi-stage with a slim final base

See the dedicated multi-stage article. Bottom line: the toolchain is the heaviest thing in your image; multi-stage is how you leave it behind.

Base image options for the final stage, in order of size:

Base                        Approx size   Has shell?           Has package manager?
debian:bookworm             120 MB        Yes (bash)           apt
debian:bookworm-slim        75 MB         Yes (bash)           apt
ubuntu:24.04                80 MB         Yes (bash)           apt
alpine:3.21                 7-8 MB        Yes (sh, busybox)    apk
gcr.io/distroless/base      20 MB         No                   No
gcr.io/distroless/static    2 MB          No                   No
scratch                     0             No                   No

Pick the smallest that has what your binary actually needs.

Technique 2: combine RUN commands and clean cache

dockerfile
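# WRONG: each RUN is a layer; the package lists from apt-get update stay in layer 1
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# RIGHT: one layer, package lists deleted in the same step
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

The wrong version saves no space: the package lists written by apt-get update stay in the first layer, and the rm in the third layer only adds whiteout markers that hide them (the files are still stored in the image). A separate RUN apt-get update can also be served from a stale build cache, so the later install may see outdated package lists.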

Apply the same pattern to:

  • apk (Alpine): apk add --no-cache <pkg> (auto-cleans)
  • pip: pip install --no-cache-dir <pkg>
  • npm: npm ci --omit=dev && npm cache clean --force

Technique 3: .dockerignore

Anything in your build context gets sent to the daemon, slowing builds and bloating layers. A typical .dockerignore:

.git
node_modules
dist
*.log
.env*
Dockerfile*
README.md
coverage
.vscode
.idea

Without this, a COPY . . can ship hundreds of megabytes (node_modules, .git history, old build output) that the image does not need.
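One rough way to see how much a .dockerignore saves is to tar the directory with and without the ignore file applied and compare byte counts. This is a sketch, not how Docker computes the build context: context_bytes is a hypothetical helper name, and tar's exclude patterns only approximate .dockerignore matching (no ! negation, different wildcard rules, and comment or blank lines may be treated as patterns).

```shell
#!/bin/sh
# Hypothetical helper: print the approximate number of bytes a directory
# would contribute as build context. If a .dockerignore exists, its lines
# are passed to tar as exclude patterns (an approximation, see above).
context_bytes() {
  (
    cd "$1" || exit 1
    if [ -f .dockerignore ]; then
      bytes=$(tar -cf - -X .dockerignore . | wc -c)
    else
      bytes=$(tar -cf - . | wc -c)
    fi
    # Arithmetic expansion strips the padding some wc implementations add.
    echo $((bytes))
  )
}
```

Run context_bytes . in the project root before and after adding entries to .dockerignore; a large drop suggests COPY . . was shipping files the image never needed.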

Technique 4: strip dev dependencies and recommended packages

dockerfile
# Node
RUN npm ci --omit=dev

# Python
RUN pip install --no-cache-dir --prefix=/install <pkgs>
# Then in final stage, COPY only /install

# Go: nothing to do (binary is self-contained)

# apt with --no-install-recommends
RUN apt-get install --no-install-recommends -y curl

Dev deps (TypeScript compiler, jest, eslint) often double node_modules. --no-install-recommends cuts apt's optional packages.
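To check how much the dev dependencies actually weigh in a given project, compare node_modules before and after pruning. A minimal sketch, assuming it runs in a project directory where node_modules is already installed; prune_savings is a hypothetical helper name, and it modifies node_modules in place, so try it on a throwaway checkout.

```shell
#!/bin/sh
# Hypothetical helper: report node_modules size (in KB) before and after
# removing dev dependencies with npm prune --omit=dev.
prune_savings() {
  before=$(du -sk node_modules | cut -f1)
  npm prune --omit=dev >/dev/null
  after=$(du -sk node_modules | cut -f1)
  echo "node_modules: ${before} KB -> ${after} KB"
}
```

If the two numbers are close, dev dependencies are not the problem and the base image or build artifacts are a better place to look.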

Technique 5: minimize what gets COPIED

dockerfile
# Granular copies are smaller AND better for caching
# Manifest and lockfile only, so the install layer stays cached
COPY package*.json ./
RUN npm ci
# Only what the runtime needs
COPY src/ ./src/
COPY public/ ./public/

Compare with COPY . ., which also copies tests, docs, IDE config, and build outputs.

Inspecting and finding the bloat

bash
# Per-layer sizes
$ docker history --no-trunc myimage
IMAGE          CREATED         CREATED BY                                   SIZE
4f06b3e2c0c1   2 minutes ago   /bin/sh -c #(nop) CMD ["node" "server.js"]   0B
<missing>      2 minutes ago   /bin/sh -c npm prune --omit=dev              156MB   ← attack this
<missing>      3 minutes ago   /bin/sh -c npm run build                     34MB
<missing>      4 minutes ago   /bin/sh -c npm ci                            312MB   ← biggest culprit
...

# Interactive layer-by-layer view
$ dive myimage
# Shows each layer's added/removed/total bytes, file tree per layer.

dive is the gold standard for understanding why an image is what it is.
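For a quick ranking without opening dive, the per-layer sizes from docker history can be sorted in the shell. The sketch below assumes the input comes from docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' myimage and that sizes use Docker's decimal units (B, kB, MB, GB); rank_layers is a hypothetical helper name, and the sample lines piped into it are made up to mirror the output above.

```shell
#!/bin/sh
# Hypothetical helper: read "SIZE<TAB>COMMAND" lines on stdin and print
# them largest first, with the size converted to bytes.
rank_layers() {
  awk -F '\t' '
    {
      unit = $1; sub(/^[0-9.]+/, "", unit)    # unit suffix, e.g. "MB"
      num  = $1; sub(/[A-Za-z]+$/, "", num)   # numeric part, e.g. "156"
      mult = 1
      if (unit == "kB") mult = 1000
      else if (unit == "MB") mult = 1000 * 1000
      else if (unit == "GB") mult = 1000 * 1000 * 1000
      printf "%.0f\t%s\n", num * mult, $2
    }
  ' | sort -t "$(printf '\t')" -k1,1nr
}

# Sample input standing in for real docker history output.
printf '0B\tCMD ["node" "server.js"]\n156MB\tnpm prune --omit=dev\n34MB\tnpm run build\n312MB\tnpm ci\n' | rank_layers
```

With a real image, replace the printf line with the docker history command piped into rank_layers; the first line printed is the layer to attack first.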

Common mistakes

Adding files in one layer, deleting in another

dockerfile
# WRONG: 200 MB still in layer N, layer N+1 just hides it
# (COPY, not ADD: ADD would auto-extract a local tar archive)
COPY bigfile.tar.gz /tmp/
RUN unpack-and-process /tmp/bigfile.tar.gz
# whiteout, but the data is in layer N forever
RUN rm -rf /tmp/*

# RIGHT: do it all in one layer
RUN mkdir -p /tmp/x && \
    curl -L https://... | tar xz -C /tmp/x && \
    process /tmp/x && \
    rm -rf /tmp/x

Layers are immutable. Once a file lands in a layer, no later layer can shrink the image — only the original layer can avoid having the file.
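The effect can be reproduced without Docker by treating each layer as a plain tar archive. This is a rough simulation, not how Docker builds layers: the directory names are made up, and the only detail borrowed from the OCI layer format is that a deletion is recorded as an empty marker file named .wh.<filename>.

```shell
#!/bin/sh
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Layer N: adds a 1 MiB file.
mkdir -p "$work/layer1/tmp"
head -c 1048576 /dev/zero > "$work/layer1/tmp/bigfile"
tar -cf "$work/layer1.tar" -C "$work/layer1" .

# Layer N+1: "deletes" the file. Only a whiteout marker is written;
# the earlier archive is not touched.
mkdir -p "$work/layer2/tmp"
: > "$work/layer2/tmp/.wh.bigfile"
tar -cf "$work/layer2.tar" -C "$work/layer2" .

layer1_bytes=$(( $(wc -c < "$work/layer1.tar") ))
layer2_bytes=$(( $(wc -c < "$work/layer2.tar") ))
echo "layer N:   $layer1_bytes bytes"
echo "layer N+1: $layer2_bytes bytes"
echo "image:     $((layer1_bytes + layer2_bytes)) bytes"
```

The total is larger after the delete than before it: the second archive adds its own bytes and the 1 MiB in the first one is still shipped.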

Using apt without --no-install-recommends

Debian's apt installs "recommended" packages by default. For a server image, almost none are needed. Always:

dockerfile
RUN apt-get update && \
    apt-get install --no-install-recommends -y curl && \
    rm -rf /var/lib/apt/lists/*

Picking Debian when Alpine works

Most language runtimes have an Alpine variant: node:22-alpine, python:3.13-alpine, golang:1.23-alpine. They are usually 70-80% smaller. Caveat: Alpine uses musl libc, not glibc — some prebuilt binaries (NumPy with Intel MKL, some Node native modules) do not work on Alpine. When that bites, use *-slim Debian variants.

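Using latest instead of pinning a small variant

node:latest is the full Debian-based image and can be around 1 GB; node:22-alpine is a fraction of that (under 200 MB). An unpinned tag can also change underneath you between builds. Picking the right tag is half the battle.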
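Published sizes drift between releases, so it is worth checking candidate tags on your own machine. A minimal sketch, assuming the images have already been pulled; image_mb is a hypothetical helper name, and it reports the uncompressed size that docker image inspect exposes in bytes.

```shell
#!/bin/sh
# Hypothetical helper: print an image's uncompressed size in MB.
image_mb() {
  docker image inspect --format '{{.Size}}' "$1" |
    awk '{ printf "%.1f\n", $1 / 1000000 }'
}
```

For example, after docker pull node:22 and docker pull node:22-alpine, running image_mb on each tag shows the gap directly.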

Real-world usage

  • Static site distribution: nginx:alpine final stage → 25-30 MB. Industry standard.
  • Go services: FROM scratch + binary → 5-15 MB. Serverless-fast cold starts.
  • Python ML services: python:3.13-slim + only required packages, with --no-cache-dir everywhere → 200-500 MB instead of 2 GB.
  • CI build images: the one place where size matters less; they live on the runner. But still, a 5 GB CI image slows every job.

Follow-up questions

Q: Does compressing my files reduce image size?


A: Not really — Docker layers are already gzipped on push/pull. Your work is at the file level, not compression.
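A quick way to see why pre-compressing rarely helps transfer size is to gzip some data once and then gzip the result again. The sketch below uses generated text (pseudo-random integers with a fixed seed) as a stand-in for real files; gen is a hypothetical helper, and exact byte counts will vary by gzip and awk version.

```shell
#!/bin/sh
# Generate text that compresses well but is not trivially repetitive.
gen() {
  awk 'BEGIN { srand(42); for (i = 0; i < 100000; i++) print int(rand() * 1000) }'
}

raw=$(( $(gen | wc -c) ))
once=$(( $(gen | gzip -c | wc -c) ))
twice=$(( $(gen | gzip -c | gzip -c | wc -c) ))

echo "raw:           $raw bytes"
echo "gzipped once:  $once bytes"
echo "gzipped twice: $twice bytes"
```

The first pass removes most of the redundancy; the second pass has almost nothing left to remove, which is the situation a registry is in when it compresses a layer full of already-compressed files.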

Q: Why is my image so much bigger than the sum of files inside?


A: Because of how layers work — files added then deleted still take space. Use dive or docker history to find the bloat.

Q: Should I use Alpine for everything?


A: Most things, yes. Exceptions: Python ML/data-science stacks (NumPy, SciPy, pandas and their dependencies are built primarily against glibc; when a package has no musl-compatible wheel, pip compiles it from source on Alpine, which is slow) and heavy native dependencies. For these, *-slim Debian is a better default.

Q: What is the difference between distroless and Alpine?


A: Alpine has busybox, sh, and apk: small but not minimal. Distroless contains only the runtime your language needs (Node, Python, JVM, or nothing for static binaries): no shell, no package manager. That gives a smaller attack surface than Alpine, and for static binaries a smaller image too, but it is harder to debug (no docker exec sh).

Q: (Senior) When does aggressive size reduction become counterproductive?


A: When debugging in production becomes impossible (no shell, no tools). Use a separate :debug variant for that. When build complexity skyrockets (10-stage Dockerfiles with custom apk repositories) for marginal gains. When the squeezed image breaks at runtime because some lib was missing. Find the sweet spot: small enough to pull fast and minimize attack surface, big enough to debug when needed.

Examples

Static site: 1.2 GB → 28 MB

dockerfile
# BEFORE (1.2 GB)
FROM node:22
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["npx", "http-server", "dist"]

# AFTER (28 MB)
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:1.27-alpine
COPY --from=build /app/dist /usr/share/nginx/html

Python ML service: 2.5 GB → 480 MB

dockerfile
# AFTER
FROM python:3.13-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.13-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY app.py .
USER 1000:1000
CMD ["python", "app.py"]

Key moves: slim base, --no-cache-dir, isolated install via prefix and copy.

Go service: 700 MB → 12 MB

dockerfile
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /out/server ./cmd/server

FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/server /server
USER 65532:65532
ENTRYPOINT ["/server"]

-ldflags="-s -w" strips Go binary debug symbols. FROM scratch adds nothing. The binary plus a TLS cert bundle is the entire image.
