What is a Dockerfile?

A Dockerfile is a plain text file containing instructions that Docker reads top-to-bottom to assemble an image. Each non-trivial instruction creates a new layer; the layers stack to form the final image.

Theory

TL;DR

  • Plain text. No JSON, no YAML. One instruction per line, written in uppercase by convention (the keywords are case-insensitive): FROM, RUN, COPY, CMD, etc.
  • Goes top-to-bottom. Earlier instructions land in lower layers; later ones add on top.
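  • Each instruction is a cached build step. RUN, COPY and ADD add filesystem layers; instructions such as CMD, EXPOSE and USER only change image metadata. Same instruction with the same input on rebuild = cache hit, no work.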
  • Order matters: put stable, expensive things (system deps) early, frequently-changing things (your source code) late.
  • Multi-stage builds let you build in one stage and copy only the artifact into a slim runtime stage. Smaller, safer images.

Quick example

dockerfile
# Dockerfile - typical Node.js app
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
EXPOSE 3000
CMD ["node", "server.js"]
bash
$ docker build -t myapp:1.0 .
[+] Building 12.4s (10/10) FINISHED
 => [1/5] FROM node:22-alpine
 => [2/5] WORKDIR /app
 => [3/5] COPY package*.json ./
 => [4/5] RUN npm ci --omit=dev
 => [5/5] COPY . .
 => exporting layers

Eight instructions, five build steps: USER, EXPOSE and CMD only set image metadata, so they add no filesystem layer and do not show up as numbered steps. Change a source file and rebuild: only step 5 re-executes. Steps 1-4 are pulled from cache.

Key instructions

| Instruction | What it does |
| --- | --- |
| FROM image[:tag] | Sets the base image. Must be the first instruction; only comments, parser directives, and ARG may precede it. |
| WORKDIR /path | Sets the working directory for the following RUN, COPY, CMD, and ENTRYPOINT instructions. Creates the directory if missing. |
| COPY src dest | Copies files from the build context into the image. |
| ADD src dest | Like COPY, but also fetches URLs and extracts local tar archives. Prefer COPY unless you need those features. |
| RUN cmd | Runs a command at build time. Common uses: installing packages, building artifacts. |
| ENV KEY=value | Sets an environment variable that persists in the image. |
| EXPOSE 80 | Documentation only: says "this image listens on port 80". Does not actually publish anything. |
| USER user[:group] | Sets the user (name or UID) for the following RUN instructions and for the running container. Default: root (avoid). |
| CMD ["prog", "arg"] | Default command when a container starts. Can be overridden by docker run. |
| ENTRYPOINT ["prog"] | The fixed first part of the command; CMD becomes its default args. |
| ARG name | Build-time variable, set with --build-arg. Not present at runtime (use ENV for that). |

CMD vs ENTRYPOINT

Both define what runs when a container starts. The difference matters when users override.

dockerfile
# Pattern A: CMD only
CMD ["echo", "hello"]
# docker run myimage          -> echo hello
# docker run myimage echo bye -> echo bye (CMD fully replaced)

# Pattern B: ENTRYPOINT + CMD
ENTRYPOINT ["echo"]
CMD ["hello"]
# docker run myimage     -> echo hello
# docker run myimage bye -> echo bye (CMD replaced, ENTRYPOINT stays)

Use ENTRYPOINT when the image is one tool (e.g., a CLI). Use CMD alone when the image is a service that takes no args.

Build cache and instruction order

Docker caches each layer by its instruction + inputs. Reorder for cache efficiency:

dockerfile
# WRONG: source copied before deps installed
FROM node:22-alpine
WORKDIR /app
# any code change invalidates everything below
COPY . .
# re-runs on every code change
RUN npm ci --omit=dev
CMD ["node", "server.js"]

# RIGHT: deps installed before source copied
FROM node:22-alpine
WORKDIR /app
# only changes when deps change
COPY package*.json ./
# cached unless package.json or package-lock.json changed
RUN npm ci --omit=dev
# changes when source changes; only this and below re-run
COPY . .
CMD ["node", "server.js"]

The difference: after a one-line change in server.js, the rebuild skips npm ci and can finish in about a second instead of about a minute. Exact timings depend on the size of the dependency tree.

Multi-stage builds

Build in a fat stage, copy artifacts into a slim stage. Result: smaller, more secure final images.

dockerfile
# Stage 1: build
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# produces /app/dist
RUN npm run build

# Stage 2: runtime
FROM nginx:1.27-alpine
COPY --from=build /app/dist /usr/share/nginx/html
EXPOSE 80

The final image is nginx:1.27-alpine plus your built static files. The Node toolchain, source code, node_modules - none of it lands in the runtime image. Smaller attack surface, smaller image, faster pull.

Common mistakes

Running as root in the final stage

dockerfile
# WRONG: default user is root
FROM node:22
WORKDIR /app
COPY . .
CMD ["node", "app.js"]

# RIGHT: drop privileges
FROM node:22
WORKDIR /app
COPY --chown=node:node . .
USER node
CMD ["node", "app.js"]

Unless user-namespace remapping is enabled, root inside a container is the same UID 0 as root on the host, so a process that escapes the container keeps root privileges. Always switch to a non-root user before CMD.
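The node image already ships a node user. On a base image without one, the user has to be created first. A minimal sketch, assuming an Alpine base; the user and group name app and the script run.sh are hypothetical placeholders:

```dockerfile
FROM alpine:3.20
# Create an unprivileged system group and user (the name "app" is arbitrary)
RUN addgroup -S app && adduser -S -G app app
WORKDIR /app
# Hand ownership of the copied files to the new user
COPY --chown=app:app . .
# Everything from here on, including the container process, runs as "app"
USER app
# run.sh is a placeholder for the application's start script
CMD ["./run.sh"]
```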

Not using .dockerignore

# .dockerignore
node_modules
.git
dist
*.log
.env*
Dockerfile

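Without it, the whole directory, including node_modules and .git, is sent to the daemon as build context, and COPY . . then copies it into the image. That slows builds, bloats the image, and can leak files such as .env.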

Splitting related RUN commands across layers

dockerfile
# WRONG: each RUN is a layer; this creates three layers and leaves apt cache in image
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# RIGHT: one layer, cache cleaned in same step
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

If you delete files in a later layer, the earlier layer still contains them - the deletion just hides them. Clean up in the same RUN that created the mess.

Using ADD when COPY is enough

ADD extracts tarballs and fetches URLs. Both behaviors surprise people. Use COPY for plain file copies; reach for ADD only when you actually need its extra features.
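The contrast between the two instructions can be sketched as follows. The file names config.yaml and vendor.tar.gz and the destination paths are hypothetical:

```dockerfile
FROM alpine:3.20
# COPY: the file is copied as-is, with no side effects
COPY config.yaml /etc/myapp/config.yaml
# COPY leaves an archive untouched: /opt/vendor.tar.gz stays a tarball
COPY vendor.tar.gz /opt/
# ADD: a local tar archive is unpacked into the destination directory
ADD vendor.tar.gz /opt/vendor/
```

If the extraction is what you want, ADD saves a separate tar step; if it is not, COPY avoids the surprise.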

Real-world usage

  • CI/CD pipelines: every PR triggers docker build against the repo's Dockerfile. Cache hit rates of 80-90 percent on well-ordered Dockerfiles keep builds fast.
  • Multi-arch builds: docker buildx build --platform linux/amd64,linux/arm64 -t myapp:1.0 . produces a multi-platform image from one Dockerfile. Used when the same app deploys to x86 servers and ARM (Mac M-series, Graviton).
  • Distroless / scratch images: FROM gcr.io/distroless/base or FROM scratch for the final stage of a multi-stage build. The final image contains little beyond your binary (distroless adds only runtime essentials such as libc and CA certificates): no shell, no package manager, and a minimal attack surface beyond the app itself.
  • BuildKit features: # syntax=docker/dockerfile:1.7 at the top unlocks features like RUN --mount=type=cache,target=/root/.npm for persistent npm cache across builds.
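For the multi-arch case, one common pattern is to run the compiler on the build machine's native platform and cross-compile for each target. The sketch below assumes a Go service laid out under ./cmd/server; BUILDPLATFORM, TARGETOS and TARGETARCH are build arguments that BuildKit fills in for each entry passed to --platform:

```dockerfile
# syntax=docker/dockerfile:1
# Run the toolchain natively on the build machine instead of under emulation
FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS build
# Set by BuildKit once per requested target platform
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Cross-compile a static binary for the target platform
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/server ./cmd/server

# The runtime stage is built once per target platform
FROM scratch
COPY --from=build /out/server /server
ENTRYPOINT ["/server"]
```

Cross-compiling this way is usually faster than building every platform under emulation, but it only works for toolchains that support it.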

Follow-up questions

Q: What is the difference between RUN, CMD, and ENTRYPOINT?


A: RUN runs at build time and bakes its result into a layer. CMD and ENTRYPOINT run at container start time and define the default process. Build vs run is the dividing line.
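A small sketch of that dividing line, using a hypothetical file /built-at.txt: the timestamp is written once during the build, so every container started from the image prints the same value.

```dockerfile
FROM alpine:3.20
# Build time: executes once during docker build; the file is baked into a layer
RUN date > /built-at.txt
# Run time: ENTRYPOINT plus CMD form the process started by docker run
ENTRYPOINT ["cat"]
CMD ["/built-at.txt"]
```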

Q: Why do my builds keep redownloading dependencies?


A: Probably because you run COPY . . before installing deps. Any change to any file invalidates the cache for that line and everything after, including the install step. Move the dep install up - copy lock files first, install, then copy the rest.

Q: What is BuildKit and do I need it?


A: BuildKit is the modern build engine for Docker (the default builder since Docker Engine 23.0). It enables parallel stage builds, cache mounts, secret mounts, and the # syntax=docker/dockerfile:1.x directive, which selects the Dockerfile frontend version and with it newer instructions and flags. You almost always already have it. Run docker buildx version to confirm.

Q: When should I use ARG vs ENV?


A: ARG for build-time-only values (e.g., --build-arg VERSION=1.2.3 to tag the build). ENV for runtime values that should be visible inside the running container (e.g., ENV NODE_ENV=production). ARG values disappear after build; ENV values persist.
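A minimal sketch of the two side by side. The names VERSION and APP_VERSION are illustrative, not a convention:

```dockerfile
FROM node:22-alpine
# ARG: supplied with --build-arg VERSION=1.2.3; exists only while building
ARG VERSION=dev
# A build-time value can still be recorded in image metadata
LABEL org.opencontainers.image.version=$VERSION
# ENV: stored in the image and visible to the running process
ENV NODE_ENV=production
# Copy an ARG into an ENV only if the app needs the value at runtime
ENV APP_VERSION=$VERSION
# At runtime NODE_ENV and APP_VERSION are set; VERSION itself is not
CMD ["node", "-e", "console.log(process.env.NODE_ENV, process.env.APP_VERSION, process.env.VERSION)"]
```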

Q: (Senior) How do you handle secrets at build time without leaking them into a layer?


A: Use BuildKit secret mounts: RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci. The secret is mounted for that RUN step only and is never written to a layer. Do not copy it elsewhere in the filesystem (for example with cp), because the copy would be stored in the layer. Pass it with docker buildx build --secret id=npmrc,src=$HOME/.npmrc . (the trailing dot is the build context). Build args (ARG) leak into image history and should never carry secrets.
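In a complete Dockerfile this might look like the sketch below. It assumes the install step runs as root, so npm reads its config from /root/.npmrc; the secret id npmrc is an arbitrary name that only has to match the --secret flag:

```dockerfile
# syntax=docker/dockerfile:1
# Build with: docker buildx build --secret id=npmrc,src=$HOME/.npmrc -t myapp:1.0 .
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# The secret is mounted at /root/.npmrc for this step only and is not part of the layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
```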

Examples

Multi-stage build for a Go service

dockerfile
# Stage 1: build
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Stage 2: runtime - nothing but the binary
FROM scratch
COPY --from=build /out/server /server
EXPOSE 8080
USER 65532:65532
ENTRYPOINT ["/server"]

Final image is roughly the size of the Go binary. No shell, no libc, no package manager. The only thing an attacker can interact with is your service.

Python app with cache mount (BuildKit)

dockerfile
# syntax=docker/dockerfile:1.7
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt ./
# No --no-cache-dir here: that flag would disable the pip cache the mount is meant to keep
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

The cache mount keeps pip's downloaded wheels across builds without baking them into the image. When requirements.txt changes and the install step re-runs, pip reuses the packages already in the mount instead of downloading them again; the layer itself stays clean.
