What is a Dockerfile?

A Dockerfile is a plain text file containing instructions that Docker reads top-to-bottom to assemble an image. Instructions that change the filesystem (such as RUN, COPY, and ADD) each create a new layer; the layers stack to form the final image.

Theory

TL;DR

Plain text. No JSON, no YAML. One instruction per line, uppercase by convention: FROM, RUN, COPY, CMD, etc.
Goes top-to-bottom. Earlier instructions land in lower layers; later ones add on top.
Each instruction = one cached build step; instructions that change the filesystem add a layer. Same instruction with the same input on rebuild = cache hit, no work.
Order matters: put stable, expensive things (system deps) early, frequently-changing things (your source code) late.
Multi-stage builds let you build in one stage and copy only the artifact into a slim runtime stage. Smaller, safer images.

Quick example

dockerfile

# Dockerfile - typical Node.js app
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
EXPOSE 3000
CMD ["node", "server.js"]

bash

$ docker build -t myapp:1.0 .
[+] Building 12.4s (10/10) FINISHED
 => [1/6] FROM node:22-alpine
 => [2/6] WORKDIR /app
 => [3/6] COPY package*.json ./
 => [4/6] RUN npm ci --omit=dev
 => [5/6] COPY . .
 => [6/6] USER node
 => exporting layers

Eight instructions, six build steps in the output: EXPOSE and CMD only set image metadata, so they do not appear as steps. Change a source file and rebuild: only step 5 and later re-execute. Steps 1-4 come from cache.

Key instructions

Instruction	What it does
`FROM image[:tag]`	Sets the base image. Must be the first instruction in a Dockerfile; only comments, parser directives, and `ARG` may come before it.
`WORKDIR /path`	Sets the working directory for following `RUN`, `COPY`, `CMD`. Creates the dir if missing.
`COPY src dest`	Copies files from build context into the image.
`ADD src dest`	Like `COPY` but also handles URLs and tar extraction. Prefer `COPY` unless you need those features.
`RUN cmd`	Runs a shell command at build time. Common: install packages, build artifacts.
`ENV KEY=value`	Sets an environment variable that persists in the image.
`EXPOSE 80`	Documentation only - says "this image listens on port 80". Does not actually publish anything.
`USER name|uid`	Sets the user for following instructions and the running container. Default: root (avoid).
`CMD ["prog", "arg"]`	Default command when a container starts. Can be overridden by `docker run`.
`ENTRYPOINT ["prog"]`	The fixed first part of the command; `CMD` becomes its default args.
`ARG name`	Build-time variable, set with `--build-arg`. Not present at runtime (use `ENV` for that).
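
Since EXPOSE is the instruction most often misread, here is a minimal sketch of how it relates to publishing a port. The image name myimage and host port 8080 are assumptions chosen for illustration.

```dockerfile
FROM nginx:1.27-alpine
# Records that the container listens on port 80; this line publishes nothing
EXPOSE 80
# Publishing happens at run time, for example:
#   docker run -p 8080:80 myimage   -> host port 8080 forwards to container port 80
#   docker run -P myimage           -> publishes every EXPOSEd port on a random host port
```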

CMD vs ENTRYPOINT

Both define what runs when a container starts. The difference matters when users override.

dockerfile

# Pattern A: CMD only
CMD ["echo", "hello"]
# docker run myimage              -> echo hello
# docker run myimage echo bye     -> echo bye   (CMD fully replaced)

# Pattern B: ENTRYPOINT + CMD
ENTRYPOINT ["echo"]
CMD ["hello"]
# docker run myimage              -> echo hello
# docker run myimage bye          -> echo bye   (CMD replaced, ENTRYPOINT stays)

Use ENTRYPOINT when the image is one tool (e.g., a CLI). Use CMD alone when the image is a service that takes no args.
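
As a rough sketch of the one-tool case, the image below wraps curl. The base image tag, the image name mycurl, and the example URL are assumptions made for illustration, not part of the original example.

```dockerfile
FROM alpine:3.20
RUN apk add --no-cache curl
# The tool itself is fixed; CMD only supplies default arguments
ENTRYPOINT ["curl"]
CMD ["--help"]
# docker run mycurl                            -> curl --help
# docker run mycurl -sI https://example.com    -> curl -sI https://example.com
# docker run -it --entrypoint sh mycurl        -> starts a shell instead of curl
```

The --entrypoint flag is the escape hatch for debugging an image whose ENTRYPOINT is fixed.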

Build cache and instruction order

Docker caches each layer by its instruction + inputs. Reorder for cache efficiency:

dockerfile

# WRONG: source copied before deps installed
FROM node:22-alpine
WORKDIR /app
# Any code change invalidates this layer and everything below it
COPY . .
# Re-runs on every code change
RUN npm ci --omit=dev
CMD ["node", "server.js"]

# RIGHT: deps installed before source copied
FROM node:22-alpine
WORKDIR /app
# Only changes when deps change
COPY package*.json ./
# Cached unless package.json or package-lock.json changed
RUN npm ci --omit=dev
# Changes when source changes; only this and below re-run
COPY . .
CMD ["node", "server.js"]

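The difference: after a one-line change in server.js, the rebuild skips the dependency install and only re-runs the final COPY, which can turn a build of a minute or so into about a second. Exact timings depend on the project.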

Multi-stage builds

Build in a fat stage, copy artifacts into a slim stage. Result: smaller, more secure final images.

dockerfile

# Stage 1: build
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build      # produces /app/dist

# Stage 2: runtime
FROM nginx:1.27-alpine
COPY --from=build /app/dist /usr/share/nginx/html
EXPOSE 80

The final image is nginx:1.27-alpine plus your built static files. The Node toolchain, source code, node_modules - none of it lands in the runtime image. Smaller attack surface, smaller image, faster pull.
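
The same pattern applies when the runtime is Node itself rather than nginx. The sketch below assumes a hypothetical project whose build script writes compiled output to /app/dist with dist/server.js as the entry point; adjust both to the real project.

```dockerfile
# Stage 1: build with dev dependencies (compilers, bundlers)
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime with production dependencies only
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Only the compiled output crosses the stage boundary (path is an assumption)
COPY --from=build /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
```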

Common mistakes

Running as root in the final stage

dockerfile

# WRONG: default user is root
FROM node:22
WORKDIR /app
COPY . .
CMD ["node", "app.js"]

# RIGHT: drop privileges
FROM node:22
WORKDIR /app
COPY --chown=node:node . .
USER node
CMD ["node", "app.js"]

A root container that escapes its namespace is still root on the host. Always switch to a non-root user before CMD.
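
The node images ship with a ready-made node user. When a base image has no such user, one can be created during the build. A minimal sketch on a Debian-based image, where the user and group name app is a hypothetical choice:

```dockerfile
FROM python:3.13-slim
# Create a system group and user named "app" with no home directory
RUN groupadd --system app && \
    useradd --system --gid app --no-create-home app
WORKDIR /app
# Give the new user ownership of the application files
COPY --chown=app:app . .
USER app
CMD ["python", "app.py"]
```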

Not using .dockerignore

# .dockerignore
node_modules
.git
dist
*.log
.env*
Dockerfile

Without it, the whole directory, node_modules and .git included, is sent to the daemon as build context, and COPY . . then copies it into the image. That slows builds and bloats the image.

Splitting related RUN commands across layers

dockerfile

# WRONG: each RUN is a layer; this creates three layers and leaves apt cache in image
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# RIGHT: one layer, cache cleaned in same step
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

If you delete files in a later layer, the earlier layer still contains them - the deletion just hides them. Clean up in the same RUN that created the mess.
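
Long && chains get hard to read. With BuildKit and a recent Dockerfile syntax version, a heredoc can express the same single-layer step. This is a sketch of an alternative form, assuming the heredoc-capable syntax directive shown on the first line is available in your builder:

```dockerfile
# syntax=docker/dockerfile:1.7
FROM debian:bookworm-slim
# The whole heredoc runs as one RUN, so it still produces a single layer
RUN <<EOF
set -e
apt-get update
apt-get install -y curl
rm -rf /var/lib/apt/lists/*
EOF
```

The set -e line makes the script stop at the first failing command, which is what the && chain did implicitly.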

Using ADD when COPY is enough

ADD extracts tarballs and fetches URLs. Both behaviors surprise people. Use COPY for plain file copies; reach for ADD only when you actually need its extra features.
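
To make the surprise concrete, the sketch below copies the same local archive with both instructions. The file name vendor.tar.gz and the destination paths are hypothetical.

```dockerfile
FROM alpine:3.20
# COPY places the archive in the image unchanged: /opt/vendor.tar.gz
COPY vendor.tar.gz /opt/
# ADD unpacks a local tar archive into the destination directory instead
ADD vendor.tar.gz /opt/vendor/
```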

Real-world usage

CI/CD pipelines: every PR triggers docker build against the repo's Dockerfile. Cache hit rates of 80-90 percent on well-ordered Dockerfiles keep builds fast.
Multi-arch builds: docker buildx build --platform linux/amd64,linux/arm64 -t myapp:1.0 . produces a multi-platform image from one Dockerfile. Used when the same app deploys to x86 servers and ARM (Mac M-series, Graviton).
Distroless / scratch images: FROM gcr.io/distroless/base or FROM scratch for the final stage of a multi-stage build. With scratch the final image contains only your binary; distroless adds a minimal set of runtime files. Either way there is no shell and no package manager, so little attack surface beyond the app itself.
BuildKit features: # syntax=docker/dockerfile:1.7 at the top unlocks features like RUN --mount=type=cache,target=/root/.npm for persistent npm cache across builds.
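
The cache mount mentioned in the last item could look like the following in a full Dockerfile. This is a sketch based on the Node.js example used earlier; it assumes the install step runs as root, so that npm's cache lives under /root/.npm.

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# The cache directory persists between builds but is not part of the image layer
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
```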

Follow-up questions

Q: What is the difference between RUN, CMD, and ENTRYPOINT?

A: RUN runs at build time and bakes its result into a layer. CMD and ENTRYPOINT run at container start time and define the default process. Build vs run is the dividing line.
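
A small sketch that makes the build-versus-run split visible. The file name /built-at.txt and the image name myimage are assumptions for illustration.

```dockerfile
FROM alpine:3.20
# Build time: executes once during docker build; the file is baked into a layer
RUN date > /built-at.txt
# Run time: executes each time a container starts
ENTRYPOINT ["cat"]
CMD ["/built-at.txt"]
# docker run myimage                   -> prints the build timestamp, identical on every run
# docker run myimage /etc/os-release   -> cat /etc/os-release (CMD replaced, ENTRYPOINT stays)
```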

Q: Why do my builds keep redownloading dependencies?

A: Probably because you run COPY . . before installing deps. Any change to any file invalidates the cache for that line and everything after, including the install step. Move the dep install up - copy lock files first, install, then copy the rest.

Q: What is BuildKit and do I need it?

A: BuildKit is the modern build engine for Docker (default since Docker 23). It enables parallel stage builds, cache mounts, secret mounts, and the # syntax=docker/dockerfile:1.x directive that adds new instructions. You almost always already have it. Run docker buildx version to confirm.

Q: When should I use ARG vs ENV?

A: ARG for build-time-only values (e.g., --build-arg VERSION=1.2.3 to tag the build). ENV for runtime values that should be visible inside the running container (e.g., ENV NODE_ENV=production). ARG values disappear after build; ENV values persist.
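
A short sketch of both in one file. The variable name APP_VERSION and the label key are illustrative choices, not requirements.

```dockerfile
FROM node:22-alpine
# Build-time only; override with: docker build --build-arg VERSION=1.2.3 -t myapp:1.2.3 .
ARG VERSION=dev
# Record the build-time value on the image as a label
LABEL org.opencontainers.image.version=$VERSION
# Runtime value, visible inside every container started from this image
ENV NODE_ENV=production
# To see the build-time value at runtime, copy it into an ENV explicitly
ENV APP_VERSION=$VERSION
```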

Q: (Senior) How do you handle secrets at build time without leaking them into a layer?

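A: Use BuildKit secret mounts: RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci. The secret is mounted as a file for that RUN step only and is never written to a layer. Copying it elsewhere inside the step (for example cp /run/secrets/npmrc ~/.npmrc) would write it into the layer and defeat the purpose. Pass it with docker buildx build --secret id=npmrc,src=$HOME/.npmrc . (the trailing dot is the build context). Build args (ARG) leak into image history and should never carry secrets.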
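
In a complete Dockerfile, a secret mount for a private npm registry might be sketched as follows. It assumes the install step runs as root, so npm reads its config from /root/.npmrc; the secret id npmrc is an arbitrary name that only has to match the --secret flag.

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# The secret appears at /root/.npmrc for this RUN only and is not stored in the layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
# Build with:
#   docker buildx build --secret id=npmrc,src=$HOME/.npmrc -t myapp:1.0 .
```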

Examples

Multi-stage build for a Go service

dockerfile

# Stage 1: build
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Stage 2: runtime - nothing but the binary
FROM scratch
COPY --from=build /out/server /server
EXPOSE 8080
USER 65532:65532
ENTRYPOINT ["/server"]

Final image is roughly the size of the Go binary. No shell, no libc, no package manager. The only thing an attacker can interact with is your service.
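
One trade-off of scratch is that it also lacks CA certificates and time zone data, which a service making outbound TLS calls usually needs. A distroless static base is a common middle ground. The sketch below swaps only the runtime stage; the choice of the static-debian12:nonroot variant is an assumption about what fits a statically linked binary.

```dockerfile
# Stage 1: build (unchanged)
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Stage 2: runtime with CA certificates and a non-root default user, still no shell
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
```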

Python app with cache mount (BuildKit)

dockerfile

# syntax=docker/dockerfile:1.7
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt ./
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

The cache mount keeps pip's wheels cached across builds without baking them into the image. Build #2 with the same requirements.txt reuses the cache; the layer itself stays clean.
