**A Dockerfile** is a plain text file containing instructions that Docker reads top-to-bottom to assemble an image. Instructions that change the filesystem (such as `RUN`, `COPY`, and `ADD`) each create a new layer; the layers stack to form the final image.

## Theory

### TL;DR

- Plain text. No JSON, no YAML. One instruction per line (a trailing `\` continues a long one), written in uppercase by convention: `FROM`, `RUN`, `COPY`, `CMD`, etc.
- Goes top-to-bottom. Earlier instructions land in lower layers; later ones add on top.
- Each filesystem-changing instruction = one layer (cached); metadata instructions like `CMD` and `EXPOSE` only update the image config. Same instruction with the same input on rebuild = cache hit, no work.
- **Order matters**: put stable, expensive things (system deps) early, frequently-changing things (your source code) late.
- Multi-stage builds let you build in one stage and copy only the artifact into a slim runtime stage. Smaller, safer images.

### Quick example

```dockerfile
# Dockerfile - typical Node.js app
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
EXPOSE 3000
CMD ["node", "server.js"]
```

```bash
$ docker build -t myapp:1.0 .
[+] Building 12.4s (10/10) FINISHED
 => [1/5] FROM node:22-alpine
 => [2/5] WORKDIR /app
 => [3/5] COPY package*.json ./
 => [4/5] RUN npm ci --omit=dev
 => [5/5] COPY . .
 => exporting layers
```

Eight instructions, five build steps: `USER`, `EXPOSE`, and `CMD` only set image metadata, so they add no layer and do not show up as numbered steps. Change a source file and rebuild: only step 5 re-executes. Steps 1-4 are pulled from cache.

### Key instructions

| Instruction | What it does |
|---|---|
| `FROM image[:tag]` | Sets the base image. Must be the first instruction; only comments, parser directives, and `ARG` may come before it. |
| `WORKDIR /path` | Sets the working directory for following `RUN`, `COPY`, `CMD`. Creates the dir if missing. |
| `COPY src dest` | Copies files from build context into the image. |
| `ADD src dest` | Like `COPY` but also handles URLs and tar extraction. **Prefer `COPY`** unless you need those features. |
| `RUN cmd` | Runs a shell command at build time. Common: install packages, build artifacts. |
| `ENV KEY=value` | Sets an environment variable that persists in the image. |
| `EXPOSE 80` | Documentation only - says "this image listens on port 80". Does not actually publish anything. |
| `USER name\|uid` | Sets the user for following instructions and the running container. Default: root (avoid). |
| `CMD ["prog", "arg"]` | Default command when a container starts. Can be overridden by `docker run`. |
| `ENTRYPOINT ["prog"]` | The fixed first part of the command; `CMD` becomes its default args. Replaced only by `docker run --entrypoint`. |
| `ARG name` | Build-time variable, set with `--build-arg`. Not present at runtime (use `ENV` for that). |

### CMD vs ENTRYPOINT

Both define what runs when a container starts. The difference matters when users override.

```dockerfile
# Pattern A: CMD only
CMD ["echo", "hello"]
# docker run myimage              -> echo hello
# docker run myimage echo bye     -> echo bye   (CMD fully replaced)

# Pattern B: ENTRYPOINT + CMD
ENTRYPOINT ["echo"]
CMD ["hello"]
# docker run myimage              -> echo hello
# docker run myimage bye          -> echo bye   (CMD replaced, ENTRYPOINT stays)
```

Use `ENTRYPOINT` when the image is one tool (e.g., a CLI). Use `CMD` alone when the image is a service that takes no args.

### Build cache and instruction order

Docker caches each layer by its instruction + inputs. Reorder for cache efficiency:

```dockerfile
# WRONG: source copied before deps installed
FROM node:22-alpine
WORKDIR /app
# any code change invalidates this layer and everything below
COPY . .
# re-runs on every code change
RUN npm ci --omit=dev
CMD ["node", "server.js"]

# RIGHT: deps installed before source copied
FROM node:22-alpine
WORKDIR /app
# only changes when deps change
COPY package*.json ./
# cached unless package.json or package-lock.json changed
RUN npm ci --omit=dev
# changes when source changes; only this and below re-run
COPY . .
CMD ["node", "server.js"]
```

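The difference: after a one-line code change in `server.js`, the rebuild skips the dependency install entirely and only redoes the final `COPY`. Depending on the size of the dependency tree, that can turn a rebuild of a minute or so into one of a second or two.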

### Multi-stage builds

Build in a fat stage, copy artifacts into a slim stage. Result: smaller, more secure final images.

```dockerfile
# Stage 1: build
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# produces /app/dist
RUN npm run build

# Stage 2: runtime
FROM nginx:1.27-alpine
COPY --from=build /app/dist /usr/share/nginx/html
EXPOSE 80
```

The final image is `nginx:1.27-alpine` plus your built static files. The Node toolchain, source code, `node_modules` - none of it lands in the runtime image. Smaller attack surface, smaller image, faster pull.

### Common mistakes

**Running as root in the final stage**

```dockerfile
# WRONG: default user is root
FROM node:22
WORKDIR /app
COPY . .
CMD ["node", "app.js"]

# RIGHT: drop privileges
FROM node:22
WORKDIR /app
COPY --chown=node:node . .
USER node
CMD ["node", "app.js"]
```

A root container that escapes its namespace is still root on the host. Always switch to a non-root user before `CMD`.

**Not using `.dockerignore`**

```
# .dockerignore
node_modules
.git
dist
*.log
.env*
Dockerfile
```

Without it, `COPY . .` ships your `node_modules` and `.git` to the daemon, slowing builds and bloating the image.

**Splitting related `RUN` commands across layers**

```dockerfile
# WRONG: each RUN is a layer; this creates three layers and leaves apt cache in image
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# RIGHT: one layer, cache cleaned in same step
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
```

If you delete files in a later layer, the earlier layer still contains them - the deletion just hides them. Clean up in the same `RUN` that created the mess.

**Using `ADD` when `COPY` is enough**

`ADD` auto-extracts local tar archives and downloads remote URLs (a downloaded archive is not extracted). Both behaviors surprise people. Use `COPY` for plain file copies; reach for `ADD` only when you actually need its extra features.
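
As a rough illustration of the difference, the sketch below copies the same local archive with both instructions. The file name `vendor.tar.gz` and the target paths are hypothetical and assume such an archive exists in the build context.

```dockerfile
FROM alpine:3.20

# COPY keeps the archive as a single file: /opt/archive/vendor.tar.gz
COPY vendor.tar.gz /opt/archive/

# ADD unpacks a local tar archive: its contents end up under /opt/vendor/
ADD vendor.tar.gz /opt/vendor/
```

If the unpacking is what you want, `ADD` saves a `RUN tar -xzf` step; if it is not, `COPY` avoids the surprise.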

### Real-world usage

- **CI/CD pipelines:** every PR triggers `docker build` against the repo's Dockerfile. With a well-ordered Dockerfile and a layer cache shared between runs, most steps are cache hits, which keeps builds fast.
- **Multi-arch builds:** `docker buildx build --platform linux/amd64,linux/arm64 -t myapp:1.0 .` produces a multi-platform image from one Dockerfile. Used when the same app deploys to x86 servers and ARM (Mac M-series, Graviton).
- **Distroless / scratch images:** `FROM gcr.io/distroless/base` or `FROM scratch` for the final stage of a multi-stage build. With `scratch` the final image contains only your binary; distroless adds a minimal set of runtime files. Either way there is no shell and no package manager, so the attack surface is little more than the app itself.
- **BuildKit features:** `# syntax=docker/dockerfile:1.7` at the top unlocks features like `RUN --mount=type=cache,target=/root/.npm` for persistent npm cache across builds.
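
The npm cache mount from the last bullet could look like this in a complete Dockerfile. This is a minimal sketch that reuses the Node.js layout from the quick example and assumes the build runs as root, so npm's cache lives at `/root/.npm`.

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# The cache directory is mounted only during this step and is not part of the layer
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
```

When `package.json` changes and the install step has to re-run, npm can reuse packages already downloaded into the mounted cache rather than fetching them all again.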

### Follow-up questions

**Q:** What is the difference between `RUN`, `CMD`, and `ENTRYPOINT`?

**A:** `RUN` runs at build time and bakes its result into a layer. `CMD` and `ENTRYPOINT` run at container start time and define the default process. Build vs run is the dividing line.

**Q:** Why do my builds keep redownloading dependencies?

**A:** Probably because you run `COPY . .` before installing deps. Any change to any file invalidates the cache for that line and everything after, including the install step. Move the dep install up - copy lock files first, install, then copy the rest.

**Q:** What is BuildKit and do I need it?

**A:** BuildKit is the modern build engine for Docker (default since Docker 23). It enables parallel stage builds, cache mounts, secret mounts, and the `# syntax=docker/dockerfile:1.x` directive that adds new instructions. You almost always already have it. Run `docker buildx version` to confirm.

**Q:** When should I use `ARG` vs `ENV`?

**A:** `ARG` for build-time-only values (e.g., `--build-arg VERSION=1.2.3` to tag the build). `ENV` for runtime values that should be visible inside the running container (e.g., `ENV NODE_ENV=production`). `ARG` values are not set in the running container, though they can still show up in `docker history`; `ENV` values persist in the image.
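
A small sketch of the two side by side. `APP_VERSION` is a hypothetical variable name chosen for illustration.

```dockerfile
FROM node:22-alpine

# Build-time only. Override the default with:
#   docker build --build-arg APP_VERSION=1.2.3 .
ARG APP_VERSION=dev

# Runtime value: stored in the image config and visible in every container
ENV NODE_ENV=production

# A build arg reaches runtime only if you copy it into ENV or a label explicitly
ENV APP_VERSION=$APP_VERSION
LABEL org.opencontainers.image.version=$APP_VERSION
```

Without the last two lines, `APP_VERSION` would be usable by later build instructions in this stage but would not exist as an environment variable in the container.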

**Q:** (Senior) How do you handle secrets at build time without leaking them into a layer?

**A:** Use BuildKit secret mounts: `RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci`. The secret file is mounted only for the duration of that `RUN` step and is never written to a layer. Do not copy it elsewhere inside the step (for example `cp /run/secrets/npmrc ~/.npmrc`), because the copy would be committed to the layer. Pass the secret with `docker buildx build --secret id=npmrc,src=$HOME/.npmrc .`. Build args (`ARG`) leak into image history and should never carry secrets.
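
In the context of a full Dockerfile, a secret mount might be used roughly as follows. The sketch assumes the install runs as root (so npm reads its user config from `/root/.npmrc`) and that the build is started with the `--secret id=npmrc,...` flag shown above.

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# The secret is mounted at /root/.npmrc for this step only; it never lands in a layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
COPY . .
USER node
CMD ["node", "server.js"]
```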

## Examples

### Multi-stage build for a Go service

```dockerfile
# Stage 1: build
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Stage 2: runtime - nothing but the binary
FROM scratch
COPY --from=build /out/server /server
EXPOSE 8080
USER 65532:65532
ENTRYPOINT ["/server"]
```

Final image is roughly the size of the Go binary. No shell, no libc, no package manager. The only thing an attacker can interact with is your service.

### Python app with cache mount (BuildKit)

```dockerfile
# syntax=docker/dockerfile:1.7
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt ./
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```

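The cache mount keeps pip's wheels cached **across builds** without baking them into the image. When `requirements.txt` changes and the install step has to re-run, pip reuses the wheels already in the mount instead of downloading everything again; the layer itself stays clean. Note that `--no-cache-dir` must not be passed here, since it would disable the very cache the mount provides.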
