What are Linux namespaces and cgroups in Docker?


Linux namespaces and cgroups are the two kernel mechanisms that make containers possible. Without them, you have processes; with them, you have containers. Docker is mostly a tool for configuring these features at scale.

Theory

TL;DR

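Namespaces answer "what can this process see?". Linux has eight types (PID, mount, network, IPC, UTS, user, cgroup, time). Docker uses all but time, and it creates a user namespace only when remapping is enabled.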
cgroups answer "what can this process use?". Limits CPU, memory, I/O, PIDs.
Most of these kernel features pre-date Docker by years (cgroups arrived in Linux 2.6.24 in 2008). LXC popularized them; Docker made them mainstream.
A container is conceptually: unshare() to create namespaces + cgroups config + chroot (or pivot_root) into a rootfs + exec() your binary.
Modern Docker on Linux uses cgroups v2 (unified hierarchy) whenever the host runs it. Legacy v1 had separate hierarchies per controller.

Namespaces: the seven types Docker uses

Namespace	What it isolates
PID	process IDs — container has its own PID 1
mount (mnt)	mount points — container has its own filesystem view
network (net)	network interfaces, routing tables, sockets, ports
IPC	shared memory, semaphores, message queues
UTS	hostname and domain name
user	UIDs/GIDs and capabilities (the basis for UID remapping)
cgroup	the process's view of the cgroup tree (its own cgroup appears as the root)

A process gets new namespaces via clone(2) or unshare(2) and joins existing ones via setns(2). Docker (through runc) creates them when starting a container.

Verifying namespaces in a running container

bash

# A container's namespaces (each has a unique inode)
$ docker run -it --rm alpine sh
/ # ls -la /proc/self/ns
lrwxrwxrwx ... cgroup -> 'cgroup:[4026532840]'
lrwxrwxrwx ... ipc    -> 'ipc:[4026532838]'
lrwxrwxrwx ... mnt    -> 'mnt:[4026532836]'
lrwxrwxrwx ... net    -> 'net:[4026532842]'
lrwxrwxrwx ... pid    -> 'pid:[4026532839]'
lrwxrwxrwx ... user   -> 'user:[4026531837]'
lrwxrwxrwx ... uts    -> 'uts:[4026532837]'

# Back on the host: every namespace differs except user, which stays the host's
# initial user namespace (inode 4026531837) unless userns-remap is enabled
$ ls -la /proc/self/ns

Different inodes = different namespaces = isolated views.
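The same comparison can be scripted by reading the symlink targets directly. A minimal sketch, assuming a Linux host with /proc mounted. `ns_id` and `same_ns` are hypothetical helper names, and `CONTAINER_PID` is a placeholder. Reading another process's /proc/<pid>/ns entries generally requires root or ptrace access to that process.

```bash
# ns_id PID TYPE: print the inode number of a process's namespace of the
# given TYPE (cgroup, ipc, mnt, net, pid, user, uts). PID may be "self".
ns_id() {
  readlink "/proc/$1/ns/$2" 2>/dev/null | sed 's/.*\[\([0-9]*\)]$/\1/'
}

# same_ns PID_A PID_B TYPE: succeed if both processes share that namespace.
same_ns() {
  a=$(ns_id "$1" "$3")
  b=$(ns_id "$2" "$3")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (as root): which namespaces does a container's init process share
# with this shell? CONTAINER_PID is a placeholder, e.g. from
#   docker inspect -f '{{.State.Pid}}' <container>
# for t in cgroup ipc mnt net pid user uts; do
#   if same_ns $$ "$CONTAINER_PID" "$t"; then echo "$t: shared"; else echo "$t: isolated"; fi
# done
```

With default settings, only `user` should come back as shared.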

cgroups — what they limit

In cgroups v2 (unified hierarchy, the modern norm), the controllers include:

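cpu          — CPU time (cpu.max quota/period, cpu.weight)
cpuset       — which CPUs and memory nodes may be used (cpuset.cpus, cpuset.mems)
memory       — RAM and swap usage (memory.max, memory.swap.max)
io           — block I/O bandwidth and IOPS (io.max, io.weight)
pids         — number of processes and threads (pids.max)
rdma         — RDMA/InfiniBand resources (HCA handles and objects)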
hugetlb      — huge pages

bash

# Inside a container with --memory=256m
$ docker run --rm --memory=256m alpine cat /sys/fs/cgroup/memory.max
268435456    # 256 * 1024 * 1024 bytes

$ docker run --rm --cpus=0.5 alpine cat /sys/fs/cgroup/cpu.max
50000 100000  # 50ms quota per 100ms period (= 0.5 CPU)

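Docker (through runc) translates --memory, --cpus, and similar flags into values written to cgroup files in /sys/fs/cgroup/.... On a cgroup v2 host, the container's private cgroup namespace makes its own cgroup appear as the root of /sys/fs/cgroup. That is why memory.max and cpu.max sit at the top level in the commands above.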
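The arithmetic behind those two files is simple enough to reproduce. A minimal sketch, assuming Docker's default CFS period of 100000 µs and integer sizes with binary suffixes (1k = 1024). The real CLI also accepts fractional sizes. `cpus_to_cpu_max` and `mem_to_bytes` are hypothetical names, not Docker APIs.

```bash
# Hypothetical helpers mirroring how --cpus and --memory become cgroup v2 values.

# --cpus=N  ->  cpu.max "<quota> <period>", assuming the default 100000 us period.
cpus_to_cpu_max() {
  awk -v cpus="$1" 'BEGIN { printf "%d 100000\n", int(cpus * 100000 + 0.5) }'
}

# --memory=<n>[k|m|g]  ->  memory.max in bytes (binary multiples).
# Integer sizes only in this sketch.
mem_to_bytes() {
  n=${1%[kKmMgG]}
  case $1 in
    *[kK]) echo $((n * 1024)) ;;
    *[mM]) echo $((n * 1024 * 1024)) ;;
    *[gG]) echo $((n * 1024 * 1024 * 1024)) ;;
    *)     echo "$1" ;;
  esac
}

cpus_to_cpu_max 0.5   # 50000 100000
mem_to_bytes 256m     # 268435456
```

Rounding `cpus * 100000` before truncating avoids off-by-one quotas from floating-point values such as 0.29.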

How docker run uses both

docker run --memory=256m --cpus=1 myapp
     │
     ├── containerd → runc
     │       │
     │       ├── unshare(CLONE_NEW{PID,NS,NET,IPC,UTS,CGROUP})
     │       │       → fresh namespaces (+ CLONE_NEWUSER only with userns-remap)
     │       │
     │       ├── write cgroup files
     │       │       → memory.max=268435456 (256 MiB), cpu.max="100000 100000"
     │       │
     │       ├── pivot_root into image rootfs
     │       │
     │       └── exec(your-binary)
     │
     └── you see a 'container'

Namespaces give it private views; cgroups limit what it can use; pivot_root + image layers give it a custom filesystem; exec runs your binary as PID 1 in this little world.

User namespace: the security frontier

bash

# Default (no userns-remap): UID 0 inside is UID 0 on the host, minus the dropped capabilities
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# The kernel sees the host's real root UID; capabilities, seccomp, and the other namespaces are what contain it.

# With userns-remap: root inside maps to an unprivileged UID on the host
# /etc/docker/daemon.json: { "userns-remap": "default" }   (restart dockerd afterwards)
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside, still root. On the host, the process runs as a UID from the dockremap range in /etc/subuid.

User namespace remapping is one of the strongest container hardenings: a process that escapes the container lands on the host as an unprivileged UID, not as root.
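The mapping itself lives in /proc/<pid>/uid_map, where each line reads "inside-start outside-start length". A minimal sketch that translates a UID inside a user namespace to the host UID it really is. `host_uid` is a hypothetical helper, and the sample line `0 165536 65536` only illustrates the typical shape of a remapped range. It is not a value Docker guarantees.

```bash
# host_uid UID [MAP_FILE]: translate a UID inside a user namespace to the host
# UID, using uid_map lines of the form "<inside-start> <outside-start> <count>".
# Prints nothing and returns 1 if the UID is not mapped.
host_uid() {
  awk -v uid="$1" '
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { exit !found }
  ' "${2:-/proc/self/uid_map}"
}

# Example with a typical userns-remap range:
#   printf '0 165536 65536\n' > /tmp/uid_map
#   host_uid 0 /tmp/uid_map      # container root -> 165536
#   host_uid 1000 /tmp/uid_map   # container UID 1000 -> 166536
```

Without remapping, /proc/self/uid_map reads `0 0 4294967295`, so every UID maps to itself. That identity mapping is exactly why root in a default container is root on the host.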

cgroups v1 vs v2

v1 (legacy): separate hierarchies per controller (/sys/fs/cgroup/memory, /sys/fs/cgroup/cpu, etc.). Each process belongs to one cgroup per controller.
v2 (modern): single unified hierarchy. Each cgroup controls all enabled resources. Simpler, more consistent.
Most modern Linux distros (Fedora 31+, Debian 11+, Ubuntu 21.10+) default to v2. Docker supports both; v2 support arrived in Docker 20.10.

Check which your host uses:

bash

stat -fc %T /sys/fs/cgroup
# 'cgroup2fs' = v2, 'tmpfs' = v1
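A second signal, readable from any process, is the format of /proc/self/cgroup. A pure v2 host shows a single `0::/path` line. A v1 host shows one `N:controllers:/path` line per hierarchy. A hybrid host shows both. A minimal sketch; `cgroup_mode` is a hypothetical helper name.

```bash
# cgroup_mode [FILE]: classify a /proc/<pid>/cgroup file.
#   v2:     only a "0::/path" line (unified hierarchy)
#   v1:     "N:controllers:/path" lines, no "0::" line
#   hybrid: v1 controller lines plus a "0::" line
cgroup_mode() {
  awk -F: '
    $1 == 0 && $2 == "" { unified = 1; next }
    { legacy = 1 }
    END {
      if (unified && !legacy) print "v2"
      else if (unified && legacy) print "hybrid"
      else if (legacy) print "v1"
      else print "unknown"
    }
  ' "${1:-/proc/self/cgroup}"
}

# Usage on a live host:
#   cgroup_mode              # e.g. "v2"
```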

Common mistakes

Treating namespaces as security boundaries

Namespaces hide things from a process; they do not stop a process with the right capabilities from breaking out. Combined with capabilities (default-drop CAP_SYS_ADMIN, etc.) and seccomp (syscall filtering), they form defense in depth, but namespaces alone are not impenetrable. Real isolation needs the full Docker security stack (or microVMs such as Kata Containers or Firecracker).

Forgetting that a memory limit means OOM kills, not graceful degradation

bash

docker run --memory=256m greedy-app
# When greedy-app's usage hits the 256 MiB limit and the kernel cannot reclaim enough, it OOM-kills the process.

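OOM-killed processes exit with status 137 (128 + 9, i.e. SIGKILL). If the host has swap, Docker by default also lets the container use as much swap as --memory, so set --memory-swap equal to --memory to rule swap out. Your supervisor or restart policy decides what to do next.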
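The 137 follows the shell convention of 128 + signal number, which you can reproduce and decode without Docker. A minimal sketch; `describe_exit` is a hypothetical helper, and the signal name comes from the shell's `kill -l` builtin.

```bash
# describe_exit STATUS: explain an exit status the way a supervisor might.
# By convention, statuses above 128 mean "killed by signal STATUS - 128".
describe_exit() {
  if [ "$1" -gt 128 ]; then
    sig=$(( $1 - 128 ))
    echo "killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "exited with status $1"
  fi
}

# Reproduce a 137 without Docker: a child shell that sends SIGKILL to itself,
# which is what the OOM killer does to a process in an over-limit cgroup.
sh -c 'kill -KILL $$'
status=$?
describe_exit "$status"   # killed by signal 9 (KILL)
```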

Confusing user namespace with --user

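--user 1000:1000 = run the container's process as UID 1000/GID 1000 instead of root. Without remapping, the container shares the host's user namespace, so that is UID 1000 on the host too.
userns-remap = a daemon-wide setting that maps container UIDs (including root) to an unprivileged subordinate range on the host. The two are independent knobs and can be combined.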

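Assuming containers never share a PID namespace

Kubernetes pods can share one PID namespace across their containers (the pod-level shareProcessNamespace: true field). In plain Docker, each container gets its own PID namespace by default. However, docker run --pid=container:<name> joins another container's PID namespace, and --pid=host uses the host's. Different defaults, different mental models.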

Real-world usage

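Every container, everywhere. Docker, Podman, and Kubernetes pods are built directly on namespaces and cgroups. MicroVM platforms such as AWS Lambda (Firecracker) add a hypervisor boundary on top, and Firecracker's jailer still confines each VMM process with namespaces and cgroups.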
Hardened multi-tenant: user namespace remapping + dropped capabilities + seccomp profile + read-only filesystem → strong-ish isolation.
Resource governance: cgroup limits prevent one tenant from starving others on shared infrastructure.
Debugging container internals: lsns, nsenter, /proc/<pid>/ns/* to inspect or join namespaces from outside.
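A rough version of the "which processes are in this namespace?" question that lsns answers can be built from /proc/<pid>/ns alone. A minimal sketch; `pids_in_ns` is a hypothetical helper. Without root, only your own processes' ns links are readable, so other processes are silently skipped.

```bash
# pids_in_ns TYPE INODE: print PIDs whose TYPE namespace has that inode.
pids_in_ns() {
  for d in /proc/[0-9]*; do
    # Processes may exit mid-scan or be unreadable; skip those.
    link=$(readlink "$d/ns/$1" 2>/dev/null) || continue
    [ "$link" = "$1:[$2]" ] && echo "${d#/proc/}"
  done
  return 0
}

# Example: list processes in this shell's network namespace, then (as root)
# run a command inside a chosen process's namespace with nsenter:
#   ino=$(readlink /proc/$$/ns/net | sed 's/.*\[\([0-9]*\)]$/\1/')
#   pids_in_ns net "$ino"
#   nsenter --target <pid> --net ip addr
```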

Follow-up questions

Q: Are namespaces or cgroups Linux-specific?

A: Yes. Both are Linux kernel features. For this reason, Docker Desktop on macOS and Windows runs Linux containers inside a lightweight Linux VM.

Q: What is unshare?

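A: A syscall, unshare(2), and a CLI tool, unshare(1), that move the calling process into new namespaces. PID namespaces are the exception: only the caller's subsequent children enter the new one, which is why the CLI needs --fork. Run as root, unshare --pid --fork --mount-proc /bin/bash gives you a shell in fresh PID + mount namespaces. It is the simplest "build a container by hand" demo.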

Q: What is the difference between cgroups and ulimits?

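A: ulimits (rlimits, set via setrlimit(2), the shell's ulimit builtin, or PAM) are attributes of each process. A child inherits a copy, and each process is checked against its own limit. RLIMIT_NPROC, which is counted per user, is an exception. A cgroup limit is a single budget shared by every process in the group and its descendants: ten processes under a 256 MiB memory.max share 256 MiB in total. Both can limit resources; cgroups are the kernel's modern, hierarchical answer.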
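The per-process behavior of rlimits is easy to see from any shell. A minimal sketch using the POSIX `ulimit` builtin. By contrast, writing 64 to a cgroup's pids.max would cap the whole group at 64 processes in total.

```bash
# An rlimit is a per-process attribute: a child gets a copy at fork time, and
# changing it in the child leaves the parent (and any siblings) untouched.
before=$(ulimit -n)
child=$(sh -c 'ulimit -n 64; ulimit -n')   # lower the open-files limit in a child only
after=$(ulimit -n)
echo "parent before: $before, child: $child, parent after: $after"
```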

Q: Can I see what cgroup a host process is in?

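A: Yes: cat /proc/<pid>/cgroup. On a cgroup v2 host this is a single 0::<path> line, where the path is relative to /sys/fs/cgroup. For a Docker container, it is typically /system.slice/docker-<id>.scope with the systemd cgroup driver.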
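Turning that line into a directory whose files you can read is one sed call. A minimal sketch, assuming a v2 (unified) hierarchy mounted at /sys/fs/cgroup, run from the host. Paths in /proc/<pid>/cgroup are relative to the reader's own cgroup namespace. `cgroup_dir` is a hypothetical helper.

```bash
# cgroup_dir CGROUP_FILE [CGROUP_ROOT]: print the cgroup v2 directory named in
# a /proc/<pid>/cgroup file. Returns 1 if there is no "0::" (v2) line.
cgroup_dir() {
  path=$(sed -n 's/^0:://p' "$1") || return 1
  [ -n "$path" ] || return 1
  echo "${2:-/sys/fs/cgroup}$path"
}

# Usage: a container's memory limit, read from the host side.
#   pid=$(docker inspect -f '{{.State.Pid}}' <container>)
#   cat "$(cgroup_dir /proc/$pid/cgroup)/memory.max"
```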

Q: (Senior) How would you build a minimal container by hand using just namespaces + cgroups?

A: unshare --user --map-root-user --pid --net --mount --uts --ipc --fork chroot /path/to/rootfs /bin/sh gives you a process with isolated namespaces and a custom rootfs: the bare bones of a container. --map-root-user maps your UID to root in the new user namespace so that chroot is permitted; alternatively, drop --user and run as root. Add cgroups manually by writing to /sys/fs/cgroup/<your-cgroup>/.... This is roughly what runc does for you, automated and OCI-spec-compliant. It is a useful exercise to demystify what containers are.
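The "add cgroups manually" step might look roughly like this on a cgroup v2 host. A minimal sketch, run as root. `cg_apply`, the `handmade` cgroup name, and `SHELL_PID` are all hypothetical. The memory and cpu controllers must already be enabled in the parent's cgroup.subtree_control, for example via `echo "+memory +cpu" > /sys/fs/cgroup/cgroup.subtree_control`.

```bash
# cg_apply DIR PID MEMORY_MAX CPU_MAX: create a cgroup v2 directory, set its
# limits, then move PID into it. On real cgroupfs, mkdir creates the
# interface files (memory.max, cpu.max, cgroup.procs, ...) automatically.
cg_apply() {
  dir=$1 pid=$2 mem=$3 cpu=$4
  mkdir -p "$dir" || return 1
  echo "$mem" > "$dir/memory.max" || return 1   # e.g. 268435456 (256 MiB)
  echo "$cpu" > "$dir/cpu.max" || return 1      # e.g. "50000 100000" (0.5 CPU)
  echo "$pid" > "$dir/cgroup.procs"             # moves the whole process
}

# As root, with the unshare'd shell running elsewhere (SHELL_PID is a placeholder):
#   cg_apply /sys/fs/cgroup/handmade "$SHELL_PID" 268435456 "50000 100000"
```

Setting the limits before moving the process in means it never runs unconstrained inside the new cgroup.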

Examples

Demonstrate namespace isolation

bash

# Host
$ ps -ef | wc -l
312

# Inside container
$ docker run --rm alpine ps -ef | wc -l
2
# 2 lines: the ps header plus ps itself, which is PID 1 in the container

Same kernel, two views — that is the PID namespace in action.

Demonstrate cgroups

bash

# A memory-greedy program
$ docker run --rm --memory=64m alpine sh -c 'dd if=/dev/zero of=/dev/null bs=1G count=1'
Killed
# dd fills a 1 GiB buffer, far above the 64 MiB limit, so the kernel OOM-kills it.

$ docker run --rm --cpus=0.1 alpine sh -c 'time -- timeout 5 sh -c "yes > /dev/null"'
real    0m 5.00s
user    0m 0.50s
sys     0m 0.00s
# user + sys ≈ 0.5s of CPU in 5s of wall time = 10% of one CPU = the cap

The kernel enforces the limits; Docker just sets the parameters.

Mapping --user vs userns-remap

bash

# --user: changes UID inside (no remap)
$ docker run --rm --user 1000:1000 alpine id
uid=1000 gid=1000
# On the host, the process is also UID 1000: without remapping, the container shares the host's user namespace.

# With userns-remap (daemon-level config):
# /etc/docker/daemon.json: { "userns-remap": "default" }
$ docker run --rm alpine id
uid=0(root) gid=0(root)
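# Inside is root. To inspect the host side, start a longer-lived container:
$ cid=$(docker run -d --rm alpine sleep 300)
$ ps -o uid=,args= -p "$(docker inspect -f '{{.State.Pid}}' "$cid")"
165536 sleep 300
# On the host, the same process runs as UID 165536 (the start of the remapped range).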

Remapping gives much stronger isolation and is highly recommended for multi-tenant hosts.
