
What are Linux namespaces and cgroups in Docker?

Linux namespaces and cgroups are the two kernel mechanisms that make containers possible. Without them, you have processes; with them, you have containers. Docker is mostly a tool for configuring these features at scale.

Theory

TL;DR

  • Namespaces answer "what can this process see?". Seven types matter for Docker: PID, mount, network, IPC, UTS, user, cgroup. (Linux 5.6 added an eighth, the time namespace.)
  • cgroups answer "what can this process use?". They limit CPU, memory, I/O, and the number of PIDs.
  • Both are kernel features that pre-date Docker by years (LXC popularized them; Docker made them mainstream).
  • A container is conceptually: unshare() to create namespaces + cgroups config + chroot (or pivot_root) into a rootfs + exec() your binary.
  • Modern Docker on Linux uses cgroups v2 (unified hierarchy) when the host runs it. Legacy v1 had separate hierarchies per controller.

Namespaces — the seven types

| Namespace | What it isolates |
|---|---|
| PID | process IDs; the container has its own PID 1 |
| mount (mnt) | mount points; the container has its own filesystem view |
| network (net) | network interfaces, routing tables, sockets, ports |
| IPC | shared memory, semaphores, message queues |
| UTS | hostname and domain name |
| user | UIDs/GIDs (with remapping) |
| cgroup | cgroup root view (cgroups v2) |

Each namespace is a kernel resource. A process gets new namespaces via unshare(2) or clone(2) and joins existing ones via setns(2). Docker creates them when starting a container.

Verifying namespaces in a running container

bash
# A container's namespaces (each has a unique inode)
$ docker run -it --rm alpine sh
/ # ls -la /proc/self/ns
lrwxrwxrwx ... cgroup -> 'cgroup:[4026532840]'
lrwxrwxrwx ... ipc -> 'ipc:[4026532838]'
lrwxrwxrwx ... mnt -> 'mnt:[4026532836]'
lrwxrwxrwx ... net -> 'net:[4026532842]'
lrwxrwxrwx ... pid -> 'pid:[4026532839]'
lrwxrwxrwx ... user -> 'user:[4026531837]'
lrwxrwxrwx ... uts -> 'uts:[4026532837]'

# Compare with the host (different inodes for everything except possibly user)
$ ls -la /proc/self/ns

Different inodes = different namespaces = isolated views.
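The inode comparison can also be scripted. The sketch below assumes a Linux /proc and introduces a hypothetical helper, same_ns, that compares the namespace links of two processes. It is demonstrated on a shell and its own child, which should normally share every namespace because nothing called unshare(2) or setns(2) in between.

```shell
#!/bin/sh
# same_ns PID_A PID_B TYPE: succeed if both processes are in the same
# namespace of the given TYPE (pid, mnt, net, ipc, uts, user, cgroup).
# same_ns is a hypothetical helper name, not a standard tool.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# Start a short-lived child and compare it with this shell.
sleep 30 &
child=$!
for ns in pid mnt net ipc uts user cgroup; do
    if same_ns "$$" "$child" "$ns"; then
        echo "$ns: shared"
    else
        echo "$ns: different"
    fi
done
kill "$child"
```

To compare a host shell with a container, pass the container's host PID as the second argument (for example from docker inspect -f '{{.State.Pid}}' <container>). Reading another user's /proc/<pid>/ns links usually requires root.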

cgroups — what they limit

In cgroups v2 (unified hierarchy, the modern norm), the controllers include:

  • cpu: CPU time (cpu.max, cpu.weight)
  • memory: RAM and swap usage
  • io: block I/O bandwidth and IOPS
  • pids: number of processes
  • rdma: RDMA resources
  • hugetlb: huge pages

bash
# Inside a container with --memory=256m
$ docker run --rm --memory=256m alpine cat /sys/fs/cgroup/memory.max
268435456
# 256 * 1024 * 1024 bytes

$ docker run --rm --cpus=0.5 alpine cat /sys/fs/cgroup/cpu.max
50000 100000
# 50ms quota per 100ms period (= 0.5 CPU)

Docker translates --memory, --cpus, etc. into cgroup files in /sys/fs/cgroup/....
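The arithmetic behind that translation is small enough to sketch. The two helpers below are hypothetical (they are not Docker code) and only reproduce the numbers seen in the output above, assuming a 100000 microsecond CPU period and size suffixes in powers of 1024.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the arithmetic only, not Docker's code.

# cpus_to_cpu_max CPUS: print "<quota> <period>" in microseconds,
# assuming a fixed period of 100000 us.
cpus_to_cpu_max() {
    awk -v cpus="$1" 'BEGIN { printf "%d 100000\n", int(cpus * 100000 + 0.5) }'
}

# mem_to_bytes SIZE: SIZE is an integer with an optional b, k, m, or g
# suffix; k, m, and g are treated as powers of 1024.
mem_to_bytes() {
    num=${1%[bkmgBKMG]}
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

cpus_to_cpu_max 0.5    # prints: 50000 100000
mem_to_bytes 256m      # prints: 268435456
```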

How docker run uses both

docker run --memory=256m --cpus=1 myapp
├── containerd → runc
│   │
│   ├── unshare(CLONE_NEW{PID,NS,NET,IPC,UTS,USER,CGROUP})
│   │     → up to 7 fresh namespaces (USER only with userns-remap)
│   │
│   ├── write cgroup files
│   │     → memory.max=268435456 (256 MiB), cpu.max=100000 100000
│   │
│   ├── pivot_root into image rootfs
│   │
│   └── exec(your-binary)
└── you see a 'container'

Namespaces give it private views; cgroups limit what it can do; pivot_root + image layers give it a custom filesystem; exec runs your binary as PID 1 in this little world.

User namespace: the security frontier

bash
# Default (no user namespace remapping): root inside = root on host (unless capability-restricted)
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside the container, you are root; the kernel knows.

# With userns-remap: root inside maps to a non-root UID on host
# /etc/docker/daemon.json: { "userns-remap": "default" }
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside, still root. But on the host, processes are some-non-root-UID.

User namespace remapping is one of the strongest container hardenings. A container escape no longer means root on the host.

cgroups v1 vs v2

  • v1 (legacy): separate hierarchies per controller (/sys/fs/cgroup/memory, /sys/fs/cgroup/cpu, etc.). Each process belongs to one cgroup per controller.
  • v2 (modern): single unified hierarchy. Each cgroup controls all enabled resources. Simpler, more consistent.
  • Most modern Linux distros (Fedora, Debian 11+, Ubuntu 22+) default to v2. Docker handles both.

Check which your host uses:

bash
stat -fc %T /sys/fs/cgroup # 'cgroup2fs' = v2, 'tmpfs' = v1
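For use in a script, the same check can be wrapped in a small function. This is a sketch: classify_cgroup_fs is a hypothetical name, and it only knows the two filesystem types mentioned above, reporting anything else as unknown.

```shell
#!/bin/sh
# classify_cgroup_fs FSTYPE: map the filesystem type reported for
# /sys/fs/cgroup to a cgroup version (hypothetical helper).
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# Empty if /sys/fs/cgroup is missing, which the helper reports as unknown.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
echo "cgroup version: $(classify_cgroup_fs "$fstype")"
```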

Common mistakes

Treating namespaces as security boundaries

Namespaces hide things from a process; they do not stop a process with the right capabilities from breaking out. Combined with capabilities (default-drop CAP_SYS_ADMIN, etc.) and seccomp (syscall filtering), they form a defense — but namespaces alone are not impenetrable. Real isolation needs the full Docker security stack (or microVMs like Kata/Firecracker).

Forgetting cgroups limit memory but not OOM behavior

bash
docker run --memory=256m greedy-app
# When greedy-app tries to allocate the 257th MB, kernel OOM-kills it.

The OOM killer sends SIGKILL, so a container whose main process is OOM-killed exits with status 137 (128 + 9, the signal number of SIGKILL). Your supervisor or restart policy decides what to do next.
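The 137 is not specific to cgroups; it is how a shell reports any child terminated by SIGKILL. A minimal sketch, with a child that kills itself standing in for an OOM-killed process:

```shell
#!/bin/sh
# A child that sends itself SIGKILL, standing in for a process hit by
# the OOM killer (an assumption for the demo; no cgroup is involved).
status=0
sh -c 'kill -KILL $$' || status=$?
echo "exit status: $status"

# Statuses above 128 encode "terminated by signal (status - 128)".
if [ "$status" -gt 128 ]; then
    echo "signal: $(kill -l $((status - 128)))"
fi
```

A status of 137 alone does not prove an OOM kill, since any SIGKILL produces it. For a container, docker inspect reports an OOMKilled field under State that distinguishes the two.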

Confusing user namespace with --user

  • --user 1000:1000 = run the process as a specific UID and GID inside the container. Without remapping, that is the same UID on the host.
  • userns-remap = remap UIDs from container to host, so that even root inside is an unprivileged UID outside. It is an independent setting.

Assuming PID namespaces behave the same on every platform

Kubernetes pods can share a PID namespace between containers (the shareProcessNamespace: true flag). In plain Docker, two docker run containers get separate PID namespaces by default; sharing requires an explicit --pid=container:<name> or --pid=host. Different mental model.

Real-world usage

  • Every container, everywhere. Namespaces and cgroups underlie Docker and Podman containers and Kubernetes pods, and they also appear in sandboxed platforms such as Lambda functions (via Firecracker microVMs) and Cloud Run tasks.
  • Hardened multi-tenant: user namespace remapping + dropped capabilities + seccomp profile + read-only filesystem → strong-ish isolation.
  • Resource governance: cgroup limits prevent one tenant from starving others on shared infrastructure.
  • Debugging container internals: lsns, nsenter, /proc/<pid>/ns/* to inspect or join namespaces from outside.

Follow-up questions

Q: Are namespaces or cgroups Linux-specific?


A: Yes. Both are Linux kernel features. Docker on Mac/Windows runs a Linux VM under the hood for this reason.

Q: What is unshare?


A: A syscall (and CLI tool) that creates a new namespace and attaches the calling process. unshare --pid --fork --mount-proc /bin/bash gives you a shell in fresh PID + mount namespaces — the simplest "build a container by hand" demo.

Q: What is the difference between cgroups and ulimits?


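A: ulimits (resource limits via PAM, setrlimit) are per-process: children inherit the limit values across fork and exec, but each process is counted against its own allowance. cgroups account for a whole tree of processes in aggregate, so ten processes in a 256 MB cgroup share that 256 MB. Both can limit resources; cgroups are the kernel's modern, hierarchical answer.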
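A quick way to see that an rlimit is an attribute of each process: lower the open-file limit in a subshell and read it back from a freshly started child shell. In this sketch the value 64 is arbitrary, and the subshell keeps the change away from the calling shell.

```shell
#!/bin/sh
# Lower the open-file limit inside a subshell, then read it back from a
# newly exec'd child shell. The ( ... ) keeps the caller's limit untouched.
inherited=$( ( ulimit -n 64; sh -c 'ulimit -n' ) )
echo "limit seen by the child: $inherited"
echo "limit in this shell:     $(ulimit -n)"
```

The child inherits the value 64, but it gets its own budget of 64 descriptors rather than sharing one pool with its parent. A cgroup limit such as pids.max or memory.max is instead charged to the group as a whole.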

Q: Can I see what cgroup a host process is in?


A: Yes: cat /proc/<pid>/cgroup. For a Docker process, this points into /sys/fs/cgroup/<docker-cgroup-path>.
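On a cgroups v2 host, the relevant entry in that file has the form 0::<path>. The sketch below extracts that path with a hypothetical helper, cgroup_v2_path, and assumes the hierarchy is mounted at /sys/fs/cgroup.

```shell
#!/bin/sh
# cgroup_v2_path: read the content of /proc/<pid>/cgroup on stdin and
# print the path of the unified-hierarchy entry ("0::<path>").
# Hypothetical helper; v1 entries ("<id>:<controllers>:<path>") are ignored.
cgroup_v2_path() {
    sed -n 's/^0:://p'
}

path=""
if [ -r /proc/self/cgroup ]; then
    path=$(cgroup_v2_path < /proc/self/cgroup)
fi

if [ -n "$path" ]; then
    # Assumes the usual mount point for the unified hierarchy.
    echo "control files: /sys/fs/cgroup$path"
else
    echo "no cgroups v2 entry found for this process"
fi
```

Replace /proc/self/cgroup with /proc/<pid>/cgroup to inspect another process.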

Q: (Senior) How would you build a minimal container by hand using just namespaces + cgroups?


A: unshare --pid --net --mount --uts --ipc --user --map-root-user --fork chroot /path/to/rootfs /bin/sh gives you a process with isolated namespaces and a custom rootfs: the bare bones of a container. (--map-root-user maps your UID to root inside the new user namespace; without a mapping, chroot lacks the privilege it needs.) Add cgroups manually by writing to /sys/fs/cgroup/<your-cgroup>/.... This is roughly what runc does for you, automated and OCI-spec-compliant. Useful exercise to demystify what containers are.

Examples

Demonstrate namespace isolation

bash
# Host
$ ps -ef | wc -l
312

# Inside container
$ docker run --rm alpine ps -ef | wc -l
2
# Two lines: the header plus ps itself, which runs as PID 1.
# The container only sees its own processes.

Same kernel, two views — that is the PID namespace in action.

Demonstrate cgroups

bash
# A memory-greedy program
$ docker run --rm --memory=64m alpine sh -c 'dd if=/dev/zero of=/dev/null bs=1G count=1'
Killed
# Container OOM-killed at the 64MB limit.

$ docker run --rm --cpus=0.1 alpine sh -c 'time -- timeout 5 sh -c "yes > /dev/null"'
real 0m 5.00s
user 0m 0.50s
sys 0m 0.00s
# user time = 0.5s in 5s wall = 10% CPU = the cap

The kernel enforces the limits; Docker just sets the parameters.

Mapping --user vs userns-remap

bash
# --user: changes UID inside (no remap)
$ docker run --rm --user 1000:1000 alpine id
uid=1000 gid=1000
# Outside, still UID 1000 (no isolation between in/out namespaces).

# With userns-remap (daemon-level config):
# /etc/docker/daemon.json: { "userns-remap": "default" }
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside is root.
$ ps -ef | grep <container-pid>
UID 165536 ...
# But on the host, the same process is UID 165536 (offset).

The second is much stronger isolation. Highly recommended for multi-tenant clusters.
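The offset comes from the user namespace's UID map, which the kernel exposes in /proc/<pid>/uid_map as lines of the form "<inside-start> <outside-start> <length>". The sketch below applies such a map by hand. host_uid is a hypothetical helper, and the map values are illustrative, chosen to match the 165536 in the example above.

```shell
#!/bin/sh
# host_uid UID: read uid_map lines ("<inside-start> <outside-start> <length>")
# on stdin and print the host UID that the given container UID maps to.
# Hypothetical helper; prints nothing if the UID falls outside every range.
host_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); exit }
    '
}

# Illustrative map: container UIDs 0..65535 start at host UID 165536.
echo "0 165536 65536" | host_uid 0       # prints: 165536
echo "0 165536 65536" | host_uid 1000    # prints: 166536
```

Container root (UID 0) lands on an unprivileged host UID, which is why an escape from a remapped container does not yield host root.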
