What are Linux namespaces and cgroups in Docker?
Linux namespaces and cgroups are the two kernel mechanisms that make containers possible. Without them, you have processes; with them, you have containers. Docker is mostly a tool for configuring these features at scale.
Theory
TL;DR
- Namespaces answer "what can this process see?". Docker relies on seven types: PID, mount, network, IPC, UTS, user, cgroup. (Kernels from 5.6 on also offer a time namespace, not covered here.)
- cgroups answer "what can this process use?". They limit CPU, memory, I/O, and PIDs.
- Both are kernel features that pre-date Docker by years (LXC popularized them; Docker made them mainstream).
- A container is conceptually: unshare() to create namespaces + cgroups config + chroot (or pivot_root) into a rootfs + exec() your binary.
- Modern Docker on Linux uses cgroups v2 (unified hierarchy). Legacy v1 had separate hierarchies per controller.
Namespaces — the seven types
| Namespace | What it isolates |
|---|---|
| PID | process IDs — container has its own PID 1 |
| mount (mnt) | mount points — container has its own filesystem view |
| network (net) | network interfaces, routing tables, sockets, ports |
| IPC | shared memory, semaphores, message queues |
| UTS | hostname and domain name |
| user | UIDs/GIDs (with remapping) |
| cgroup | cgroup root view (cgroups v2) |
Each namespace is a kernel resource that a process can create with unshare(2) or clone(2), or join later with setns(2). Docker creates them when starting a container.
Verifying namespaces in a running container
# A container's namespaces (each has a unique inode)
$ docker run -it --rm alpine sh
/ # ls -la /proc/self/ns
lrwxrwxrwx ... cgroup -> 'cgroup:[4026532840]'
lrwxrwxrwx ... ipc -> 'ipc:[4026532838]'
lrwxrwxrwx ... mnt -> 'mnt:[4026532836]'
lrwxrwxrwx ... net -> 'net:[4026532842]'
lrwxrwxrwx ... pid -> 'pid:[4026532839]'
lrwxrwxrwx ... user -> 'user:[4026531837]'
lrwxrwxrwx ... uts -> 'uts:[4026532837]'
# Compare with the host (different inodes for everything except possibly user)
$ ls -la /proc/self/ns
Different inodes = different namespaces = isolated views.
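The comparison in the listing above can also be scripted. This is a minimal sketch: the helper names `ns_inode` and `same_ns` are illustrative, and the second inode (4026531836) is a made-up stand-in for a host value, not taken from a real system.

```shell
#!/bin/sh
# Extract the inode number from a namespace link such as 'pid:[4026532839]'.
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9][0-9]*\)]$/\1/p'
}

# Succeed (exit 0) only when both links parse and name the same namespace.
same_ns() {
    a=$(ns_inode "$1")
    b=$(ns_inode "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# First value is copied from the container listing; the second is a
# hypothetical host value used only for illustration.
if same_ns 'pid:[4026532839]' 'pid:[4026531836]'; then
    echo "same PID namespace"
else
    echo "different PID namespaces"
fi
```

On a live system the two arguments would typically come from `readlink /proc/<pid>/ns/pid` for the processes being compared.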
cgroups — what they limit
In cgroups v2 (unified hierarchy, the modern norm), the controllers include:
cpu: CPU time (cpu.max, cpu.weight)
memory: RAM and swap usage
io: block I/O bandwidth and IOPS
pids: number of processes
rdma: RDMA/InfiniBand resources (handles and objects)
hugetlb: huge pages
# Inside a container with --memory=256m
$ docker run --rm --memory=256m alpine cat /sys/fs/cgroup/memory.max
268435456 # 256 * 1024 * 1024 bytes
$ docker run --rm --cpus=0.5 alpine cat /sys/fs/cgroup/cpu.max
50000 100000 # 50ms quota per 100ms period (= 0.5 CPU)
Docker translates --memory, --cpus, etc. into cgroup files in /sys/fs/cgroup/....
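The translation is plain arithmetic, and reproducing it helps when reading raw cgroup files. The sketch below is not Docker's own code: `mem_to_bytes` and `cpus_to_cpu_max` are illustrative names, and it assumes a 100000 µs period (matching the output above) and whole-number sizes with a single b/k/m/g suffix.

```shell
#!/bin/sh
# Convert a Docker-style size (256m, 1g, 512k, or plain bytes) to bytes.
# Assumption: integer values only, binary multiples (1k = 1024).
mem_to_bytes() {
    num=${1%[bkmgBKMG]}
    case $1 in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Convert a --cpus value to the "quota period" pair stored in cpu.max.
cpus_to_cpu_max() {
    awk -v cpus="$1" -v period=100000 \
        'BEGIN { printf "%.0f %d\n", cpus * period, period }'
}

mem_to_bytes 256m      # prints 268435456
cpus_to_cpu_max 0.5    # prints 50000 100000
```

Both results match the values read back from memory.max and cpu.max in the session above.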
How docker run uses both
docker run --memory=256m --cpus=1 myapp
│
├── containerd → runc
│ │
│ ├── unshare(CLONE_NEW{PID,NS,NET,IPC,UTS,CGROUP}) (+ USER if userns-remap or rootless)
│ │ → fresh namespaces (up to 7)
│ │
│ ├── write cgroup files
│ │ → memory.max=256MB, cpu.max=100000 100000
│ │
│ ├── pivot_root into image rootfs
│ │
│ └── exec(your-binary)
│
└── you see a 'container'
Namespaces give it private views; cgroups limit what it can do; pivot_root + image layers give it a custom filesystem; exec runs your binary as PID 1 in this little world.
User namespace: the security frontier
# Default (no user namespace remapping): root inside = root on host (unless capability-restricted)
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside the container, you are root; the kernel knows.
# With userns-remap: root inside maps to a non-root UID on host
$ # /etc/docker/daemon.json: { "userns-remap": "default" }
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside, still root. But on the host, processes are some-non-root-UID.
User namespace remapping is one of the strongest container hardenings. A container escape no longer means root on the host.
cgroups v1 vs v2
- v1 (legacy): separate hierarchies per controller (/sys/fs/cgroup/memory, /sys/fs/cgroup/cpu, etc.). Each process belongs to one cgroup per controller.
- v2 (modern): single unified hierarchy. Each cgroup controls all enabled resources. Simpler, more consistent.
- Most modern Linux distros (Fedora 31+, Debian 11+, Ubuntu 21.10+) default to v2. Docker handles both.
Check which your host uses:
stat -fc %T /sys/fs/cgroup
# 'cgroup2fs' = v2, 'tmpfs' = v1
Common mistakes
Treating namespaces as security boundaries
Namespaces hide things from a process; they do not stop a process with the right capabilities from breaking out. Combined with capabilities (default-drop CAP_SYS_ADMIN, etc.) and seccomp (syscall filtering), they form a defense — but namespaces alone are not impenetrable. Real isolation needs the full Docker security stack (or microVMs like Kata/Firecracker).
Forgetting cgroups limit memory but not OOM behavior
docker run --memory=256m greedy-app
# When greedy-app tries to allocate the 257th MB, kernel OOM-kills it.
OOM-killed processes inside cgroups exit with status 137 (128 + 9, i.e. SIGKILL). Your supervisor or restart policy decides what to do next.
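Decoding the exit status is a one-line subtraction. A small sketch, where `explain_exit` is an illustrative helper name rather than a Docker command:

```shell
#!/bin/sh
# Classify an exit status. By shell convention, statuses above 128 mean
# "terminated by signal (status - 128)".
explain_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited on its own with status $status"
    fi
}

explain_exit 137   # prints: killed by signal 9
explain_exit 1     # prints: exited on its own with status 1
```

Status 137 only says the process received SIGKILL; to confirm that the OOM killer sent it, `docker inspect --format '{{.State.OOMKilled}}' <container>` reports true for an OOM-killed container.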
Confusing user namespace with --user
- --user 1000:1000 = run the process as a specific UID:GID inside the container. Without remapping, that is the same UID on the host.
- userns-remap = map the container's whole UID range onto an unprivileged range on the host. It is a separate, daemon-level knob.
Assuming PID namespaces work like containers everywhere
Kubernetes pods can share PID namespaces between containers (the shareProcessNamespace: true flag). In plain Docker, two docker run containers get separate PID namespaces by default; sharing one requires an explicit --pid=container:<name> (or --pid=host for the host's). Different mental model.
Real-world usage
- Every container, everywhere. Namespaces and cgroups underlie every Docker and Podman container, K8s pod, Lambda function (via Firecracker microVM), and Cloud Run task.
- Hardened multi-tenant: user namespace remapping + dropped capabilities + seccomp profile + read-only filesystem → strong-ish isolation.
- Resource governance: cgroup limits prevent one tenant from starving others on shared infrastructure.
- Debugging container internals: lsns, nsenter, /proc/<pid>/ns/* to inspect or join namespaces from outside.
Follow-up questions
Q: Are namespaces or cgroups Linux-specific?
A: Yes. Both are Linux kernel features. Docker on Mac/Windows runs a Linux VM under the hood for this reason.
Q: What is unshare?
A: A syscall (and CLI tool) that creates a new namespace and attaches the calling process. unshare --pid --fork --mount-proc /bin/bash gives you a shell in fresh PID + mount namespaces — the simplest "build a container by hand" demo.
Q: What is the difference between cgroups and ulimits?
A: ulimits (rlimits, set via setrlimit(2) or PAM) are per-process: children inherit the limit values, but each process is measured against its own copy. cgroup limits apply to the combined usage of a whole tree of processes. Both can limit resources; cgroups are the kernel's modern, hierarchical answer.
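The per-process nature of ulimits is easy to observe without root. The sketch below lowers the open-file limit inside a subshell; the value 64 is arbitrary and assumed to be no higher than the current hard limit.

```shell
#!/bin/sh
# rlimits travel with the process: a limit set in a subshell is inherited
# by that subshell's children, while the parent shell keeps its own value.
before=$(ulimit -n)
inside=$(ulimit -n 64 && sh -c 'ulimit -n')   # child started after lowering
after=$(ulimit -n)

echo "parent before: $before"
echo "child:         $inside"
echo "parent after:  $after"
```

Each child gets its own allowance of 64 descriptors. A cgroup limit such as pids.max or memory.max, by contrast, is one shared budget for every process in the group.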
Q: Can I see what cgroup a host process is in?
A: Yes: cat /proc/<pid>/cgroup. For a Docker process, this points into /sys/fs/cgroup/<docker-cgroup-path>.
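On a cgroups v2 host the file holds a single line of the form `0::<path>`, so mapping it to a directory is a short substitution. A sketch, assuming the unified hierarchy is mounted at /sys/fs/cgroup; `cgroup_v2_dir` is an illustrative name and the `docker-abc123.scope` path is a made-up placeholder.

```shell
#!/bin/sh
# Read /proc/<pid>/cgroup-style text on stdin and print the matching
# directory under the cgroup v2 mount. Lines from v1 hierarchies
# (which do not start with "0::") are ignored.
cgroup_v2_dir() {
    sed -n 's|^0::\(/.*\)$|/sys/fs/cgroup\1|p'
}

# Sample input; the scope name is hypothetical.
printf '0::/system.slice/docker-abc123.scope\n' | cgroup_v2_dir
# prints /sys/fs/cgroup/system.slice/docker-abc123.scope
```

For a real process, redirect the file in: `cgroup_v2_dir < /proc/<pid>/cgroup`.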
Q: (Senior) How would you build a minimal container by hand using just namespaces + cgroups?
A: unshare --pid --net --mount --uts --ipc --user --map-root-user --fork chroot /path/to/rootfs /bin/sh gives you a process with isolated namespaces and a custom rootfs: the bare bones of a container. (--map-root-user maps your UID to root inside the new user namespace, which chroot needs when you are not already root.) Add cgroups manually by writing to /sys/fs/cgroup/<your-cgroup>/.... This is roughly what runc does for you, automated and OCI-spec-compliant. Useful exercise to demystify what containers are.
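The cgroup half of that exercise amounts to a mkdir and a few file writes. The sketch below takes the cgroup mount point as a parameter so it can be dry-run against a scratch directory; `make_cgroup`, `join_cgroup`, and the group name `demo` are illustrative. On a real host it would need root, a cgroups v2 mount, and the memory and cpu controllers enabled in the parent's cgroup.subtree_control.

```shell
#!/bin/sh
# Create a cgroup v2 group and set memory/CPU limits by writing its
# control files. $1 is the cgroup mount (normally /sys/fs/cgroup).
make_cgroup() {
    root=$1 name=$2 mem_bytes=$3 cpu_quota_us=$4
    dir="$root/$name"
    mkdir -p "$dir" || return 1
    echo "$mem_bytes" > "$dir/memory.max"
    echo "$cpu_quota_us 100000" > "$dir/cpu.max"   # quota and period in µs
    echo "$dir"
}

# Move a process into the group; children it forks later stay in it.
join_cgroup() {
    echo "$2" > "$1/cgroup.procs"
}

# Dry run in a temporary directory (plain files, nothing is enforced).
scratch=$(mktemp -d)
dir=$(make_cgroup "$scratch" demo 268435456 50000)
join_cgroup "$dir" "$$"
cat "$dir/memory.max" "$dir/cpu.max"
```

Against the real /sys/fs/cgroup, calling `join_cgroup "$dir" $$` before the unshare command would place the shell and everything it spawns under the limits.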
Examples
Demonstrate namespace isolation
# Host
$ ps -ef | wc -l
312
# Inside container
$ docker run --rm alpine ps -ef | wc -l
2
# Container only sees its own processes: the two lines are the ps header and ps itself, running as PID 1
Same kernel, two views: that is the PID namespace in action.
Demonstrate cgroups
# A memory-greedy program
$ docker run --rm --memory=64m alpine sh -c 'dd if=/dev/zero of=/dev/null bs=1G count=1'
Killed
# Container OOM-killed at the 64MB limit.
$ docker run --rm --cpus=0.1 alpine sh -c 'time -- timeout 5 sh -c "yes > /dev/null"'
real 0m 5.00s
user 0m 0.50s
sys 0m 0.00s
# user time = 0.5s in 5s wall = 10% CPU = the cap
The kernel enforces, Docker just sets the parameters.
Mapping --user vs userns-remap
# --user: changes UID inside (no remap)
$ docker run --rm --user 1000:1000 alpine id
uid=1000 gid=1000
# Outside, still UID 1000 (no isolation between in/out namespaces).
# With userns-remap (daemon-level config):
# /etc/docker/daemon.json: { "userns-remap": "default" }
$ docker run --rm alpine id
uid=0(root) gid=0(root)
# Inside is root.
$ ps -ef | grep <container-pid>
UID 165536 ...
# But on the host, the same process is UID 165536 (offset).
The second is much stronger isolation. Highly recommended for multi-tenant clusters.
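The offset comes from the subordinate UID range assigned to the remap user in /etc/subuid, whose entries have the form name:start:count. A sketch of the mapping, assuming a `dockremap` entry that starts at 165536 (the value shown above) with a typical count of 65536; `host_uid` is an illustrative helper.

```shell
#!/bin/sh
# Map a container UID to its host UID, given an /etc/subuid-style entry
# "name:start:count". Fails if the UID falls outside the mapped range.
host_uid() {
    entry=$1 container_uid=$2
    start=$(printf '%s' "$entry" | cut -d: -f2)
    count=$(printf '%s' "$entry" | cut -d: -f3)
    [ "$container_uid" -lt "$count" ] || return 1
    echo $((start + container_uid))
}

host_uid 'dockremap:165536:65536' 0      # container root, prints 165536
host_uid 'dockremap:165536:65536' 1000   # prints 166536
```

So even UID 0 in the container lands on an unprivileged host UID, which is why an escape no longer yields host root.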