What is a Docker overlay network and when do you need it?
A Docker overlay network is the answer to "how do containers on host A talk to containers on host B?". It builds a virtual L2 network on top of the existing IP network using VXLAN encapsulation, so containers see each other as if on the same physical LAN regardless of which host they actually run on.
Theory
TL;DR
- Overlay = a network that spans multiple Docker hosts (Swarm nodes).
- Built on VXLAN (Virtual eXtensible LAN, RFC 7348). Each container packet is wrapped in a UDP packet and shipped over the underlay network.
- Requires Docker Swarm — run docker swarm init first. Without Swarm, docker network create --driver overlay errors out.
- Encrypted with --opt encrypted on creation; uses IPsec/AES-GCM in modern Swarm.
- Use when: Swarm services need cross-host service discovery and load balancing.
- Skip when: single host (bridge is enough), or Kubernetes (it has its own CNI plugins, not Docker overlay).
Quick example
# Initialize Swarm on the first node
node-1$ docker swarm init --advertise-addr 192.168.1.10
# Outputs a join token; run on each worker:
node-2$ docker swarm join --token SWMTKN-... 192.168.1.10:2377
# On manager, create an overlay network
node-1$ docker network create --driver overlay --attachable appnet
# Deploy a service across nodes
node-1$ docker service create --name api --network appnet --replicas 5 myapp
# 5 containers spread across both nodes; all on appnet, all reach each other.
# Containers on node-1 and node-2 can ping each other by service name
node-1$ docker exec <api-container> ping api   # resolves via Swarm's embedded DNS

Without overlay, the api container on node-1 has no IP route to a container on node-2. With overlay, traffic gets VXLAN-wrapped and tunneled.
How VXLAN encapsulation works
container A (10.0.0.5) on node-1                 container B (10.0.0.6) on node-2
        |                                                        ^
        | packet: src=10.0.0.5 dst=10.0.0.6                      |
        v                                                        |
+--------------------+                               +--------------------+
| overlay vxlan      |                               | overlay vxlan      |
| wraps in UDP/4789  |                               | unwraps            |
+--------------------+                               +--------------------+
        |                                                        ^
        |    on the wire: UDP 4789, payload = original packet    |
        v                                                        |
host eth0 (192.168.1.10) ──────────────────────────► host eth0 (192.168.1.20)

The overlay network has its own IP space (e.g., 10.0.0.0/24). Each container gets an IP there. The host has a separate IP on the underlay (192.168.1.0/24). Container packets are wrapped in UDP packets addressed to the destination host's underlay IP.
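To make the wrapping concrete, here is a minimal Python sketch of the 8-byte VXLAN header from RFC 7348. This is only the header arithmetic — the real encapsulation happens inside the kernel's vxlan driver, and the function names here are invented for illustration:

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header (RFC 7348) to an inner L2 frame.

    Layout: 1 flags byte (0x08 = "VNI present"), 3 reserved bytes,
    24-bit VNI, 1 reserved byte. The result becomes the payload of a
    UDP datagram sent to the remote host's underlay IP on port 4789.
    """
    header = struct.pack("!B3xI", 0x08, vni << 8)  # VNI sits in the top 3 bytes
    return header + inner_frame

def vxlan_decap(payload: bytes) -> tuple[int, bytes]:
    """Reverse of vxlan_encap: strip the header, return (vni, inner frame)."""
    flags, vni_field = struct.unpack("!B3xI", payload[:8])
    assert flags & 0x08, "I flag not set: no valid VNI"
    return vni_field >> 8, payload[8:]

# A container-to-container packet round-trips unchanged:
frame = b"src=10.0.0.5 dst=10.0.0.6 ..."
vni, recovered = vxlan_decap(vxlan_encap(frame, vni=42))
assert (vni, recovered) == (42, frame)
```

The VNI (VXLAN Network Identifier) is what keeps traffic from different overlay networks separated even though it all travels over the same UDP port.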
Service discovery on overlay
Overlay networks include Docker Swarm's embedded DNS plus built-in VIP load balancing (and, for published ports, the routing mesh):
# Service "api" has 5 replicas across 3 nodes. Inside any container on appnet:
$ nslookup api
Server: 127.0.0.11
Address 1: 10.0.1.5 api
# A virtual IP that load-balances across the 5 replicas via IPVS.

The service name resolves to a virtual IP. Connections to that VIP are distributed among healthy replicas by the kernel's IPVS module on each node. Containers do not need to know which physical host runs which replica.
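The VIP behavior can be modelled in a few lines of Python — a toy stand-in for IPVS with made-up replica IPs, not how the kernel actually schedules connections:

```python
from itertools import cycle

class VipBalancer:
    """Toy model of per-node IPVS: one stable VIP, new connections
    spread round-robin across the current set of healthy replica IPs."""

    def __init__(self, vip: str, replicas: list[str]):
        self.vip = vip
        self._rr = cycle(replicas)

    def pick_backend(self) -> str:
        # IPVS supports several schedulers; round-robin is the simplest.
        return next(self._rr)

lb = VipBalancer("10.0.1.5", ["10.0.1.6", "10.0.1.7", "10.0.1.8"])
print([lb.pick_backend() for _ in range(4)])
# → ['10.0.1.6', '10.0.1.7', '10.0.1.8', '10.0.1.6']
```

The point of the VIP indirection: clients keep connecting to the same 10.0.1.5 even as replicas are rescheduled, scaled up, or scaled down.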
Encrypted overlays
docker network create --driver overlay --opt encrypted my-secure-net

Adds AES-GCM encryption between Swarm nodes. Useful when the underlay network is untrusted (cross-region, public internet between nodes). The performance cost is real but usually acceptable.
When you need overlay
- Docker Swarm services running on more than one node and needing to talk to each other.
- Service-to-service traffic that should not transit the public internet (use overlay across VPN-linked datacenters).
- Anywhere a single bridge cannot reach because containers live on different hosts.
When you do NOT need overlay
- Single Docker host: bridge is simpler and faster.
- Kubernetes: uses CNI plugins (Calico, Flannel, Cilium) — not Docker overlay. K8s has its own equivalent at the cluster level.
- External communication: containers reach the internet via NAT regardless of network type. Overlay is for container-to-container, not container-to-internet.
Common mistakes
Trying overlay without Swarm
$ docker network create --driver overlay test
Error response from daemon: This node is not a swarm manager.

Overlay is a Swarm feature; run docker swarm init first.
Forgetting --attachable
# Without --attachable, only Swarm services can join.
# Standalone `docker run --network myoverlay` fails.
docker network create --driver overlay myoverlay
# With --attachable, both services and standalone containers can attach.
docker network create --driver overlay --attachable myoverlay

For mixed workflows (services plus ad-hoc debug containers), use --attachable.
Underlay firewall blocking VXLAN
VXLAN uses UDP port 4789. Swarm also needs TCP and UDP 7946 (gossip) and TCP 2377 (manager API). Cloud security groups or on-prem firewalls between nodes must allow all of these; encrypted overlays additionally need IP protocol 50 (ESP).
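As a checklist, the default port requirements can be tabulated and turned into firewall rules. The ufw-style output below is only illustrative — adapt it to whatever firewall you actually run, and note these are the defaults (Swarm can move the data path with --data-path-port):

```python
# Default Swarm inter-node ports (assumption: no --data-path-port override).
SWARM_PORTS = [
    ("tcp", 2377, "cluster management (manager API)"),
    ("tcp", 7946, "node-to-node gossip"),
    ("udp", 7946, "node-to-node gossip"),
    ("udp", 4789, "VXLAN data plane"),
]

def ufw_rules(ports):
    """Render one ufw command per required port (illustrative helper)."""
    return [f"ufw allow {num}/{proto}  # {why}" for proto, num, why in ports]

for rule in ufw_rules(SWARM_PORTS):
    print(rule)
```

If any one of these is blocked between two nodes, the failure modes differ: blocked 2377 stops nodes joining, blocked 7946 breaks membership and service discovery, and blocked 4789 lets everything look healthy while container-to-container traffic silently drops.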
Picking overlay for a single-host setup
No benefit, real overhead (VXLAN encap, even when both endpoints are local). Stick with bridge until you actually go multi-host.
Real-world usage
- Swarm production: every cross-host service connection. Swarm automatically builds an overlay network called ingress for the routing mesh.
- Multi-region Swarm: encrypted overlay across cloud regions, VXLAN tunneling over WAN. It works, but be honest about latency.
- Hybrid demos / lab setups: overlay across a couple of laptops at a workshop to simulate multi-host without K8s.
Follow-up questions
Q: What is the difference between overlay and bridge?
A: Bridge is single-host (all containers on one Docker daemon). Overlay is multi-host (containers can be on different machines). Overlay carries the L2 abstraction across the IP underlay; bridge is just a Linux bridge on one host.
Q: Does overlay work with regular docker run containers, or only services?
A: Both, if you create the overlay with --attachable. Without that flag, only Swarm services can use it.
Q: Why not always use overlay even on single host?
A: Performance overhead (VXLAN encapsulation). Complexity (requires Swarm running). For single host, bridge gives the same container DNS without the overhead.
Q: How is Docker overlay different from Kubernetes pod networking?
A: Both solve cross-host container-to-container connectivity. Docker overlay is built into Swarm and uses VXLAN. Kubernetes uses CNI plugins, of which there are many (some VXLAN-based like Flannel, others routing-based like Calico). Different ecosystems, similar problem.
Q: (Senior) What is the routing mesh in Docker Swarm and how does it relate to overlay?
A: The routing mesh is a special overlay called ingress that Swarm creates automatically. When you publish a service port (-p 80:80 on docker service create), every Swarm node listens on host:80, and traffic that arrives is forwarded over the overlay to a node actually running a replica. Combined with IPVS load balancing inside each node, you get any-node-can-receive-traffic behavior without external load balancers. Caveat: routing-mesh DNAT can break source IP visibility — use --publish mode=host if you need real client IPs.
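The forwarding decision the mesh makes can be sketched as a toy lookup. The node names and task placements below are invented, and the real path goes through per-node IPVS rules rather than a Python function:

```python
# Which node runs which replica of service "api" (hypothetical placement).
placements = {
    "node-1": ["api.1", "api.2"],
    "node-2": ["api.3"],
    "node-3": [],                  # listens on host:80 but runs no replica
}

def route(entry_node: str) -> tuple[str, str]:
    """Traffic hitting the published port on entry_node: serve locally if
    a replica runs there, otherwise forward over the ingress overlay to a
    node that has one."""
    if placements[entry_node]:
        return entry_node, placements[entry_node][0]
    for node, tasks in placements.items():
        if tasks:
            return node, tasks[0]
    raise RuntimeError("no running replicas")

print(route("node-3"))  # → ('node-1', 'api.1'): forwarded over the overlay
```

This is exactly why the DNAT caveat in the answer exists: by the time the request reaches api.1, the source address the replica sees belongs to the mesh, not the original client.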
Examples
Setting up a two-node Swarm with overlay
# Node 1 (manager)
node-1$ docker swarm init --advertise-addr 192.168.1.10
Swarm initialized: current node (xxx) is now a manager.
To add a worker: docker swarm join --token SWMTKN-... 192.168.1.10:2377
# Node 2 (worker)
node-2$ docker swarm join --token SWMTKN-... 192.168.1.10:2377
# Verify
node-1$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
xxx node-1 Ready Active Leader
yyy node-2 Ready Active
# Create overlay
node-1$ docker network create --driver overlay --attachable appnet
# Deploy a service spread across both nodes
node-1$ docker service create --name api --replicas 4 --network appnet myapp
# Confirm placement
node-1$ docker service ps api
# 4 replicas, 2 on each node

Encrypted overlay across regions
# On a Swarm where nodes live in different cloud regions
$ docker network create --driver overlay \
--opt encrypted \
--subnet 10.5.0.0/16 \
secure-net
$ docker service create --name api --replicas 6 --network secure-net myapp
# Service traffic between us-east-1 and eu-west-1 is now AES-encrypted in transit.

The encryption flag adds IPsec ESP between hosts. CPU cost: noticeable for high-throughput services; usually fine for normal RPC traffic.