How to organize log management in Docker?

Log management in Docker is the discipline of capturing, rotating, and centralizing what containers write. The default (json-file) works for dev but hides surprises in production: a chatty container can fill disk in hours, lookups across many containers are tedious, and one node's logs vanish if the node dies. A real production stack: stdout-only apps + log driver with rotation + a shipper that gets logs off-host.

Theory

TL;DR

  • Twelve-Factor: write to stdout/stderr; let the platform handle the rest.
  • Log drivers (json-file, journald, syslog, fluentd, loki, awslogs, gcplogs, splunk, etc.) determine what happens to those streams.
  • Default json-file writes to /var/lib/docker/containers/<id>/<id>-json.log. No rotation by default.
  • Set rotation via log-opts: max-size, max-file.
  • For multi-host setups, ship logs off-node: Loki, ELK, CloudWatch, Datadog.
  • Emit structured logs (JSON), not plaintext. Future-you will thank you.

Why stdout-only

If your app writes to a file inside the container:

  • Invisible to docker logs — Docker only sees stdout/stderr.
  • Disappears with the container unless you mount a volume — and managing a volume per container is busywork.
  • Defeats log drivers — drivers operate on stdout/stderr.
  • Breaks orchestration — Swarm/K8s assume stdout/stderr semantics.

A twelve-factor app writes to stdout. The platform (Docker, Compose, Swarm, K8s) routes it where it belongs.
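
To check which driver and options a running container actually got (the container name app is illustrative):

bash
docker inspect --format '{{json .HostConfig.LogConfig}}' app
# e.g. {"Type":"json-file","Config":{"max-size":"50m","max-file":"3"}}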

Built-in log drivers

Driver | What it does | When to use
--- | --- | ---
json-file | Default. Writes JSON to disk on host. | Dev, small ops.
local | Like json-file but binary, faster, has rotation built in. | Small ops, lower overhead.
journald | Sends to systemd journal. | Linux hosts using journald centrally.
syslog | Sends to syslog daemon (local or remote). | Legacy, simple central collection.
fluentd | Sends to a Fluentd/Fluent Bit aggregator. | Mature, vendor-neutral central pipeline.
loki | Sends to Grafana Loki. | Modern, cheap, integrates with Grafana.
awslogs | Sends to AWS CloudWatch Logs. | AWS deployments.
gcplogs | Sends to GCP Cloud Logging. | GCP deployments.
splunk | Sends to Splunk HEC. | Splunk shops.
none | Drops logs. | Tests where you do not care.

Where logs end up by default

/var/lib/docker/containers/<id>/
├── <id>-json.log        # the active log file
├── <id>-json.log.1      # rotated (if rotation set)
└── <id>-json.log.2.gz   # rotated, compressed
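
If disk is filling up, a quick check shows which containers are responsible (needs root, since the directory belongs to the daemon):

bash
sudo du -sh /var/lib/docker/containers/* | sort -rh | head -5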

Examples

Set host-wide rotation (production must-have)

json
// /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5",
    "compress": "true"
  }
}

bash
sudo systemctl restart docker

Now every new container caps at 5 files of 100 MB = 500 MB worst case, with old files compressed.

Note: existing containers keep their old log config until they are recreated. If you set this on a server that is already running containers, those containers can still fill the disk. Recreate them or use a per-container override.
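
After the restart, confirm the daemon-level default took effect:

bash
docker info --format '{{.LoggingDriver}}'
# json-file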

Per-container override

bash
docker run -d \
  --log-driver=json-file \
  --log-opt max-size=50m \
  --log-opt max-file=3 \
  --name=app \
  myorg/app

Useful when one chatty service needs different limits.

Compose with loki driver

yaml
services:
  loki:
    image: grafana/loki:2.9.0
    ports: ["3100:3100"]
  grafana:
    image: grafana/grafana:10
    ports: ["3000:3000"]
    depends_on: ["loki"]
  api:
    image: myorg/api:1.0
    logging:
      driver: loki
      options:
        loki-url: "http://localhost:3100/loki/api/v1/push"
        loki-retries: "5"
        loki-batch-size: "400"
        loki-external-labels: "job=api,env=prod"

loki-external-labels adds tags so you can filter by service in Grafana. Note that loki-url points at localhost, not the loki service name: log-driver options are resolved by the Docker daemon on the host, which is not on the Compose network, so it must reach Loki through the published port. The driver also requires the Loki Docker driver plugin (a one-time install): docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions.
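
Verify the plugin is installed and enabled before pointing containers at it:

bash
docker plugin ls   # the "loki" alias should show ENABLED = true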

Structured logging (JSON)

App side (Node.js example):

js
console.log(JSON.stringify({
  level: 'info',
  msg: 'request handled',
  request_id: 'abc-123',
  user_id: 42,
  duration_ms: 87
}))

With json-file, Docker wraps this:

json
{"log": "{\"level\":\"info\",\"msg\":\"request handled\",...}\n", "stream": "stdout", "time": "..."}

The outer wrapper is Docker's; the inner JSON is your structured log. Loki, ELK, etc. parse the inner JSON and you can query {job="api"} |= "abc-123" or filter by user_id.
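
You can also peel the wrapper yourself on the host with jq; a sketch, assuming the app logs JSON lines (<id> is the container ID as above):

bash
sudo jq -r 'select(.stream == "stdout") | .log' \
  /var/lib/docker/containers/<id>/<id>-json.log \
  | jq .   # second pass parses the inner (application) JSON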

Fluentd / Fluent Bit pipeline

yaml
services:
  fluentbit:
    image: fluent/fluent-bit:2.2
    ports: ["24224:24224"]
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
  api:
    image: myorg/api:1.0
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: api
        # If fluent-bit is unreachable, buffer in the background instead of
        # blocking the app (logs can drop if the buffer overflows):
        fluentd-async: "true"

A Fluent Bit config can route the same logs to multiple destinations: a hot search index for the last day, cold storage for compliance.
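
A minimal sketch of such a fan-out, assuming Elasticsearch as the hot index and S3 as cold storage (the hostname, index, and bucket names are illustrative):

bash
cat > fluent-bit.conf <<'EOF'
[INPUT]
    Name    forward
    Listen  0.0.0.0
    Port    24224

# Hot path: searchable index for recent logs
[OUTPUT]
    Name    es
    Match   api
    Host    elasticsearch
    Port    9200
    Index   logs-api

# Cold path: cheap long-term storage for compliance
[OUTPUT]
    Name    s3
    Match   api
    bucket  my-compliance-logs
    region  us-east-1
EOF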

Reading logs from Docker (and limits of docker logs)

bash
docker logs -f --tail=100 api                    # tail and follow
docker logs --since=10m api                      # last 10 minutes
docker logs --until=2024-01-15T10:00:00 api      # until a timestamp

Historically, docker logs worked only with the json-file, local, and journald drivers. Since Docker Engine 20.10, "dual logging" keeps a local cache (via the local driver) so docker logs works even with remote drivers, unless that cache is disabled. On older engines with fluentd, loki, syslog, etc., the logs do not stay on disk and docker logs returns nothing — use the centralized backend instead.

CloudWatch on AWS

bash
docker run -d \
  --log-driver=awslogs \
  --log-opt awslogs-region=us-east-1 \
  --log-opt awslogs-group=myapp \
  --log-opt awslogs-stream=api-1 \
  --log-opt awslogs-create-group=true \
  myorg/api:1.0

The Docker daemon needs IAM permissions to write to CloudWatch (logs:CreateLogStream, logs:PutLogEvents, and logs:CreateLogGroup when awslogs-create-group=true). On EC2 with an instance role, add a policy; on Fargate, the task role handles it.
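
A minimal policy sketch for an instance role (the role name, policy name, and resource ARN are illustrative):

bash
cat > docker-awslogs-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ],
    "Resource": "arn:aws:logs:us-east-1:*:log-group:myapp*"
  }]
}
EOF
aws iam put-role-policy \
  --role-name my-instance-role \
  --policy-name docker-awslogs \
  --policy-document file://docker-awslogs-policy.json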

Kubernetes-style: node-level shipper

Docker driver-less option: leave logs on disk via json-file, run a per-host shipper (Fluent Bit, Promtail, Vector) that reads /var/lib/docker/containers/*/*-json.log and ships them. Decouples app deploy from log destination.
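
For example, Promtail as a per-host shipper (the image tag is illustrative, and a promtail-config.yml defining the Loki URL and scrape paths is assumed to exist):

bash
docker run -d \
  --name=promtail \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  -v "$(pwd)/promtail-config.yml:/etc/promtail/config.yml:ro" \
  grafana/promtail:2.9.0 \
  -config.file=/etc/promtail/config.yml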

Real-world usage

  • Tiny / hobby: json-file with rotation. Read with docker logs.
  • Single-host with monitoring: Loki + Promtail + Grafana, reading /var/lib/docker/containers.
  • Multi-host self-hosted: Fluent Bit per host → Elasticsearch/OpenSearch + Kibana, or Loki + Grafana.
  • Cloud: native — awslogs, gcplogs, Datadog, Splunk.
  • Compliance environments: long-term cold storage (S3) + hot index for the last 30 days.

Common mistakes

Writing to log files inside the container

A Java app with log4j writing to /var/log/app.log defeats docker logs and disappears with the container. Configure log4j to write to console (stdout) instead.

No rotation on json-file

Out of the box, json-file does no rotation. A chatty app on the default config will fill the disk. Always set max-size and max-file.

Logging sensitive data

Passwords, JWTs, and PII end up in logs because someone called console.log(req.body). Sanitize at the source. Centralized log indexing makes leaks easy to find — and easy to scrape.

Using docker logs in production for everything

Works on a single host but does not scale. Centralize early; trying to retroactively add log shipping mid-incident is painful.

Not setting compress: true

Without it, rotated .log.N files sit on disk at full size. compress: "true" gzips them, often 10x smaller.

Follow-up questions

Q: Why does docker logs return nothing for some containers?


A: That container uses a non-disk driver (fluentd, loki, awslogs) on an engine without the dual-logging cache (pre-20.10, or with the cache disabled). Logs are shipped, not stored. Query the destination instead.

Q: What is the difference between json-file and local?


A: Both store on disk on the host. local uses a binary format (smaller, faster), supports built-in rotation, and is recommended for new setups. json-file is older and the historical default. Functionally similar for docker logs.

Q: Should I log JSON or plaintext?


A: JSON. Modern log backends parse it natively, and you can filter by fields (level=error, user_id=42). Plaintext is fine for tiny services but does not scale.

Q: (Senior) What are the trade-offs between fluentd-async: true and synchronous?


A: Synchronous: if Fluent Bit is down, the Docker daemon blocks on write(), which can hang containers. Async: Docker buffers (small, in-memory) and continues; on overflow, logs drop. For app stability, async with a generous buffer is safer; for compliance where log loss is unacceptable, run a high-availability log aggregator (failover, persistent queues) and use sync.
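
Docker also exposes this trade-off generically for any driver via the delivery mode; a sketch (the buffer size is illustrative):

bash
docker run -d \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt mode=non-blocking \
  --log-opt max-buffer-size=4m \
  myorg/api:1.0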

Q: (Senior) How do you handle multi-line stack traces in logs?


A: Each line written to stdout becomes a separate log event, so a 30-line Java stack trace becomes 30 unrelated log entries. Two fixes: (1) have the app emit a single JSON event with the full trace as a string field; (2) use a log shipper (Fluent Bit, Filebeat, Vector) with multiline parsing rules that join lines starting with whitespace into the previous event. The source-side fix is more reliable; a sketch of option (2) follows.
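
A sketch of option (2) using Fluent Bit's built-in multiline parsers (requires Fluent Bit 1.8+; the java parser is an assumption for Java apps):

bash
cat > fluent-bit-multiline.conf <<'EOF'
[INPUT]
    Name              tail
    Path              /var/lib/docker/containers/*/*-json.log
    # "docker" first unwraps the json-file format, then "java" joins
    # continuation lines of a stack trace into one event
    multiline.parser  docker, java
EOF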
