
How to set up a private Docker registry with Harbor?

Harbor is the dominant open-source private container registry. It wraps the basic OCI Distribution spec (the same protocol Docker Hub uses) with the production features that on-prem and enterprise teams actually need.

Theory

TL;DR

  • Open-source registry built on top of CNCF distribution; a CNCF-graduated project since 2020.
  • Beyond plain registry: RBAC, multi-tenant projects, vulnerability scanning (Trivy/Clair), image signing (Notary, Cosign), replication to other registries, retention policies, garbage collection, OIDC/LDAP auth.
  • Deploy via Compose for single-node or Helm chart for HA on Kubernetes.
  • Used as on-prem alternative to AWS ECR / Google GCR / Docker Hub when air-gapped, multi-region, or compliance-driven.
  • Speaks plain docker pull/push — your existing CI/CD does not change.

Architecture

```
+----------+     +----------+     +-----------+
| nginx /  |     | core     |     | registry  |
| portal   | ->  | (API)    | ->  | (CNCF     |
|          |     | + jobsvc |     | distrib.) |
+----------+     +----------+     +-----------+
     |                |                 |
     v                v                 v
  +-----+        +---------+        +-------+
  | UI  |        | trivy   |        | redis |
  +-----+        | scanner |        +-------+
                      |
                      v
                +-----------+
                | postgres  |
                +-----------+
```

A dozen containers running together (web UI, API, DB, Redis, registry, scanner, etc.). The Compose installer brings it all up.

Single-node install (Compose)

```bash
# Download
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-online-installer-v2.11.0.tgz
tar xzf harbor-online-installer-v2.11.0.tgz
cd harbor

# Configure
cp harbor.yml.tmpl harbor.yml
vi harbor.yml
# hostname: harbor.example.com
# https.port: 443
# https.certificate: /etc/cert/fullchain.pem
# https.private_key: /etc/cert/privkey.pem
# harbor_admin_password: <strong-pw>
# database.password: <db-pw>

# Install with Trivy scanner enabled
sudo ./install.sh --with-trivy
# Brings up: ~10 containers via docker compose
```

Open https://harbor.example.com → log in as admin with the password from harbor.yml.
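Once the portal is up, the same checks can be scripted against Harbor's REST API. A minimal sketch, assuming the hostname from harbor.yml above and a possibly self-signed cert (hence `-k`); the `<password>` placeholder is yours to fill in:

```bash
# Assumes the hostname configured in harbor.yml
HARBOR=https://harbor.example.com

check_health() {
  # Unauthenticated endpoint: overall plus per-component status
  curl -sk "$HARBOR/api/v2.0/health"
}

list_projects() {
  # Authenticated call; replace <password> with the admin password
  curl -sk -u "admin:<password>" "$HARBOR/api/v2.0/projects"
}
# Usage: check_health && list_projects
```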

Pushing/pulling

```bash
# Log in
docker login harbor.example.com -u admin -p <password>

# Tag and push
docker tag myapp:1.0 harbor.example.com/myproject/myapp:1.0
docker push harbor.example.com/myproject/myapp:1.0

# Pull from another host
docker pull harbor.example.com/myproject/myapp:1.0
```

Harbor speaks the same protocol as Docker Hub. CI/CD scripts only change the registry hostname.

Projects and RBAC

Harbor groups images into projects. Each project is a namespace with its own:

  • Visibility: private (auth required) or public (no auth for pulls).
  • Member roles: Project Admin, Maintainer, Developer, Guest.
  • Vulnerability scan policies.
  • Retention policies.
  • Replication rules.

Typical multi-tenant setup: one Harbor instance, one project per team. Project Admins manage their team's images; cross-project access is controlled.
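Projects can be created from the UI or scripted. A sketch against the v2.0 REST API; hostname, credentials, and the project name are placeholders:

```bash
create_project() {
  # POST /api/v2.0/projects creates a project; "public": "false"
  # makes it private (auth required for pulls)
  curl -sk -u "admin:<password>" \
    -H "Content-Type: application/json" \
    -X POST "https://harbor.example.com/api/v2.0/projects" \
    -d '{"project_name": "team-a", "metadata": {"public": "false"}}'
}
# Usage: create_project
```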

Vulnerability scanning

With --with-trivy, Harbor includes a built-in scanner:

  • Scan on push (configurable per project).
  • Scheduled re-scans of all images.
  • CVE results visible in the UI; pulls can be gated by severity ("prevent vulnerable images").
  • Cluster-side admission policies can additionally block deploys of images Harbor flags as unsafe.
```yaml
# In project settings
Vulnerability Scanning: enabled
Prevent vulnerable images: HIGH and above
Auto-scan on push: true
```

Replication

Harbor can mirror images to and from other registries:

  • Pull-through cache: configure Docker Hub as a remote, Harbor caches pulls. Speeds up downloads, survives Docker Hub rate limits.
  • Push to another Harbor: multi-region setups replicate prod images to each region's Harbor.
  • Push to Docker Hub / ECR / GCR: distribute to multiple registries from one source.
```yaml
# Replication rule
Name: replicate-to-eu
Mode: pull / push / event-based
Filter: project=prod, tag=v*
Destination: harbor-eu.example.com
Trigger: scheduled / manual / on-push
```

Image signing

Two options:

  • Notary v1 (DCT-compatible) — signs image manifests; removed from Harbor as of v2.6.
  • Cosign / Sigstore (modern) — signs OCI artifacts. Increasingly the default.

Enforce "only signed images deploy" via admission control (Kyverno, sigstore-policy-controller).
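A sketch of the Cosign keypair flow against a Harbor-hosted image; assumes cosign is installed and you are already logged in to the registry, and the image name is a placeholder:

```bash
IMG=harbor.example.com/myproject/myapp:1.0

sign_and_verify() {
  cosign generate-key-pair              # writes cosign.key / cosign.pub
  cosign sign --key cosign.key "$IMG"   # pushes the signature as an OCI artifact
  cosign verify --key cosign.pub "$IMG"
}
# Usage: sign_and_verify
```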

Retention policies

Without retention, every PR build accumulates forever. Harbor lets you set rules:

Retain: most recently pushed 10 tags
Exclude: tags matching v* (keep all release tags)
Apply to: project=staging

Garbage collection runs in a scheduled job; freed space is reclaimable.
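GC can also be kicked off on demand. A sketch over the system API; the exact endpoint and payload shape should be double-checked against your Harbor version's API reference:

```bash
run_gc() {
  # Manual trigger via the system GC schedule endpoint
  curl -sk -u "admin:<password>" \
    -H "Content-Type: application/json" \
    -X POST "https://harbor.example.com/api/v2.0/system/gc/schedule" \
    -d '{"schedule": {"type": "Manual"}}'
}
# Usage: run_gc
```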

HA install (Helm)

For production, run Harbor on Kubernetes:

```bash
helm repo add harbor https://helm.goharbor.io
helm upgrade --install harbor harbor/harbor \
  --set expose.type=ingress \
  --set externalURL=https://harbor.example.com \
  --set persistence.enabled=true \
  --set persistence.persistentVolumeClaim.registry.size=500Gi
```

The chart replicates services across nodes, with persistent volumes for registry storage and Postgres; it survives node failure when backed by network storage.

Common mistakes

Forgetting to set up TLS

Harbor's harbor.yml template defaults to HTTP for quick testing. Production requires HTTPS: the Docker daemon refuses to push to untrusted HTTP registries by default (unless they are listed under insecure-registries). Generate certs (Let's Encrypt, internal CA) and configure the https: block in harbor.yml.
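For a quick test cert before wiring up a real CA, one openssl invocation suffices; paths and hostname mirror the harbor.yml snippet above (OpenSSL 1.1.1+ for -addext):

```bash
# Self-signed cert for testing only; use Let's Encrypt or an internal
# CA for production. The SAN must match the harbor.yml hostname.
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout privkey.pem -out fullchain.pem \
  -subj "/CN=harbor.example.com" \
  -addext "subjectAltName=DNS:harbor.example.com"
```

With a self-signed cert, each Docker daemon must also trust it, e.g. by copying it to /etc/docker/certs.d/harbor.example.com/ca.crt.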

Running on the same host as critical workloads

A registry that hosts your production images is itself critical infrastructure. If it crashes, you cannot deploy. Run it on dedicated hardware or its own K8s cluster.

No retention policy → disk fills up

2 TB of CI builds accumulated. Backup window: 12 hours. Recovery window: nervous.

Set retention policies from day one. Garbage collection scheduled weekly minimum.

Missing storage backups

Harbor's Postgres has metadata; the registry's filesystem has the actual blobs. Both need backup. Losing metadata = images orphaned. Losing blobs = metadata pointing at nothing.
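A sketch of a nightly backup covering both halves for a Compose install. The container name, database name, and /data path are assumptions from a default install; verify them with docker compose ps and the data_volume setting in harbor.yml:

```bash
backup_harbor() {
  local dest=$1
  mkdir -p "$dest"
  # Metadata: dump Harbor's Postgres from inside the harbor-db container
  docker exec harbor-db pg_dump -U postgres registry > "$dest/harbor-db.sql"
  # Blobs: archive the registry storage under Harbor's data volume
  tar czf "$dest/registry-blobs.tgz" -C /data registry
}
# Usage: backup_harbor /backups/$(date +%F)
```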

Deploying without scanning

If you have Harbor with Trivy and do not enable scanning, you are paying for the feature and not using it. At minimum: scan-on-push for all projects.

Real-world usage

  • On-prem enterprise: finance, healthcare, telecom — Harbor is the default for self-hosted regulated environments.
  • Multi-cloud: Harbor as the central registry, with replication to AWS ECR / GCR for region-local pulls.
  • Air-gapped: classified networks where the public internet is forbidden; Harbor + Trivy DB updates via offline sync.
  • Open-source projects: some maintain their own Harbor for community-built images.
  • Pull-through cache: Harbor caches Docker Hub for thousands of CI runs without hitting rate limits.

Follow-up questions

Q: What is the difference between Harbor and the bare registry:2 image?


A: registry:2 is the basic CNCF distribution server — pull/push, that is it. Harbor wraps it with auth, RBAC, UI, scanning, signing, replication. For toy use or quick local testing, registry:2. For real production, Harbor.

Q: Can Harbor scan images from external registries?


A: Only images stored in Harbor. To scan external images, set up replication to pull them into Harbor first.

Q: How does Harbor handle storage?


A: Filesystem (default), S3-compatible (AWS S3, Minio, GCS, Azure Blob), or Swift. Configure via harbor.yml storage section.
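For example, pointing the registry at S3 looks like this in harbor.yml; the key names follow Harbor's storage_service section (which mirrors the distribution storage drivers), and the bucket, region, and credentials are placeholders:

```yaml
# harbor.yml
storage_service:
  s3:
    accesskey: <AWS_ACCESS_KEY>
    secretkey: <AWS_SECRET_KEY>
    region: eu-west-1
    regionendpoint: https://s3.eu-west-1.amazonaws.com   # or your Minio URL
    bucket: harbor-registry
```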

Q: Can I run Harbor in HA?


A: Yes — Helm chart with multiple replicas, external Postgres (HA), external Redis (HA), shared storage backend (S3 or NFS). Single-node Compose install is for small/staging; production = HA.

Q: (Senior) How would you architect Harbor for a global multi-region deploy?


A: Central "hub" Harbor where CI pushes; "spoke" Harbors in each region for low-latency pulls. Hub-to-spoke replication is event-based (a push triggers replication within minutes). Spoke storage uses the region-local object store (S3, GCS). The hub is backed up offsite. Production K8s in each region pulls from its spoke; if the spoke is down, fall back to the hub. Audit logs are centralized via syslog. The design: images are authoritative at the hub, distributed for performance, survivable for resilience.

Examples

Compose-based small install

```bash
# After install.sh completes, you have:
$ docker compose ps
NAME                   IMAGE                        STATUS
harbor-core            goharbor/harbor-core         running
harbor-db              goharbor/harbor-db           running
harbor-jobservice      goharbor/harbor-jobservice   running
harbor-portal          goharbor/harbor-portal       running
harbor-registry        goharbor/registry-photon     running
harbor-trivy-adapter   goharbor/trivy-adapter       running
nginx                  goharbor/nginx-photon        running
redis                  goharbor/redis-photon        running
# ... ~10 services

$ curl -k https://harbor.example.com/api/v2.0/health
{"status":"healthy","components":[...]}
```

CI integration

```yaml
# GitHub Actions
- uses: docker/login-action@v3
  with:
    registry: harbor.example.com
    username: ci-robot
    password: ${{ secrets.HARBOR_TOKEN }}

- uses: docker/build-push-action@v5
  with:
    push: true
    tags: harbor.example.com/myproject/api:${{ github.sha }}
```

Harbor's robot accounts (long-lived API tokens) are the standard CI auth pattern.
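One gotcha worth a sketch: robot account names contain a literal dollar sign, so they need careful quoting in shells and CI configs (project and robot names here are placeholders):

```bash
# Single quotes keep the "$" literal; double quotes would make the
# shell expand "$myproject" to an empty string.
ROBOT_USER='robot$myproject+ci'
# docker login harbor.example.com -u "$ROBOT_USER" --password-stdin < token.txt
```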

Replication: Docker Hub pull-through cache

Harbor settings → Registries
  Add registry: type=docker-hub, url=https://hub.docker.com
Project "library" → Replications
  Pull-based replication, source=docker-hub, filter=*nginx*

Now docker pull harbor.example.com/library/nginx:1.27 pulls from Docker Hub on a cache miss, then serves from Harbor on subsequent pulls. Docker Hub's anonymous rate limits apply per IP, and because Harbor answers repeat pulls from its cache, only the occasional upstream fetch counts against the limit.
