# How to set up a private Docker registry with Harbor?

## Short answer

**Harbor** is a CNCF-graduated, open-source private Docker registry. It wraps the OCI Distribution spec with extras: vulnerability scanning, RBAC, replication, image signing, multi-tenant projects.

```bash
# Quick setup
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-online-installer-v2.11.0.tgz
tar xzf harbor-online-installer-v2.11.0.tgz && cd harbor
cp harbor.yml.tmpl harbor.yml
# Edit hostname, password, certificates
sudo ./install.sh --with-trivy
```

**Key:** Harbor is the production answer when Docker Hub is not enough. Self-hosted, full RBAC, integrated CVE scanning, replication to other registries, Helm chart support. An alternative to ECR/GCR/GHCR for on-prem or air-gapped environments.

## Answer

**Harbor** is the dominant open-source private container registry. It wraps the basic OCI Distribution spec (the same protocol Docker Hub uses) with the production features that on-prem and enterprise teams actually need.

## Theory

### TL;DR

- **Open-source registry** built on top of CNCF Distribution; CNCF-graduated since 2020.
- **Beyond a plain registry:** RBAC, multi-tenant projects, vulnerability scanning (Trivy/Clair), image signing (Notary, Cosign), replication to other registries, retention policies, garbage collection, OIDC/LDAP auth.
- **Deploy via Compose** for single-node or the **Helm chart** for HA on Kubernetes.
- Used as an on-prem alternative to AWS ECR / Google GCR / Docker Hub when air-gapped, multi-region, or compliance-driven.
- Speaks plain `docker pull`/`docker push` — your existing CI/CD does not change.
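The quick setup above assumes you already have a TLS certificate on hand. For a throwaway lab instance, a self-signed cert is enough to bring Harbor up; the hostname below is a placeholder, and production should use Let's Encrypt or an internal CA instead:

```shell
# Self-signed cert for a test Harbor host (placeholder hostname).
# Do NOT use self-signed certs in production.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout privkey.pem -out fullchain.pem \
  -subj "/CN=harbor.example.com" \
  -addext "subjectAltName=DNS:harbor.example.com"

# Sanity check: the cert carries the expected subject
openssl x509 -in fullchain.pem -noout -subject
```

Point `https.certificate` and `https.private_key` in `harbor.yml` at these files. Client machines also need to trust the cert, or the Docker daemon will refuse the registry; copying it to `/etc/docker/certs.d/harbor.example.com/ca.crt` on each client is the standard way.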
### Architecture

```
+----------+    +----------+    +-----------+
| nginx /  |    | core     |    | registry  |
| portal   | -> | (API)    | -> | (CNCF     |
|          |    | + jobsvc |    | distrib.) |
+----------+    +----------+    +-----------+
     |               |               |
     v               v               v
  +-----+       +--------+       +-------+
  | UI  |       | trivy  |       | redis |
  +-----+       | scanner|       +-------+
                +--------+
                     |
                     v
               +-----------+
               | postgres  |
               +-----------+
```

A dozen containers running together (web UI, API, DB, Redis, registry, scanner, etc.). The Compose installer brings it all up.

### Single-node install (Compose)

```bash
# Download
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-online-installer-v2.11.0.tgz
tar xzf harbor-online-installer-v2.11.0.tgz
cd harbor

# Configure
cp harbor.yml.tmpl harbor.yml
vi harbor.yml
# hostname: harbor.example.com
# https.port: 443
# https.certificate: /etc/cert/fullchain.pem
# https.private_key: /etc/cert/privkey.pem
# harbor_admin_password: <strong-pw>
# database.password: <db-pw>

# Install with the Trivy scanner enabled
sudo ./install.sh --with-trivy
# Brings up: ~10 containers via docker compose
```

Open `https://harbor.example.com` → log in as `admin` with the password from `harbor.yml`.

### Pushing/pulling

```bash
# Log in
docker login harbor.example.com -u admin -p <password>

# Tag and push
docker tag myapp:1.0 harbor.example.com/myproject/myapp:1.0
docker push harbor.example.com/myproject/myapp:1.0

# Pull from another host
docker pull harbor.example.com/myproject/myapp:1.0
```

Harbor speaks the same protocol as Docker Hub. CI/CD scripts only change the registry hostname.

### Projects and RBAC

Harbor groups images into **projects**. Each project is a namespace with its own:

- Visibility: private (auth required) or public (no auth for pulls).
- Member roles: Project Admin, Maintainer, Developer, Guest.
- Vulnerability scan policies.
- Retention policies.
- Replication rules.

Typical multi-tenant setup: one Harbor instance, one project per team.
Project Admins manage their team's images; cross-project access is controlled.

### Vulnerability scanning

With `--with-trivy`, Harbor includes a built-in scanner:

- Scan on push (configurable per project).
- Scheduled re-scans of all images.
- CVE results visible in the UI; gate pulls by severity ("prevent vulnerable images").
- Tags marked unsafe can block deploys via admission policy.

```yaml
# In project settings
Vulnerability Scanning: enabled
Prevent vulnerable images: HIGH and above
Auto-scan on push: true
```

### Replication

Harbor can mirror images **to and from** other registries:

- Pull-through cache: configure Docker Hub as a remote; Harbor caches pulls. Speeds up downloads, survives Docker Hub rate limits.
- Push to another Harbor: multi-region setups replicate prod images to each region's Harbor.
- Push to Docker Hub / ECR / GCR: distribute to multiple registries from one source.

```yaml
# Replication rule
Name: replicate-to-eu
Mode: pull / push / event-based
Filter: project=prod, tag=v*
Destination: harbor-eu.example.com
Trigger: scheduled / manual / on-push
```

### Image signing

Two options:

- **Notary v1** (DCT-compatible, formerly built-in; removed from Harbor in v2.6) — signed image manifests.
- **Cosign / Sigstore** (modern) — signs OCI artifacts. Now the default approach.

Enforce "only signed images deploy" via admission control (Kyverno, sigstore-policy-controller).

### Retention policies

Without retention, every PR build accumulates forever. Harbor lets you set rules:

```
Retain: most recently pushed 10 tags
Exclude: tags matching v* (keep all release tags)
Apply to: project=staging
```

Garbage collection runs as a scheduled job; freed space is reclaimable.
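To make the retention semantics concrete, here is a pure-shell sketch of "keep all `v*` release tags plus the N most recent others" over a hypothetical tag list. This is an illustration only; Harbor evaluates retention rules server-side, ordered by push time.

```shell
# Hypothetical tags, newest first (as Harbor orders them by push time)
tags='v1.2.0
pr-341
pr-340
pr-339
pr-338
v1.1.0'

# Rule: always keep v* tags; of the rest, keep only the 3 most recent
keep_releases=$(printf '%s\n' "$tags" | grep '^v')
keep_recent=$(printf '%s\n' "$tags" | grep -v '^v' | head -n 3)
retained=$(printf '%s\n%s\n' "$keep_releases" "$keep_recent" | sort)

printf '%s\n' "$retained"
# pr-338 is the only tag this rule would delete
```

The same shape applies at scale: release tags are pinned by the exclude pattern, while CI/PR tags age out once newer ones arrive.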
### HA install (Helm)

For production, run Harbor on Kubernetes:

```bash
helm repo add harbor https://helm.goharbor.io
helm upgrade --install harbor harbor/harbor \
  --set expose.type=ingress \
  --set externalURL=https://harbor.example.com \
  --set persistence.enabled=true \
  --set persistence.persistentVolumeClaim.registry.size=500Gi
```

Replicates services across nodes; persistent volumes back the registry storage and Postgres. Survives node failure if backed by network storage.

### Common mistakes

**Forgetting to set up TLS**

Harbor's `harbor.yml` defaults assume HTTP for testing. Production requires HTTPS — the Docker daemon refuses to push to untrusted HTTP registries by default. Generate certs (Let's Encrypt, internal CA) and configure the `https:` block in `harbor.yml`.

**Running on the same host as critical workloads**

A registry that hosts your production images is itself critical infrastructure. If it crashes, you cannot deploy. Run it on dedicated hardware or its own K8s cluster.

**No retention policy → disk fills up**

```
2 TB of CI builds accumulated.
Backup window: 12 hours.
Recovery window: nervous.
```

Set retention policies from day one. Schedule garbage collection weekly at minimum.

**Missing storage backups**

Harbor's Postgres holds metadata; the registry's filesystem holds the actual blobs. Both need backup. Losing metadata = images orphaned. Losing blobs = metadata pointing at nothing.

**Deploying without scanning**

If you have Harbor with Trivy and do not enable scanning, you are paying for the feature and not using it. At minimum: scan-on-push for all projects.

### Real-world usage

- **On-prem enterprise:** finance, healthcare, telecom — Harbor is the default for self-hosted regulated environments.
- **Multi-cloud:** Harbor as the central registry, with replication to AWS ECR / GCR for region-local pulls.
- **Air-gapped:** classified networks where the public internet is forbidden; Harbor + Trivy DB updates via offline sync.
- **Open-source projects:** some maintain their own Harbor for community-built images.
- **Pull-through cache:** Harbor caches Docker Hub for thousands of CI runs without hitting rate limits.

### Follow-up questions

**Q:** What is the difference between Harbor and the bare `registry:2` image?
**A:** `registry:2` is the basic CNCF Distribution server — pull/push, that is it. Harbor wraps it with auth, RBAC, a UI, scanning, signing, replication. For toy use or quick local testing, `registry:2`. For real production, Harbor.

**Q:** Can Harbor scan images from external registries?
**A:** Only images stored in Harbor. To scan external images, set up replication to pull them into Harbor first.

**Q:** How does Harbor handle storage?
**A:** Filesystem (default), S3-compatible object storage (AWS S3, MinIO), GCS, Azure Blob, or Swift. Configure via the `storage_service` section of `harbor.yml`.

**Q:** Can I run Harbor in HA?
**A:** Yes — Helm chart with multiple replicas, external Postgres (HA), external Redis (HA), shared storage backend (S3 or NFS). The single-node Compose install is for small/staging; production = HA.

**Q:** (Senior) How would you architect Harbor for a global multi-region deploy?
**A:** A central "hub" Harbor where CI pushes; "spoke" Harbors in each region for low-latency pulls. Hub-to-spoke replication on event (a push triggers a replica push within minutes). Spoke storage uses the cloud region's local object store (S3, GCS). The hub is backed up offsite. Production K8s in each region pulls from its spoke; if the spoke is down, fall back to the hub. Audit logs are centralized via syslog. The design: the image lives authoritatively at the hub, distributed for performance, survivable for resilience.
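In the same rule notation as the replication section above, the hub-and-spoke design might reduce to one event-based push rule per region; all names and hostnames below are placeholders:

```yaml
# On the hub: one push rule per regional spoke
Name: hub-to-eu
Mode: push
Filter: project=prod, tag=v*
Destination: harbor-eu.example.com   # spoke registered as an endpoint on the hub
Trigger: event-based                 # a prod push replicates within minutes

Name: hub-to-ap
Mode: push
Filter: project=prod, tag=v*
Destination: harbor-ap.example.com
Trigger: event-based
```

Event-based triggers keep spokes fresh without polling; a scheduled trigger is the fallback when the regions tolerate stale replicas.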
## Examples

### Compose-based small install

```bash
# After install.sh completes, you have:
$ docker compose ps
NAME                   IMAGE                        STATUS
harbor-core            goharbor/harbor-core         running
harbor-db              goharbor/harbor-db           running
harbor-jobservice      goharbor/harbor-jobservice   running
harbor-portal          goharbor/harbor-portal       running
harbor-registry        goharbor/registry-photon     running
harbor-trivy-adapter   goharbor/trivy-adapter       running
nginx                  goharbor/nginx-photon        running
redis                  goharbor/redis-photon        running
# ... ~10 services

$ curl -k https://harbor.example.com/api/v2.0/health
{"status":"healthy","components":[...]}
```

### CI integration

```yaml
# GitHub Actions
- uses: docker/login-action@v3
  with:
    registry: harbor.example.com
    username: ci-robot
    password: ${{ secrets.HARBOR_TOKEN }}

- uses: docker/build-push-action@v5
  with:
    push: true
    tags: harbor.example.com/myproject/api:${{ github.sha }}
```

Harbor's robot accounts (long-lived API tokens) are the standard CI auth pattern.

### Docker Hub pull-through cache (proxy cache project)

```
Harbor settings → Registries
  Add registry: type=docker-hub, url=https://hub.docker.com
Projects → New Project "dockerhub-proxy"
  Proxy Cache: enabled, endpoint=docker-hub
```

Now `docker pull harbor.example.com/dockerhub-proxy/library/nginx:1.27` actually pulls from Docker Hub (if not cached), then serves from Harbor on subsequent pulls. Docker Hub's anonymous rate limits are per-IP — your Harbor is one IP, and cached pulls never leave it, so the limits are much harder to hit.
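One piece the CI example above leaves out is the cluster side: for Kubernetes to pull from a private Harbor project, the usual pattern is an `imagePullSecret` backed by a robot account. All names below are placeholders, and the base64 payload is elided:

```yaml
# Sketch: pull secret for a Harbor robot account, referenced by the workload
apiVersion: v1
kind: Secret
metadata:
  name: harbor-pull
  namespace: myapp
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64 of a docker config holding the robot credentials>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: myapp
spec:
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      imagePullSecrets:
        - name: harbor-pull        # without this, pulls from a private project fail
      containers:
        - name: api
          image: harbor.example.com/myproject/api:1.0
```

In practice `kubectl create secret docker-registry harbor-pull --docker-server=harbor.example.com --docker-username='robot$myproject+ci' --docker-password=<token>` generates the secret without hand-writing the base64.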