Architecture

This page explains how the operator works internally so you can reason about its behavior when things go wrong and decide which CRD fits which job.

If you only want to use the operator, start with the Quick Start. Come back here when you need to understand why it behaves the way it does.

High-level picture

flowchart LR
    You[You apply a<br/>NextcloudInstance] --> Op[Nextcloud Operator<br/>builds Helm values,<br/>secrets, DB & bucket]
    Op -->|HelmRelease| Flux[Flux CD<br/>installs the chart]
    Flux --> NC[Nextcloud<br/>pods · service ·<br/>ingress · storage]
    style Op fill:#6366f1,stroke:#4f46e5,color:#fff
    style Flux fill:#16a34a,stroke:#15803d,color:#fff

The operator does not install Nextcloud itself. It generates Helm values and hands them to Flux, which installs the upstream Nextcloud Helm chart. Because of this separation, spec.helm.values works as an escape hatch: any upstream chart feature can be set there, even if the operator has no dedicated field for it.

The handler model

The operator is built on kopf and uses three categories of handlers:

| Handler type | Purpose | Example |
| --- | --- | --- |
| @kopf.on.create / @kopf.on.update / @kopf.on.delete | CRUD lifecycle | Create secrets and HelmRelease when NextcloudInstance is created |
| @kopf.on.field | React to specific spec changes or annotations | k8s.bnerd.com/reconcile annotation triggers a full reconcile |
| @kopf.on.timer | Periodic reconciliation | Every 30s: check HelmRelease status and sync it to NextcloudInstance.status |

All handlers live in operator/handlers/ and delegate actual work to utility modules in operator/utils/. This keeps handlers thin and makes the business logic testable.
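As a rough illustration of that split, the sketch below shows the kind of pure function a timer handler could delegate to. It reads the Ready condition that Flux writes to a HelmRelease's status.conditions and turns it into a status patch. The function names and the status keys (helmReleaseReady, message) are hypothetical and are not taken from the operator's code.

```python
from typing import Any, Dict, Mapping, Tuple


def helmrelease_ready(helmrelease: Mapping[str, Any]) -> Tuple[bool, str]:
    """Return (ready, message) based on the HelmRelease's Ready condition."""
    conditions = helmrelease.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == "Ready":
            return condition.get("status") == "True", condition.get("message", "")
    # No Ready condition yet: Flux has not reported on this release.
    return False, "HelmRelease has not reported a Ready condition yet"


def build_status_patch(helmrelease: Mapping[str, Any]) -> Dict[str, Any]:
    """Build the status fields a timer handler would write (hypothetical keys)."""
    ready, message = helmrelease_ready(helmrelease)
    return {"helmReleaseReady": ready, "message": message}
```

Because neither function touches the Kubernetes API, both can be unit-tested with plain dictionaries, which is the benefit of keeping the handler itself thin.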

NextcloudInstance create flow

When you kubectl apply a NextcloudInstance, the operator runs roughly this sequence:

sequenceDiagram
    participant U as User
    participant K as K8s API
    participant O as Operator
    participant P as Percona PG<br/>Operator
    participant F as Flux<br/>HelmController

    U->>K: kubectl apply NextcloudInstance
    K->>O: @kopf.on.create
    O->>O: Validate spec
    O->>K: status.phase = Pending
    O->>K: Create secrets (db, admin, redis, s3, mail)
    alt database.managed = true
        O->>P: Create PerconaPGCluster
        P-->>O: (wait, up to 20min)
        O->>K: Read PG secret, create NC db secret
    end
    alt s3 auto-create needed
        O->>O: boto3 create_bucket()
    end
    O->>O: Resolve spec.version → chart version
    O->>O: Build Helm values (profile → spec → overrides)
    O->>K: Create HelmRepository + HelmRelease
    O->>K: status.phase = Creating (waiting for HR)
    F->>K: Installs Nextcloud chart → Pods, Svc, Ingress
    Note over O: Timer (every 30s) syncs<br/>HelmRelease readiness → NCI status

Key details:

  • Secrets are created first — the HelmRelease references them by name, so they must exist before the chart install.
  • Managed databases use a nested controller — the operator waits for PerconaPGCluster to go ready (max 20 min), then extracts the connection info into a Nextcloud-format database secret. If the PG operator isn't installed, the handler raises a PermanentError with a clear message.
  • Version resolution is explicit: status.versionResolution records which chart version was chosen and why. See Version Management.
  • 4-layer value cascade: final values = built-in profile → custom NextcloudProfile CRD → instance spec → spec.helm.values. Last wins. See Configuration Profiles.
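The "last wins" cascade can be pictured as a left-to-right deep merge. The following is a minimal sketch under two assumptions that the text does not confirm: nested dictionaries are merged key by key, and lists or scalars in a later layer replace the earlier value outright. The helper names and the sample keys are illustrative only.

```python
from copy import deepcopy
from functools import reduce
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale (assumption).
            merged[key] = deepcopy(value)
    return merged


def build_values(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers in order; later layers take precedence."""
    return reduce(deep_merge, layers, {})


# Layer order mirrors the cascade: built-in profile, custom profile,
# instance spec, then spec.helm.values.
builtin_profile = {"replicaCount": 1, "resources": {"requests": {"cpu": "100m", "memory": "256Mi"}}}
custom_profile = {"resources": {"requests": {"memory": "512Mi"}}}
instance_spec = {"replicaCount": 2}
helm_overrides = {"resources": {"requests": {"cpu": "250m"}}}

final_values = build_values(builtin_profile, custom_profile, instance_spec, helm_overrides)
```

In this example the memory request comes from the custom profile, the replica count from the instance spec, and the CPU request from spec.helm.values.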

Update flow

Updates are diff-based:

  1. Compare old.spec vs. new.spec for database, redis, s3, mail, admin.
  2. Only re-create the secrets whose source fields actually changed.
  3. Rebuild Helm values and patch the HelmRelease.
  4. Flux notices the HelmRelease changed and reconciles the chart.

This avoids unnecessary pod restarts when unrelated fields change.
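Step 1 of the update flow amounts to comparing a fixed set of spec sections. A minimal sketch, assuming each section is a top-level key of the spec and that a plain equality check is enough to detect a change (the function name is hypothetical):

```python
from typing import Any, List, Mapping

# Spec sections whose changes require a secret to be re-created.
SECRET_SECTIONS = ("database", "redis", "s3", "mail", "admin")


def changed_secret_sections(old_spec: Mapping[str, Any], new_spec: Mapping[str, Any]) -> List[str]:
    """Return the secret-backed sections that differ between two specs."""
    return [
        section
        for section in SECRET_SECTIONS
        if old_spec.get(section) != new_spec.get(section)
    ]
```

A change to a field outside these sections yields an empty list, so no secret is touched and only the HelmRelease is patched.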

Delete flow

  1. @kopf.on.delete fires.
  2. Delete HelmRelease and HelmRepository. Flux uninstalls the chart (pods, services, ingress).
  3. Delete operator-managed secrets.
  4. PVCs are retained by default — deliberate, so that accidental deletion doesn't lose data. Delete the namespace to remove everything.
  5. Finalizer completes; the NextcloudInstance resource is removed.

Deletion errors are logged as warnings, not failures — the handler does not block deletion on cleanup issues. Use the k8s.bnerd.com/force-delete annotation to bypass remaining cleanup if needed.
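The "warn, don't block" behaviour can be sketched as a loop that runs every cleanup step and records failures instead of raising. This is an illustration only; the logger name, the function name, and the step labels are assumptions.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("nextcloud-operator.delete")


def best_effort_cleanup(steps: Iterable[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run every cleanup step; log failures as warnings and keep going.

    Returns the names of the steps that failed, so the caller can report
    them without blocking deletion.
    """
    failed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # deliberately broad: cleanup must not block
            logger.warning("cleanup step %r failed: %s", name, exc)
            failed.append(name)
    return failed
```

Because the function never re-raises, a delete handler built this way always returns, which lets the finalizer be released even when some resources could not be removed.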

Status state machine

stateDiagram-v2
    [*] --> Pending: created
    Pending --> Creating: validation passed
    Creating --> Ready: HelmRelease ready
    Creating --> Failed: PermanentError
    Ready --> Updating: spec changed
    Updating --> Ready: update complete
    Updating --> Failed: update failed
    Failed --> Creating: retry (TemporaryError) or fix
    Ready --> [*]: deleted

Status transitions are driven by:

  • Handler return values on create/update/delete
  • The 30-second timer that syncs downstream HelmRelease readiness into NextcloudInstance.status
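The diagram can also be read as a transition table. The sketch below encodes exactly the edges shown in the state diagram (deletion from Ready is left out because it ends the lifecycle); the names ALLOWED_TRANSITIONS and can_transition are hypothetical.

```python
from typing import Dict, Set

# Phase transitions as drawn in the state diagram.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "Pending": {"Creating"},
    "Creating": {"Ready", "Failed"},
    "Ready": {"Updating"},
    "Updating": {"Ready", "Failed"},
    "Failed": {"Creating"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram allows moving from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A check like this is useful when debugging: a status history that jumps from Pending straight to Ready, for example, would not match the documented state machine.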

Error model

Every handler uses one of two kopf exceptions:

| Error type | Meaning | Retries? |
| --- | --- | --- |
| kopf.TemporaryError(msg, delay=30) | Transient: API down, managed DB not ready yet, network blip | Yes, after delay |
| kopf.PermanentError(msg) | Structural: invalid spec, missing required dep, unknown version | No, status goes Failed |

When you see a log line with TemporaryError, the operator will retry automatically. When you see PermanentError, it will not self-heal — you must fix the underlying cause and either update the spec or annotate with k8s.bnerd.com/reconcile to force a fresh attempt.
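To make the distinction concrete, here is a small sketch of a managed-database check that raises one error or the other, plus a helper that reports what the framework would do next. The two exception classes are local stand-ins that mimic kopf.TemporaryError and kopf.PermanentError so the example runs without kopf installed; check_managed_database and outcome are hypothetical names.

```python
from typing import Any, Callable, Dict


class TemporaryError(Exception):
    """Stand-in for kopf.TemporaryError: retried after `delay` seconds."""

    def __init__(self, msg: str, delay: int = 30) -> None:
        super().__init__(msg)
        self.delay = delay


class PermanentError(Exception):
    """Stand-in for kopf.PermanentError: never retried."""


def check_managed_database(pg_operator_installed: bool, cluster_ready: bool) -> None:
    """Raise the matching error for the managed-database step."""
    if not pg_operator_installed:
        # Structural problem: retrying cannot fix a missing dependency.
        raise PermanentError("Percona PG operator is not installed")
    if not cluster_ready:
        # Transient problem: the cluster may become ready later.
        raise TemporaryError("PerconaPGCluster is not ready yet", delay=30)


def outcome(handler: Callable[[], None]) -> Dict[str, Any]:
    """Describe what happens after one handler attempt."""
    try:
        handler()
    except TemporaryError as exc:
        return {"retry": True, "delay": exc.delay, "phase": None}
    except PermanentError:
        return {"retry": False, "delay": None, "phase": "Failed"}
    return {"retry": False, "delay": None, "phase": None}
```

For example, outcome(lambda: check_managed_database(True, False)) reports a retry after 30 seconds, while a missing PG operator goes straight to Failed with no retry.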

The 4-CRD cascade

flowchart LR
    Profile[NextcloudProfile<br/>cluster-scoped<br/>defaults]
    Pool[NextcloudPool<br/>cluster-scoped<br/>template.spec]
    NC[Nextcloud<br/>namespaced<br/>tenant spec]
    NCI[NextcloudInstance<br/>namespaced<br/>physical runtime]

    Profile -->|provides defaults| Pool
    Profile -->|provides defaults| NCI
    Pool -->|template for| NCI
    NC -->|assigned from pool<br/>or creates directly| NCI

    style NCI fill:#6366f1,stroke:#4f46e5,color:#fff

  • NextcloudInstance is always what actually runs. Every other CRD exists to produce or configure one.
  • NextcloudProfile provides reusable defaults (production, testing, development, or your own).
  • NextcloudPool pre-warms a set of unassigned NextcloudInstance resources for fast tenant onboarding.
  • Nextcloud is the tenant-facing façade. It either assigns an existing pool instance or creates a fresh one.

For the decision tree of which CRD to use when, see CRD Mental Model.

Secret naming convention

Given a NextcloudInstance named my-nextcloud, the operator creates:

| Secret | Condition |
| --- | --- |
| my-nextcloud-nextcloud-db | Always |
| my-nextcloud-nextcloud-admin | Always |
| my-nextcloud-nextcloud-redis | If spec.redis.enabled |
| my-nextcloud-nextcloud-s3 | If spec.s3.enabled |
| my-nextcloud-nextcloud-mail | If spec.mail.enabled |
| my-nextcloud-nextcloud-s3backup | If spec.backups.data.enabled |
| my-nextcloud-nextcloud-recording | If spec.spreed.recording.enabled |

Plus:

  • HelmRelease: my-nextcloud-nextcloud
  • HelmRepository: my-nextcloud-nextcloud-repo

When you reference external secrets via credentialsSecret, the operator reads them, copies the credentials into its own secret (with the naming above), and the HelmRelease references that. See Secret Management.
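When debugging, it can help to compute which secrets should exist for a given instance. The sketch below derives the names from the naming table in this section; the function names are hypothetical, and the code assumes each condition is a boolean at the listed spec path.

```python
from typing import Any, List, Mapping, Sequence

# (secret suffix, spec path that must be truthy) for the conditional secrets.
CONDITIONAL_SECRETS = (
    ("redis", ("redis", "enabled")),
    ("s3", ("s3", "enabled")),
    ("mail", ("mail", "enabled")),
    ("s3backup", ("backups", "data", "enabled")),
    ("recording", ("spreed", "recording", "enabled")),
)


def _lookup(spec: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Follow `path` through nested mappings; return None if any key is missing."""
    node: Any = spec
    for key in path:
        if not isinstance(node, Mapping):
            return None
        node = node.get(key)
    return node


def expected_secret_names(instance_name: str, spec: Mapping[str, Any]) -> List[str]:
    """List the operator-managed secret names expected for an instance."""
    prefix = f"{instance_name}-nextcloud"
    names = [f"{prefix}-db", f"{prefix}-admin"]  # always created
    for suffix, path in CONDITIONAL_SECRETS:
        if _lookup(spec, path):
            names.append(f"{prefix}-{suffix}")
    return names
```

Comparing this list with the secrets actually present in the namespace is a quick way to spot a secret that was never created or was left behind.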

Integration points

The operator talks to several external systems. Each is optional except where noted:

| System | Use | Required? |
| --- | --- | --- |
| Kubernetes API | CRDs, Secrets, Namespaces | Yes |
| Flux CD v2 | HelmRelease, HelmRepository | Yes |
| Percona PG Operator | Managed PostgreSQL via PerconaPGCluster | Only for database.managed: true |
| S3 API (boto3) | Auto-create primary-storage bucket | Only for s3.enabled with auto-create |
| bnerd backup operator | Data backup via S3Backup CRD | Only for backups.data.enabled |
| HPB Signaling API | Backend registration for Talk HPB | Only when SignalingServer CRD is used |
| Recording Backend API | Backend registration for Talk recording | Only when RecordingServer CRD is used |
| Upstream Helm repo | Chart download (via Flux) | Yes |

Further reading