Architecture¶
This page explains how the operator works internally so you can reason about its behavior when things go wrong and decide which CRD fits which job.
If you want to use the operator, start with the Quick Start. Come back here when you need to understand why.
High-level picture¶
flowchart LR
You[You apply a<br/>NextcloudInstance] --> Op[Nextcloud Operator<br/>builds Helm values,<br/>secrets, DB & bucket]
Op -->|HelmRelease| Flux[Flux CD<br/>installs the chart]
Flux --> NC[Nextcloud<br/>pods · service ·<br/>ingress · storage]
style Op fill:#6366f1,stroke:#4f46e5,color:#fff
style Flux fill:#16a34a,stroke:#15803d,color:#fff
The operator does not install Nextcloud itself: it generates Helm values and hands them to Flux, which then installs the upstream Nextcloud Helm chart. Because of this separation, any upstream chart feature stays available through `spec.helm.values` as an escape hatch.
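To make the hand-off concrete, here is a minimal sketch of the kind of HelmRelease manifest the operator could pass to Flux. The helper name `build_helm_release`, the chart name `nextcloud`, the `interval`, the example chart version, and the `apiVersion` are assumptions for illustration (the API version in particular depends on the installed Flux release). The resource names follow the secret naming convention described further down this page.

```python
from typing import Any


def build_helm_release(
    instance_name: str,
    namespace: str,
    chart_version: str,
    values: dict[str, Any],
) -> dict[str, Any]:
    """Build a Flux HelmRelease manifest as a plain dict (hypothetical helper)."""
    release_name = f"{instance_name}-nextcloud"
    return {
        # Assumption: use whichever HelmRelease API version your Flux serves.
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": release_name, "namespace": namespace},
        "spec": {
            "interval": "5m",  # assumed reconcile interval
            "chart": {
                "spec": {
                    "chart": "nextcloud",  # assumed upstream chart name
                    "version": chart_version,
                    "sourceRef": {
                        "kind": "HelmRepository",
                        "name": f"{release_name}-repo",
                    },
                }
            },
            # Generated values; spec.helm.values would already be merged in.
            "values": values,
        },
    }


# Example chart version and values are placeholders, not real defaults.
manifest = build_helm_release("my-nextcloud", "tenant-a", "6.6.4", {"replicaCount": 2})
print(manifest["metadata"]["name"])
```

Keeping the manifest a plain dict means it can be inspected or unit-tested before anything is sent to the Kubernetes API.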
The handler model¶
The operator is built on kopf and uses three categories of handlers:
| Handler type | Purpose | Example |
|---|---|---|
| `@kopf.on.create` / `@kopf.on.update` / `@kopf.on.delete` | CRUD lifecycle | Create secrets and HelmRelease when NextcloudInstance is created |
| `@kopf.on.field` | React to specific spec changes or annotations | `k8s.bnerd.com/reconcile` annotation triggers a full reconcile |
| `@kopf.on.timer` | Periodic reconciliation | Every 30s: check HelmRelease status and sync it to `NextcloudInstance.status` |
All handlers live in operator/handlers/ and delegate actual work to utility modules in operator/utils/. This keeps handlers thin and makes the business logic testable.
NextcloudInstance create flow¶
When you kubectl apply a NextcloudInstance, the operator runs roughly this sequence:
sequenceDiagram
participant U as User
participant K as K8s API
participant O as Operator
participant P as Percona PG<br/>Operator
participant F as Flux<br/>HelmController
U->>K: kubectl apply NextcloudInstance
K->>O: @kopf.on.create
O->>O: Validate spec
O->>K: status.phase = Pending
O->>K: Create secrets (db, admin, redis, s3, mail)
alt database.managed = true
O->>P: Create PerconaPGCluster
P-->>O: (wait, up to 20min)
O->>K: Read PG secret, create NC db secret
end
alt s3 auto-create needed
O->>O: boto3 create_bucket()
end
O->>O: Resolve spec.version → chart version
O->>O: Build Helm values (profile → spec → overrides)
O->>K: Create HelmRepository + HelmRelease
O->>K: status.phase = Creating (waiting for HR)
F->>K: Installs Nextcloud chart → Pods, Svc, Ingress
Note over O: Timer (every 30s) syncs<br/>HelmRelease readiness → NCI status
Key details:
- Secrets are created first: the HelmRelease references them by name, so they must exist before the chart install.
- Managed databases use a nested controller: the operator waits for `PerconaPGCluster` to become ready (max 20 min), then extracts the connection info into a Nextcloud-format database secret. If the PG operator isn't installed, the handler raises a `PermanentError` with a clear message.
- Version resolution is explicit: `status.versionResolution` records which chart version was chosen and why. See Version Management.
- 4-layer value cascade: `Final values = built-in profile → custom NextcloudProfile CRD → instance spec → spec.helm.values`. Last wins. See Configuration Profiles.
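The "last wins" cascade can be illustrated with a recursive dictionary merge. This is a minimal sketch, not the operator's actual code: the function names `deep_merge` and `build_values` and all sample values are hypothetical, and it assumes nested mappings are merged key by key while any other value (including lists) is replaced wholesale.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced, not combined (assumption).
            merged[key] = deepcopy(value)
    return merged


def build_values(*layers: dict[str, Any]) -> dict[str, Any]:
    """Apply layers left to right, so the last layer has the final say."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical sample layers, ordered as in the cascade above.
builtin_profile = {
    "replicaCount": 1,
    "resources": {"requests": {"cpu": "250m", "memory": "512Mi"}},
}
custom_profile = {"resources": {"requests": {"memory": "1Gi"}}}
instance_spec = {"replicaCount": 3}
helm_overrides = {"resources": {"requests": {"cpu": "500m"}}}

final_values = build_values(
    builtin_profile, custom_profile, instance_spec, helm_overrides
)
print(final_values)
```

In this example the memory request comes from the custom profile, the replica count from the instance spec, and the CPU request from `spec.helm.values`, while the input layers themselves stay unmodified.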
Update flow¶
Updates are diff-based:
- Compare `old.spec` vs. `new.spec` for `database`, `redis`, `s3`, `mail`, `admin`.
- Only re-create the secrets whose source fields actually changed.
- Rebuild Helm values and patch the `HelmRelease`.
- Flux notices the HelmRelease changed and reconciles the chart.
This avoids unnecessary pod restarts when unrelated fields change.
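A minimal sketch of the diff step, assuming the old and new specs are available as plain dictionaries. The names `SECRET_SECTIONS` and `changed_sections` and the sample field values are hypothetical; only the five section names come from the list above.

```python
from typing import Any

# Spec sections whose contents feed an operator-managed secret.
SECRET_SECTIONS = ("database", "redis", "s3", "mail", "admin")


def changed_sections(old_spec: dict[str, Any], new_spec: dict[str, Any]) -> list[str]:
    """Return the secret-backed sections that differ between two specs."""
    return [
        section
        for section in SECRET_SECTIONS
        if old_spec.get(section) != new_spec.get(section)
    ]


old_spec = {
    "database": {"managed": True},
    "mail": {"enabled": True, "host": "smtp.old.example"},
    "replicas": 1,
}
new_spec = {
    "database": {"managed": True},
    "mail": {"enabled": True, "host": "smtp.new.example"},
    "replicas": 2,
}

# Only "mail" is reported; "replicas" changed too but backs no secret.
print(changed_sections(old_spec, new_spec))
```

Only the secrets for the returned sections would need to be re-created; everything else is left alone.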
Delete flow¶
- `@kopf.on.delete` fires.
- Delete `HelmRelease` and `HelmRepository`. Flux uninstalls the chart (pods, services, ingress).
- Delete operator-managed secrets.
- PVCs are retained by default. This is deliberate, so that accidental deletion doesn't lose data. Delete the namespace to remove everything.
- Finalizer completes; the NextcloudInstance resource is removed.
Deletion errors are logged as warnings, not failures — the handler does not block deletion on cleanup issues. Use the k8s.bnerd.com/force-delete annotation to bypass remaining cleanup if needed.
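The "warn, don't block" behaviour can be sketched as a best-effort loop over cleanup steps. This is an illustration under assumptions: `best_effort_cleanup`, the logger name, and the three stand-in step functions are hypothetical, and the real handler talks to the Kubernetes API rather than appending to a list.

```python
import logging
from typing import Callable

logger = logging.getLogger("nextcloud-operator-sketch")


def best_effort_cleanup(steps: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run every cleanup step; log failures as warnings instead of raising."""
    failed: list[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # deliberately broad: deletion must not block
            logger.warning("cleanup step %r failed: %s", name, exc)
            failed.append(name)
    return failed


done: list[str] = []


def delete_helm_release() -> None:
    done.append("helmrelease")


def delete_helm_repository() -> None:
    raise RuntimeError("API timeout")  # simulated cleanup failure


def delete_secrets() -> None:
    done.append("secrets")


failed_steps = best_effort_cleanup(
    [
        ("helmrelease", delete_helm_release),
        ("helmrepository", delete_helm_repository),
        ("secrets", delete_secrets),
    ]
)
print(failed_steps, done)
```

A failure in one step is recorded and logged, and the remaining steps still run, so the finalizer can complete.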
Status state machine¶
stateDiagram-v2
[*] --> Pending: created
Pending --> Creating: validation passed
Creating --> Ready: HelmRelease ready
Creating --> Failed: PermanentError
Ready --> Updating: spec changed
Updating --> Ready: update complete
Updating --> Failed: update failed
Failed --> Creating: retry (TemporaryError) or fix
Ready --> [*]: deleted
Status transitions are driven by:
- Handler return values on create/update/delete
- The 30-second timer that syncs downstream `HelmRelease` readiness into `NextcloudInstance.status`
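The diagram above can be encoded as a small transition table, which is handy for checking whether an observed phase change is legal. This is a sketch only: `TRANSITIONS`, `can_transition`, and `advance` are hypothetical names, and the table mirrors the diagram rather than any confirmed operator code. Deletion (`Ready --> [*]`) is not modelled because it ends the resource's life.

```python
# Allowed phase changes, copied from the state diagram.
TRANSITIONS: dict[str, set[str]] = {
    "Pending": {"Creating"},
    "Creating": {"Ready", "Failed"},
    "Ready": {"Updating"},
    "Updating": {"Ready", "Failed"},
    "Failed": {"Creating"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram allows moving from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())


def advance(current: str, target: str) -> str:
    """Return the new phase, or raise if the move is not in the diagram."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


phase = "Pending"
for nxt in ("Creating", "Ready", "Updating", "Ready"):
    phase = advance(phase, nxt)
print(phase)
```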
Error model¶
Every handler uses one of two kopf exceptions:
| Error type | Meaning | Retries? |
|---|---|---|
| `kopf.TemporaryError(msg, delay=30)` | Transient: API down, managed DB not ready yet, network blip | Yes, after `delay` |
| `kopf.PermanentError(msg)` | Structural: invalid spec, missing required dep, unknown version | No. Status goes to `Failed` |
When you see a log line with TemporaryError, the operator will retry automatically. When you see PermanentError, it will not self-heal — you must fix the underlying cause and either update the spec or annotate with k8s.bnerd.com/reconcile to force a fresh attempt.
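The difference between the two error types can be simulated without a cluster. The sketch below uses local stand-in classes so it runs without kopf installed; they are not the real `kopf.TemporaryError` and `kopf.PermanentError`, and `run_handler`, its `max_attempts` cap, and the two sample handlers are hypothetical. It records the delays instead of sleeping.

```python
from typing import Callable


class TemporaryError(Exception):
    """Local stand-in for kopf.TemporaryError (not the real class)."""

    def __init__(self, msg: str, delay: float = 30) -> None:
        super().__init__(msg)
        self.delay = delay


class PermanentError(Exception):
    """Local stand-in for kopf.PermanentError (not the real class)."""


def run_handler(
    handler: Callable[[], None], max_attempts: int = 5
) -> tuple[str, list[float]]:
    """Simulate retries; return (outcome, delays that would have been waited)."""
    delays: list[float] = []
    for _ in range(max_attempts):
        try:
            handler()
            return "succeeded", delays
        except TemporaryError as exc:
            delays.append(exc.delay)  # a real framework would wait, then retry
        except PermanentError:
            return "failed", delays  # no retry: someone must fix the cause
    return "still-retrying", delays


attempts = {"count": 0}


def wait_for_database() -> None:
    """Fails transiently twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TemporaryError("managed DB not ready yet", delay=30)


def reject_unknown_version() -> None:
    """Fails structurally on every call."""
    raise PermanentError("unknown version")


print(run_handler(wait_for_database))
print(run_handler(reject_unknown_version))
```

The transient handler succeeds on its third attempt after two recorded delays, while the structural failure stops immediately with no retries.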
The 4-CRD cascade¶
flowchart LR
Profile[NextcloudProfile<br/>cluster-scoped<br/>defaults]
Pool[NextcloudPool<br/>cluster-scoped<br/>template.spec]
NC[Nextcloud<br/>namespaced<br/>tenant spec]
NCI[NextcloudInstance<br/>namespaced<br/>physical runtime]
Profile -->|provides defaults| Pool
Profile -->|provides defaults| NCI
Pool -->|template for| NCI
NC -->|assigned from pool<br/>or creates directly| NCI
style NCI fill:#6366f1,stroke:#4f46e5,color:#fff
- `NextcloudInstance` is always what actually runs. Every other CRD exists to produce or configure one.
- `NextcloudProfile` provides reusable defaults (`production`, `testing`, `development`, or your own).
- `NextcloudPool` pre-warms a set of unassigned `NextcloudInstance` resources for fast tenant onboarding.
- `Nextcloud` is the tenant-facing façade. It either assigns an existing pool instance or creates a fresh one.
For the decision tree of which CRD to use when, see CRD Mental Model.
Secret naming convention¶
Given a NextcloudInstance named my-nextcloud, the operator creates:
| Secret | Condition |
|---|---|
| `my-nextcloud-nextcloud-db` | Always |
| `my-nextcloud-nextcloud-admin` | Always |
| `my-nextcloud-nextcloud-redis` | If `spec.redis.enabled` |
| `my-nextcloud-nextcloud-s3` | If `spec.s3.enabled` |
| `my-nextcloud-nextcloud-mail` | If `spec.mail.enabled` |
| `my-nextcloud-nextcloud-s3backup` | If `spec.backups.data.enabled` |
| `my-nextcloud-nextcloud-recording` | If `spec.spreed.recording.enabled` |
Plus:
- HelmRelease: `my-nextcloud-nextcloud`
- HelmRepository: `my-nextcloud-nextcloud-repo`
When you reference external secrets via credentialsSecret, the operator reads them, copies the credentials into its own secret (with the naming above), and the HelmRelease references that. See Secret Management.
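The naming table can be expressed as a small function, which is useful when scripting checks against a namespace. This is a sketch assuming the spec is a plain dictionary; `secret_names`, `_enabled`, and `CONDITIONAL_SECRETS` are hypothetical names, while the suffixes and conditions are taken from the table above.

```python
from typing import Any

# (secret suffix, path to the spec section that carries the `enabled` flag)
CONDITIONAL_SECRETS: list[tuple[str, tuple[str, ...]]] = [
    ("redis", ("redis",)),
    ("s3", ("s3",)),
    ("mail", ("mail",)),
    ("s3backup", ("backups", "data")),
    ("recording", ("spreed", "recording")),
]


def _enabled(spec: dict[str, Any], path: tuple[str, ...]) -> bool:
    """Return True only if the section at `path` exists and has enabled: true."""
    node: Any = spec
    for key in path:
        if not isinstance(node, dict):
            return False
        node = node.get(key)
    return isinstance(node, dict) and node.get("enabled") is True


def secret_names(instance_name: str, spec: dict[str, Any]) -> list[str]:
    """List the secrets expected for one NextcloudInstance."""
    prefix = f"{instance_name}-nextcloud"
    names = [f"{prefix}-db", f"{prefix}-admin"]  # always created
    names += [
        f"{prefix}-{suffix}"
        for suffix, path in CONDITIONAL_SECRETS
        if _enabled(spec, path)
    ]
    return names


example_spec = {
    "redis": {"enabled": True},
    "mail": {"enabled": False},
    "backups": {"data": {"enabled": True}},
}
print(secret_names("my-nextcloud", example_spec))
```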
Integration points¶
The operator talks to several external systems. Each is optional except where noted:
| System | Use | Required? |
|---|---|---|
| Kubernetes API | CRDs, Secrets, Namespaces | Yes |
| Flux CD v2 | `HelmRelease`, `HelmRepository` | Yes |
| Percona PG Operator | Managed PostgreSQL via `PerconaPGCluster` | Only for `database.managed: true` |
| S3 API (boto3) | Auto-create primary-storage bucket | Only for `s3.enabled` with auto-create |
| bnerd backup operator | Data backup via `S3Backup` CRD | Only for `backups.data.enabled` |
| HPB Signaling API | Backend registration for Talk HPB | Only when SignalingServer CRD is used |
| Recording Backend API | Backend registration for Talk recording | Only when RecordingServer CRD is used |
| Upstream Helm repo | Chart download (via Flux) | Yes |
Further reading¶
- Pool Design — How pool assignment works end-to-end
- CRD Mental Model — Decision tree for which CRD to use
- Troubleshooting — What to do when things go wrong
- Managed PostgreSQL — Integration with Percona PG Operator
- Version Management — Version resolution details