Pool Design¶
This page explains why pools exist and how assignment works internally. For usage and YAML, see Pool Provisioning.
Why pools?¶
Creating a Nextcloud from scratch takes roughly 2 minutes: the managed PostgreSQL cluster alone needs ~90s to become ready, then the Helm release installs the chart, which pulls images and starts pods. For a single production deployment that's fine. For a SaaS product that onboards tenants on demand, it's painfully slow.
The answer is a pool: a set of pre-provisioned NextcloudInstance resources sitting unassigned, waiting for a tenant to claim one. Assignment is a label + spec update, so it completes in roughly 30 seconds instead of 2 minutes.
Component overview¶
flowchart TB
subgraph cluster[Cluster scope]
Pool1[NextcloudPool: production<br/>replicas: 5]
Pool2[NextcloudPool: development<br/>replicas: 2]
end
subgraph tenantNS[Tenant Namespace — tenant-acme]
NC1[Nextcloud: main-nc<br/>poolSelector: production]
NC2[Nextcloud: test-nc<br/>poolSelector: development]
end
subgraph instNS[Instance Namespace — nextcloud-instances-acme]
NCI1[NextcloudInstance<br/>nc-happy-sun-a1b2c3<br/>assigned: true → main-nc]
NCI2[NextcloudInstance<br/>nc-calm-moon-d4e5f6<br/>assigned: false<br/>pool: production]
NCI3[NextcloudInstance<br/>nc-brave-lake-g7h8i9<br/>assigned: false<br/>pool: production]
end
Pool1 -.maintains.-> NCI2
Pool1 -.maintains.-> NCI3
Pool1 -.replenishes.-> NCI1
NC1 -->|assigned from| NCI1
NC2 -.not yet assigned.-> NCI2
style NCI1 fill:#6366f1,stroke:#4f46e5,color:#fff
Three key roles:
- NextcloudPool (cluster-scoped): keeps a target number of unassigned NextcloudInstance resources ready.
- Nextcloud (namespaced, tenant-facing): "give me an instance"; holds the ingress, tenant-specific config, and a pointer to the assigned NextcloudInstance.
- NextcloudInstance (namespaced): the actual running Nextcloud. May be labeled assigned: "false" (pool member) or assigned: "true" with a tenant and nextcloud label.
Assignment flow¶
sequenceDiagram
participant U as Tenant
participant NC as Nextcloud (logical)
participant OP as Operator
participant NCI as NextcloudInstance
participant Pool as NextcloudPool
U->>NC: kubectl apply (with poolSelector)
NC->>OP: @kopf.on.create Nextcloud
OP->>OP: Search for NextcloudInstance<br/>with matching labels<br/>and assigned=false
OP->>NCI: Patch labels (assigned=true,<br/>tenant=acme, nextcloud=main-nc)
OP->>NCI: Patch spec (copy from Nextcloud.spec)
OP->>NCI: Set ownerReference → Nextcloud
OP->>NC: status.instanceRef = {namespace, name}
NCI->>NCI: Reconciles with new spec<br/>(updates HelmRelease)
Pool->>Pool: Timer detects deficit<br/>(unassigned=4, target=5)
Pool->>Pool: Creates replacement instance
Key properties:
- Cross-namespace reference: the Nextcloud lives in the tenant namespace; the assigned NextcloudInstance lives in an instance namespace (usually per-pool). The ownerReference crosses that boundary for cascading delete.
- Spec propagation: anything set on Nextcloud.spec (ingress host, admin credentials, custom apps) is copied to NextcloudInstance.spec. The instance's existing settings from the pool template are overwritten.
- Labels are the selector: spec.poolSelector.matchLabels on Nextcloud must match template.metadata.labels on NextcloudPool. If there is no match, the operator falls back to fresh creation (see below).
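The selection and patch steps above can be sketched in plain Python over resource dictionaries. This is a minimal illustration, not the operator's code: the function names and the `tenant` argument are hypothetical, and the sketch returns `None` when nothing matches, leaving the no-match handling to the caller.

```python
from __future__ import annotations

import copy
from typing import Optional


def matches(labels: dict, selector: dict) -> bool:
    """True when every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def pick_unassigned(instances: list[dict], match_labels: dict) -> Optional[dict]:
    """Return the first idle instance whose labels satisfy the selector."""
    for inst in instances:
        labels = inst["metadata"].get("labels", {})
        if labels.get("assigned") == "false" and matches(labels, match_labels):
            return inst
    return None


def build_assignment_patch(nextcloud: dict, tenant: str) -> dict:
    """Build the label + spec patch that claims an instance.

    `tenant` is passed in explicitly; how the operator derives it is
    not stated in this page, so that part is an assumption.
    """
    return {
        "metadata": {
            "labels": {
                "assigned": "true",
                "tenant": tenant,
                "nextcloud": nextcloud["metadata"]["name"],
            }
        },
        # The tenant spec replaces whatever the pool template had set.
        "spec": copy.deepcopy(nextcloud["spec"]),
    }
```

Keeping selection and patch construction as pure functions makes the label contract easy to unit-test without a cluster.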
Fresh creation flow (no pool)¶
If spec.poolSelector is empty or no matching instance exists:
- Operator generates a random name (nc-{adjective}-{noun}-{random}).
- Creates a new NextcloudInstance in the configured instance namespace.
- Copies the entire Nextcloud.spec to the new instance.
- Sets ownerReference to the Nextcloud.
- The new instance reconciles from scratch, which takes roughly 2 minutes on first run.
This is the same path as manually creating a NextcloudInstance, just initiated from a Nextcloud instead of direct kubectl apply.
Drift reconciliation¶
What if someone manually edits a NextcloudInstance that's assigned to a Nextcloud?
- The Nextcloud handler runs a timer every 30s.
- It compares NextcloudInstance.spec with Nextcloud.spec.
- If they diverge, it overwrites the instance spec with the tenant spec and logs a warning.
In other words: the Nextcloud resource is the source of truth for tenant configuration. Manual edits on assigned instances are not preserved.
Status flows in the opposite direction: phase, helmRelease, url, version, and conditions are copied from the instance to the Nextcloud, so tenants can read status from a single resource.
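The two directions of data flow can be sketched as a pair of pure functions. This is an illustrative sketch only: the function names are hypothetical, and it assumes drift is detected by plain equality of the two spec dictionaries.

```python
from __future__ import annotations

import copy

# Status fields mirrored from the instance to the Nextcloud (from the text above).
STATUS_FIELDS = ("phase", "helmRelease", "url", "version", "conditions")


def reconcile_drift(nextcloud_spec: dict, instance_spec: dict) -> tuple[dict, bool]:
    """Return (spec the instance should carry, whether drift was found)."""
    if instance_spec == nextcloud_spec:
        return instance_spec, False
    # The tenant-facing spec wins; manual edits on the instance are discarded.
    return copy.deepcopy(nextcloud_spec), True


def mirror_status(instance_status: dict) -> dict:
    """Copy only the documented status fields upward to the Nextcloud."""
    return {
        field: copy.deepcopy(instance_status[field])
        for field in STATUS_FIELDS
        if field in instance_status
    }
```

A caller would log a warning whenever the second element of the `reconcile_drift` result is `True`.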
Pool replenishment¶
Every 60 seconds, NextcloudPool handlers compare desired vs. actual:
unassigned_count = count(NextcloudInstance where pool=X and assigned=false)
deficit = spec.replicas - unassigned_count
if deficit > 0:
    create min(deficit, MAX_INSTANCES_PER_CYCLE) new instances    # cap is 10
if deficit < 0:
    delete the oldest unassigned instances until the surplus is gone
The MAX_INSTANCES_PER_CYCLE = 10 cap prevents a large scale-up from overwhelming the API server or managed-DB operator. Large pools converge over several reconcile cycles, not in one burst.
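A runnable version of the replenishment arithmetic might look like the following. It is a sketch under assumptions: `plan_cycle` is a hypothetical name, unassigned instances are represented as `(name, creation_timestamp)` pairs, and a surplus is trimmed by exactly the size of the surplus.

```python
from __future__ import annotations

MAX_INSTANCES_PER_CYCLE = 10


def plan_cycle(replicas: int, unassigned: list[tuple[str, float]]) -> tuple[int, list[str]]:
    """Decide what one reconcile tick should do.

    Returns (number of instances to create, names of instances to delete).
    """
    deficit = replicas - len(unassigned)
    if deficit > 0:
        # Scale up, but never by more than the per-cycle cap.
        return min(deficit, MAX_INSTANCES_PER_CYCLE), []
    if deficit < 0:
        # Scale down: remove the oldest surplus instances first.
        oldest_first = sorted(unassigned, key=lambda item: item[1])
        return 0, [name for name, _ in oldest_first[:-deficit]]
    return 0, []
```

With this shape, an empty pool with a target of 25 asks for 10 instances on the first tick; later ticks ask for the remainder once the new instances are counted as unassigned.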
Modes, validation, and the safe-fallback contract¶
A Nextcloud CR can be in one of four modes, determined by what its spec provides. The operator validates the references the user gave it and then dispatches to a single mode — there is no cross-mode fallback.
| Request shape | Mode | Success outcome |
|---|---|---|
| spec.instanceRef set | Reference mode | Bind to the named instance. |
| spec.poolSelector set | Pool mode | Assign from a matching idle instance. |
| spec.profile set, no poolSelector | Profile mode | Spawn a fresh instance with the profile's defaults deep-merged into the spec. |
| Nothing set | Vanilla mode | Spawn a bare instance (SQLite, no S3 unless spec provides them). |
If anything the user referenced can't be honoured, the operator blocks instead of silently spawning a misconfigured instance:
| Situation | Condition | Behaviour |
|---|---|---|
| spec.profile is set but doesn't resolve | InstanceAssigned=False reason=ProfileNotFound | Warning event; TemporaryError(delay=60) retry. Checked before mode dispatch. |
| Pool mode, no match | InstanceAssigned=False reason=PoolExhausted | Warning event; TemporaryError(delay=60) retry. No profile-backed fallback. |
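The validate-then-dispatch order described by the two tables can be sketched as a single function. The names `Mode`, `Blocked`, and `select_mode` are hypothetical, and the precedence of instanceRef over poolSelector when both are set is an assumption taken from the row order of the first table, not something the text states.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    REFERENCE = "reference"
    POOL = "pool"
    PROFILE = "profile"
    VANILLA = "vanilla"


class Blocked(Exception):
    """Raised when a user-supplied reference cannot be honoured."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def select_mode(spec: dict, known_profiles: set[str]) -> Mode:
    """Validate references first, then pick exactly one mode."""
    profile = spec.get("profile")
    # Profile resolution is checked before mode dispatch.
    if profile and profile not in known_profiles:
        raise Blocked("ProfileNotFound")
    if spec.get("instanceRef"):   # assumed to take precedence
        return Mode.REFERENCE
    if spec.get("poolSelector"):
        return Mode.POOL
    if profile:
        return Mode.PROFILE
    return Mode.VANILLA
```

In a kopf handler, a `Blocked` exception would typically be translated into the warning event and `TemporaryError(delay=60)` retry listed in the table.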
spec.profile is honoured differently per mode:
- Pool mode: audit-only. The profile is recorded in status.appliedProfile, but its defaults are not merged into the assigned instance (the pool template is the source of truth there).
- Profile mode: defaults are merged into the spec passed to the freshly spawned NextcloudInstance, so the result is fully configured.
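For profile mode, a deep merge of profile defaults into the tenant spec could look like the sketch below. Two points are assumptions rather than documented behaviour: values set on the Nextcloud spec win over profile defaults, and lists are replaced wholesale instead of being concatenated.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively merge overrides on top of defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: descend and merge key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override replaces the default.
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, a profile that supplies S3 and database defaults combined with a spec that only sets an ingress host yields a spec containing all three sections.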
The vanilla path is reserved for the explicit no-poolSelector + no-profile case. A Normal BareFreshCreation event records that the spawned instance will use SQLite / no S3 unless spec provides them.
Recovery from PoolExhausted¶
# Surface the blocked Nextcloud CRs
kubectl get nc -A -o jsonpath='{range .items[?(@.status.conditions[?(@.reason=="PoolExhausted")])]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
# Refill the pool
kubectl patch ncp production --type=merge -p '{"spec":{"replicas":10}}'
# Or correct the selector on a specific CR
kubectl patch nc my-nc -n tenant-a --type=merge -p '{"spec":{"poolSelector":{"matchLabels":{"pool":"production"}}}}'
Recovery from ProfileNotFound¶
# Apply the missing profile, or
kubectl apply -f profile-production.yaml
# fix a typo on the Nextcloud CR
kubectl patch nc my-nc -n tenant-a --type=merge -p '{"spec":{"profile":"production"}}'
Lifecycle policies¶
NextcloudPool.spec.lifecycle controls what happens to instances over time:
| Policy | Default | Meaning |
|---|---|---|
| recreateOnProfileChange | false | When the pool's profile changes, delete and recreate unassigned instances so they pick up the new defaults. Assigned instances are untouched. See Profile changes and instance recreation below. |
| maxUnassignedAge | 168h | After this age, unassigned instances are deleted and replaced. Prevents stale images / expired TLS / drift over time. |
| reclaimPolicy | Delete | When the assigned Nextcloud is deleted: Delete tears down the instance and all its data; Retain preserves the live S3 bucket, managed PostgreSQL, PVCs, and owned namespace for manual reclamation. See Deletion → reclaimPolicy: Retain. |
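The maxUnassignedAge check can be illustrated with a small helper. This sketch makes several assumptions: it parses only the whole-hours form used by the documented default (168h), it reads the standard `metadata.creationTimestamp` field in UTC, and the function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def parse_hours(value: str) -> timedelta:
    """Parse a '<N>h' duration such as '168h' (other units are not handled here)."""
    if not (value.endswith("h") and value[:-1].isdigit()):
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(hours=int(value[:-1]))


def parse_timestamp(value: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp like 2024-01-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expired_unassigned(instances: list[dict], max_age: str, now: datetime) -> list[str]:
    """Names of unassigned instances older than max_age."""
    limit = parse_hours(max_age)
    stale = []
    for inst in instances:
        meta = inst["metadata"]
        if meta["labels"].get("assigned") != "false":
            continue  # the policy only covers unassigned instances
        if now - parse_timestamp(meta["creationTimestamp"]) > limit:
            stale.append(meta["name"])
    return stale
```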
Profile changes and instance recreation¶
Pools may opt in to automatic recreation of unassigned instances when the referenced profile changes, via spec.lifecycle.recreateOnProfileChange: true.
Two kinds of "profile change" are detected:
- spec.profileRef.name swap: the pool now points at a different profile. Detected immediately on update.
- Profile contents edited: same profile name, but its spec.defaults changed (e.g. S3 credentials rotated). Detected by the reconcile timer comparing a SHA256 digest of the resolved profile against status.observedProfile.digest (typically within 60 seconds).
When a change is detected:
- Unassigned instances are deleted. The reconcile timer recreates them with the current profile on the next cycle (capped at MAX_INSTANCES_PER_CYCLE per tick).
- Assigned instances are never recreated. Their configuration is sticky once a Nextcloud has claimed them; profile edits do not propagate retroactively. This avoids surprise reconfiguration of tenants already running on the instance.
The first reconcile after operator upgrade records the digest without deleting anything, so enabling the flag on existing pools is safe.
status.observedProfile (name, digest, observedAt) is always recorded for observability, even when recreateOnProfileChange is false — only the deletion behavior is gated on the flag.
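Content-change detection by digest can be sketched as follows. The canonicalisation (JSON with sorted keys and compact separators) is an assumption, since the text only says a SHA256 digest of the resolved profile is compared; the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from typing import Optional


def profile_digest(defaults: dict) -> str:
    """SHA256 over a canonical JSON rendering, so key order does not matter."""
    canonical = json.dumps(defaults, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_recreate(observed: Optional[str], current: str, recreate_on_change: bool) -> bool:
    """Decide whether unassigned instances should be deleted this tick."""
    if observed is None:
        # First observation: record the digest, delete nothing.
        return False
    return recreate_on_change and observed != current
```

The caller would write the current digest to status.observedProfile on every tick regardless of the return value, which matches the observability behaviour described above.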
Performance characteristics¶
| Scenario | Time | Why |
|---|---|---|
| Assigning from a ready pool | ~30s | Label + spec patch, then Flux re-reconciles the existing HelmRelease |
| Fresh creation (pool miss or no selector) | ~2–3min | Managed PG provisioning dominates |
| Pool replenishment (after an assignment) | Ongoing, async | New instance created in background, doesn't affect tenant experience |
Naming: nc-{adjective}-{noun}-{random}¶
Pool instances are named with a two-word haiku plus a 6-char random suffix — nc-happy-sun-a1b2c3, nc-brave-lake-g7h8i9. Rationale:
- Memorable in support conversations: much easier to say than UUIDs
- Collision-resistant: 50 adjectives × 50 nouns × 36⁶ ≈ 5.4 trillion combinations
- Readable in dashboards: fits naturally in kubectl output and Grafana panels
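A generator for this naming scheme might look like the sketch below. The word lists are placeholders (the operator's actual lists are not shown in this page), and the suffix alphabet of lowercase letters plus digits is inferred from the 36⁶ figure.

```python
import secrets
import string

# Placeholder word lists; the real operator is described as using 50 of each.
ADJECTIVES = ["happy", "calm", "brave"]
NOUNS = ["sun", "moon", "lake"]
ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols


def generate_name() -> str:
    """Return a name of the form nc-{adjective}-{noun}-{random}."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(6))
    return f"nc-{secrets.choice(ADJECTIVES)}-{secrets.choice(NOUNS)}-{suffix}"


def name_space_size(adjectives: int = 50, nouns: int = 50, suffix_len: int = 6) -> int:
    """Number of distinct names the scheme can produce."""
    return adjectives * nouns * len(ALPHABET) ** suffix_len
```

With 50 adjectives, 50 nouns, and a 6-character base-36 suffix, `name_space_size()` evaluates to 5,441,955,840,000.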
What pools are not for¶
- High availability of a single tenant: use spec.replicas on a single instance instead.
- Blue/green deploys: pools pre-provision identical instances, not different versions.
- Running arbitrary Kubernetes workloads: pool instances are Nextcloud instances. For scheduled-job or test-harness use cases, provision your own NextcloudInstance directly.
See also¶
- Pool Provisioning: YAML-level guide and examples
- Architecture: how the underlying NextcloudInstance reconciles
- CRD Mental Model: decision tree for pool vs. direct vs. profile