Pool Design

This page explains why pools exist and how assignment works internally. For usage and YAML, see Pool Provisioning.

Why pools?

Creating a Nextcloud from scratch takes roughly 2 minutes: the managed PostgreSQL cluster alone needs ~90s to become ready, then the Helm release installs the chart, which pulls images and starts pods. For a single production deployment that's fine. For a SaaS product that onboards tenants on demand, it's painfully slow.

The answer is a pool: a set of pre-provisioned NextcloudInstance resources sitting unassigned, waiting for a tenant to claim one. Assignment is a label + spec update, so it completes in roughly 30 seconds instead of 2 minutes.

Component overview

flowchart TB
    subgraph cluster[Cluster scope]
        Pool1[NextcloudPool: production<br/>replicas: 5]
        Pool2[NextcloudPool: development<br/>replicas: 2]
    end

    subgraph tenantNS[Tenant Namespace — tenant-acme]
        NC1[Nextcloud: main-nc<br/>poolSelector: production]
        NC2[Nextcloud: test-nc<br/>poolSelector: development]
    end

    subgraph instNS[Instance Namespace — nextcloud-instances-acme]
        NCI1[NextcloudInstance<br/>nc-happy-sun-a1b2c3<br/>assigned: true → main-nc]
        NCI2[NextcloudInstance<br/>nc-calm-moon-d4e5f6<br/>assigned: false<br/>pool: production]
        NCI3[NextcloudInstance<br/>nc-brave-lake-g7h8i9<br/>assigned: false<br/>pool: production]
    end

    Pool1 -.maintains.-> NCI2
    Pool1 -.maintains.-> NCI3
    Pool1 -.replenishes.-> NCI1
    NC1 -->|assigned from| NCI1
    NC2 -.not yet assigned.-> NCI2

    style NCI1 fill:#6366f1,stroke:#4f46e5,color:#fff

Three key roles:

  • NextcloudPool (cluster-scoped) — keeps a target number of unassigned NextcloudInstance resources ready.
  • Nextcloud (namespaced, tenant-facing) — "give me an instance"; holds the ingress, tenant-specific config, and a pointer to the assigned NextcloudInstance.
  • NextcloudInstance (namespaced) — the actual running Nextcloud. May be labeled assigned: "false" (pool member) or assigned: "true" with a tenant and nextcloud label.

Assignment flow

sequenceDiagram
    participant U as Tenant
    participant NC as Nextcloud (logical)
    participant OP as Operator
    participant NCI as NextcloudInstance
    participant Pool as NextcloudPool

    U->>NC: kubectl apply (with poolSelector)
    NC->>OP: @kopf.on.create Nextcloud
    OP->>OP: Search for NextcloudInstance<br/>with matching labels<br/>and assigned=false
    OP->>NCI: Patch labels (assigned=true,<br/>tenant=acme, nextcloud=main-nc)
    OP->>NCI: Patch spec (copy from Nextcloud.spec)
    OP->>NCI: Set ownerReference → Nextcloud
    OP->>NC: status.instanceRef = {namespace, name}
    NCI->>NCI: Reconciles with new spec<br/>(updates HelmRelease)
    Pool->>Pool: Timer detects deficit<br/>(unassigned=4, target=5)
    Pool->>Pool: Creates replacement instance

Key properties:

  • Cross-namespace reference — the Nextcloud lives in the tenant namespace; the assigned NextcloudInstance lives in an instance namespace (usually per-pool). The ownerReference crosses that boundary for cascading delete.
  • Spec propagation — anything set on Nextcloud.spec (ingress host, admin credentials, custom apps) is copied to NextcloudInstance.spec. The instance's existing settings from the pool template are overwritten.
  • Labels are the selector: spec.poolSelector.matchLabels on Nextcloud must match template.metadata.labels on NextcloudPool. If there is no match, the operator falls back to fresh creation (see below).
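The selection and patch steps of the assignment flow can be sketched in plain Python. This is a minimal illustration, not the operator's code: the function names are hypothetical, resources are modelled as plain dicts, and the ownerReference step is left out because its exact fields are not given here.

```python
import copy
from typing import Optional


def labels_match(labels: dict, match_labels: dict) -> bool:
    """True when every selector key/value pair is present on the resource."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def pick_unassigned(instances: list, match_labels: dict) -> Optional[dict]:
    """Return the first unassigned instance matching the selector, or None."""
    if not match_labels:
        # An empty selector is treated as "no pool requested" (assumption).
        return None
    for instance in instances:
        labels = instance.get("metadata", {}).get("labels", {})
        if labels.get("assigned") == "false" and labels_match(labels, match_labels):
            return instance
    return None


def build_assignment_patch(nextcloud: dict, tenant: str) -> dict:
    """Labels and spec to write onto the chosen instance."""
    return {
        "metadata": {
            "labels": {
                "assigned": "true",
                "tenant": tenant,
                "nextcloud": nextcloud["metadata"]["name"],
            }
        },
        # The tenant spec replaces whatever the pool template had set.
        "spec": copy.deepcopy(nextcloud.get("spec", {})),
    }
```

Keeping selection and patch construction as pure functions makes them easy to unit test without a cluster; the handler would only be responsible for listing instances and applying the result.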

Fresh creation flow (no pool)

If spec.poolSelector is empty or no matching instance exists:

  1. Operator generates a random name (nc-{adjective}-{noun}-{random}).
  2. Creates a new NextcloudInstance in the configured instance namespace.
  3. Copies the entire Nextcloud.spec to the new instance.
  4. Sets ownerReference to the Nextcloud.
  5. The new instance reconciles from scratch — roughly 2 minutes on first run.

This is the same path as manually creating a NextcloudInstance, just initiated from a Nextcloud instead of direct kubectl apply.

Drift reconciliation

What if someone manually edits a NextcloudInstance that's assigned to a Nextcloud?

  • The Nextcloud handler runs a timer every 30s.
  • It compares NextcloudInstance.spec with Nextcloud.spec.
  • If they diverge, it overwrites the instance spec with the tenant spec and logs a warning.

In other words: the Nextcloud resource is the source of truth for tenant configuration. Manual edits on assigned instances are not preserved.

Status flows in the opposite direction: phase, helmRelease, url, version, and conditions are copied from the instance to the Nextcloud, so tenants see a single source of truth for status.
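The two directions of flow can be sketched as a pair of pure functions. This is an illustrative sketch under assumptions: the function names are hypothetical, and resources are plain dicts rather than live API objects.

```python
import copy
from typing import Optional

# Status fields copied from the instance up to the Nextcloud.
STATUS_FIELDS = ("phase", "helmRelease", "url", "version", "conditions")


def desired_spec_if_drifted(nextcloud: dict, instance: dict) -> Optional[dict]:
    """Return the spec to write back to the instance, or None if nothing drifted."""
    desired = nextcloud.get("spec", {})
    if instance.get("spec", {}) == desired:
        return None
    # The tenant-facing Nextcloud wins; manual edits on the instance are dropped.
    return copy.deepcopy(desired)


def status_for_nextcloud(instance: dict) -> dict:
    """Pick the status fields that are mirrored onto the Nextcloud."""
    status = instance.get("status", {})
    return {field: copy.deepcopy(status[field]) for field in STATUS_FIELDS if field in status}
```

A timer handler would call both on each tick: overwrite the instance spec (and log a warning) when the first returns a value, then patch the Nextcloud status with the second.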

Pool replenishment

Every 60 seconds, NextcloudPool handlers compare desired vs. actual:

unassigned_count = count(NextcloudInstance where pool=X and assigned=false)
deficit = spec.replicas - unassigned_count

if deficit > 0:
    create up to MAX_INSTANCES_PER_CYCLE (10) new instances
if deficit < 0:
    delete oldest unassigned instances

The MAX_INSTANCES_PER_CYCLE = 10 cap prevents a large scale-up from overwhelming the API server or managed-DB operator. Large pools converge over several reconcile cycles, not in one burst.
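The per-cycle decision can be written as a small planning function. This is a sketch under assumptions: plan_replenishment is a hypothetical name, instances are plain dicts, and "oldest" is determined by comparing metadata.creationTimestamp strings, which works when they are all RFC 3339 UTC timestamps in the same format. Only the create side is capped here, since that is the only cap described above.

```python
MAX_INSTANCES_PER_CYCLE = 10


def plan_replenishment(replicas: int, unassigned: list) -> tuple:
    """Return (number_to_create, names_to_delete) for one reconcile cycle."""
    deficit = replicas - len(unassigned)
    if deficit > 0:
        # Scale up gradually; the next cycle picks up any remaining deficit.
        return min(deficit, MAX_INSTANCES_PER_CYCLE), []
    if deficit < 0:
        oldest_first = sorted(
            unassigned, key=lambda inst: inst["metadata"]["creationTimestamp"]
        )
        surplus = -deficit
        return 0, [inst["metadata"]["name"] for inst in oldest_first[:surplus]]
    return 0, []
```

For example, a pool with replicas: 50 and no unassigned instances yields 10 creations on the first cycle, so it needs five cycles (about five minutes at a 60-second interval) to converge.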

Modes, validation, and the safe-fallback contract

A Nextcloud CR can be in one of four modes, determined by what its spec provides. The operator validates the references the user gave it and then dispatches to a single mode — there is no cross-mode fallback.

| Request shape | Mode | Success outcome |
|---|---|---|
| spec.instanceRef set | Reference mode | Bind to the named instance. |
| spec.poolSelector set | Pool mode | Assign from a matching idle instance. |
| spec.profile set, no poolSelector | Profile mode | Spawn a fresh instance with the profile's defaults deep-merged into the spec. |
| Nothing set | Vanilla mode | Spawn a bare instance (SQLite, no S3 unless spec provides them). |
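The table above maps naturally onto a single dispatch function. The sketch below is illustrative: select_mode is a hypothetical name, and giving spec.instanceRef precedence over spec.poolSelector when both are set is an assumption, since the table only states that poolSelector takes precedence over profile.

```python
def select_mode(spec: dict) -> str:
    """Map a Nextcloud spec onto exactly one of the four modes."""
    if spec.get("instanceRef"):
        return "reference"
    if spec.get("poolSelector"):
        return "pool"
    if spec.get("profile"):
        # Only reached when no poolSelector is set.
        return "profile"
    return "vanilla"
```

Because the function returns exactly one mode and never retries with another, a failure inside the chosen mode surfaces to the caller instead of silently degrading to a different path.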

If anything the user referenced can't be honoured, the operator blocks instead of silently spawning a misconfigured instance:

| Situation | Condition | Behaviour |
|---|---|---|
| spec.profile is set but doesn't resolve | InstanceAssigned=False reason=ProfileNotFound | Warning event; TemporaryError(delay=60) retry. Checked before mode dispatch. |
| Pool mode, no match | InstanceAssigned=False reason=PoolExhausted | Warning event; TemporaryError(delay=60) retry. No profile-backed fallback. |

spec.profile is honoured differently per mode:

  • Pool mode: audit-only. The profile is recorded in status.appliedProfile, but its defaults are not merged into the assigned instance (the pool template is the source of truth there).
  • Profile mode: defaults are merged into the spec passed to the freshly spawned NextcloudInstance, so the result is fully configured.
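For readers unfamiliar with deep merging, here is a minimal sketch of how profile defaults could be combined with a tenant spec in profile mode. The merge rules are assumptions rather than the operator's documented behaviour: values in the spec win over profile defaults, nested dicts are merged key by key, and lists are replaced wholesale.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: defaults, with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override replaces the default.
            merged[key] = copy.deepcopy(value)
    return merged
```

With these rules a tenant can override a single nested key, such as the ingress host, without having to restate the rest of the profile's defaults.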

The vanilla path is reserved for the explicit no-poolSelector + no-profile case. A Normal BareFreshCreation event records that the spawned instance will use SQLite / no S3 unless spec provides them.

Recovery from PoolExhausted

# Surface the blocked Nextcloud CRs
kubectl get nc -A -o jsonpath='{range .items[?(@.status.conditions[?(@.reason=="PoolExhausted")])]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'

# Refill the pool
kubectl patch ncp production --type=merge -p '{"spec":{"replicas":10}}'

# Or correct the selector on a specific CR
kubectl patch nc my-nc -n tenant-a --type=merge -p '{"spec":{"poolSelector":{"matchLabels":{"pool":"production"}}}}'

Recovery from ProfileNotFound

# Apply the missing profile, or
kubectl apply -f profile-production.yaml

# fix a typo on the Nextcloud CR
kubectl patch nc my-nc -n tenant-a --type=merge -p '{"spec":{"profile":"production"}}'

Lifecycle policies

NextcloudPool.spec.lifecycle controls what happens to instances over time:

| Policy | Default | Meaning |
|---|---|---|
| recreateOnProfileChange | false | When the pool's profile changes, delete and recreate unassigned instances so they pick up the new defaults. Assigned instances are untouched. See Profile changes and instance recreation below. |
| maxUnassignedAge | 168h | After this age, unassigned instances are deleted and replaced. Prevents stale images / expired TLS / drift over time. |
| reclaimPolicy | Delete | When the assigned Nextcloud is deleted: Delete tears down the instance and all its data; Retain preserves the live S3 bucket, managed PostgreSQL, PVCs, and owned namespace for manual reclamation. See Deletion → reclaimPolicy: Retain. |
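As an illustration of the maxUnassignedAge check, the sketch below parses a duration such as 168h and compares it with an instance's age. The function names are hypothetical, and the accepted duration syntax (integer values with h, m, and s units) is an assumption; the operator may accept a different format.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Parse strings like '168h' or '1h30m' into a timedelta."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject input with leftovers the pattern did not consume (e.g. '7d').
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(
        (timedelta(**{_UNITS[unit]: int(number)}) for number, unit in parts),
        timedelta(),
    )


def is_expired(created_at: datetime, max_age: str, now: datetime) -> bool:
    """True when an unassigned instance is older than max_age."""
    return now - created_at > parse_duration(max_age)
```

Passing now explicitly, rather than calling datetime.now() inside the function, keeps the check deterministic in tests.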

Profile changes and instance recreation

Pools may opt in to automatic recreation of unassigned instances when the referenced profile changes, via spec.lifecycle.recreateOnProfileChange: true.

Two kinds of "profile change" are detected:

  1. spec.profileRef.name swap — the pool now points at a different profile. Detected immediately on update.
  2. Profile contents edited — same profile name, but its spec.defaults changed (e.g. S3 credentials rotated). Detected by the reconcile timer comparing a SHA256 digest of the resolved profile against status.observedProfile.digest (typically within 60 seconds).

When a change is detected:

  • Unassigned instances are deleted. The reconcile timer recreates them with the current profile on the next cycle (capped at MAX_INSTANCES_PER_CYCLE per tick).
  • Assigned instances are never recreated. Their configuration is sticky once a Nextcloud has claimed them; profile edits do not propagate retroactively. This avoids surprise reconfiguration of tenants already running on the instance.

The first reconcile after operator upgrade records the digest without deleting anything, so enabling the flag on existing pools is safe.

status.observedProfile (name, digest, observedAt) is always recorded for observability, even when recreateOnProfileChange is false — only the deletion behavior is gated on the flag.
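Content-change detection of this kind is commonly done by hashing a canonical serialization of the profile. The sketch below assumes canonical JSON (sorted keys, compact separators) as the serialization, which may differ from what the operator actually hashes; profile_digest and profile_changed are hypothetical names.

```python
import hashlib
import json
from typing import Optional


def profile_digest(resolved_profile: dict) -> str:
    """SHA256 hex digest of the profile, independent of key order."""
    canonical = json.dumps(resolved_profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def profile_changed(resolved_profile: dict, observed: Optional[dict]) -> bool:
    """Compare the current digest with status.observedProfile.digest."""
    if not observed or not observed.get("digest"):
        # First observation: record the digest, do not treat it as a change.
        return False
    return observed["digest"] != profile_digest(resolved_profile)
```

Sorting keys before hashing matters: without it, two semantically identical profiles could serialize differently and trigger an unnecessary recreation of the whole pool.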

Performance characteristics

| Scenario | Time | Why |
|---|---|---|
| Assigning from a ready pool | ~30s | Label + spec patch, then Flux re-reconciles the existing HelmRelease |
| Fresh creation (pool miss or no selector) | ~2–3min | Managed PG provisioning dominates |
| Pool replenishment (after an assignment) | Ongoing, async | New instance created in background, doesn't affect tenant experience |

Naming: nc-{adjective}-{noun}-{random}

Pool instances are named with a two-word haiku plus a 6-char random suffix — nc-happy-sun-a1b2c3, nc-brave-lake-g7h8i9. Rationale:

  • Memorable in support conversations — much easier to say than UUIDs
  • Collision-resistant — 50 adjectives × 50 nouns × 36⁶ ≈ 5.4 trillion combinations
  • Readable in dashboards — fits naturally in kubectl output and Grafana panels
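A name generator following this scheme could look like the sketch below. The word lists are tiny placeholders taken from the example names on this page, not the operator's real lists, and the suffix alphabet (lowercase letters plus digits) is inferred from those examples.

```python
import secrets
import string

# Placeholder word lists; the real operator is described as using 50 of each.
ADJECTIVES = ("happy", "calm", "brave")
NOUNS = ("sun", "moon", "lake")
ALPHABET = string.ascii_lowercase + string.digits  # 36 characters


def generate_name(suffix_len: int = 6) -> str:
    """Build a name of the form nc-{adjective}-{noun}-{random}."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    return f"nc-{secrets.choice(ADJECTIVES)}-{secrets.choice(NOUNS)}-{suffix}"


def name_space_size(adjectives: int = 50, nouns: int = 50, suffix_len: int = 6) -> int:
    """Number of distinct names the scheme can produce."""
    return adjectives * nouns * len(ALPHABET) ** suffix_len
```

With the defaults, name_space_size() evaluates to 5,441,955,840,000. Nearly all of that comes from the random suffix (36⁶ ≈ 2.2 billion); the words contribute readability rather than entropy.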

What pools are not for

  • High availability of a single tenant — use spec.replicas on a single instance instead.
  • Blue/green deploys — pools pre-provision identical instances, not different versions.
  • Running arbitrary Kubernetes workloads — pool instances are Nextcloud instances. For scheduled-job or test-harness use cases, provision your own NextcloudInstance directly.

See also