Running occ Commands (NextcloudCommand)
Sometimes you need to run a one-off occ command against a running Nextcloud instance — install an app, toggle a setting, kick off a repair. Historically that meant kubectl exec into the pod or asking an operator maintainer to bake a new canned flow into the codebase.
The NextcloudCommand CRD exposes the operator's internal occ runner as a kubectl-native API:
- Declarative: apply a YAML, the operator runs the commands.
- Auditable: per-command exit codes, stdout/stderr snippets, and timestamps land in .status.
- Self-cleaning: objects are garbage-collected after spec.ttlSecondsAfterFinished (default 7 days; set 0 to keep forever).
- Safe argv: commands are argv arrays, not shell strings; there is no shell interpolation in the target pod.
Quick start

apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudCommand
metadata:
  name: occ-status
  namespace: default
spec:
  targetRef:
    kind: NextcloudInstance
    name: my-nextcloud
  commands:
    - ["status", "--output=json"]

kubectl apply -f command-occ-status.yaml
kubectl get nccmd -w   # watch Pending → Running → Succeeded
kubectl get nccmd occ-status -o jsonpath='{.status.results[0].stdoutSnippet}'
Multi-step example

apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudCommand
metadata:
  name: enable-richdocuments
  namespace: default
spec:
  targetRef:
    kind: NextcloudInstance
    name: my-nextcloud
  commands:
    - ["app:install", "richdocuments"]
    - ["app:enable", "richdocuments"]
    - ["config:app:set", "richdocuments", "wopi_url", "--value=https://office.example.com"]
  timeoutSeconds: 600
  perCommandTimeoutSeconds: 180
  haltOnError: true
  ttlSecondsAfterFinished: 604800   # keep for 7 days (this is the default)

Commands run serially. With haltOnError: true (the default), the first non-zero exit stops the run and marks the CR Failed.
Targeting
spec.targetRef.kind may be:
- NextcloudInstance: direct reference to the physical instance in the same namespace.
- Nextcloud: the operator reads the Nextcloud's .status.instanceRef to find the assigned NextcloudInstance (which may live in a different namespace, e.g. for pool-assigned instances). If the Nextcloud isn't assigned yet, the run is deferred with a TemporaryError and retried.
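
As a sketch of the indirect form, the manifest below targets a Nextcloud rather than a NextcloudInstance. It reuses only fields shown in the quick start; the object names are hypothetical.

```yaml
apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudCommand
metadata:
  name: occ-status-via-nextcloud   # hypothetical name
  namespace: default
spec:
  targetRef:
    kind: Nextcloud                # resolved through the Nextcloud's .status.instanceRef
    name: my-tenant-nextcloud      # hypothetical Nextcloud name
  commands:
    - ["status", "--output=json"]
```

If the Nextcloud has no assigned instance yet, this run would be deferred and retried as described above rather than failing outright.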
Result reporting
.status.results[] is appended in order of spec.commands. Each entry contains:
| Field | Meaning |
|---|---|
| command | The argv that was run. |
| exitCode | occ exit code. -1 = command did not run because timeoutSeconds was exceeded. |
| stdoutSnippet / stderrSnippet | First 4 KiB + last 4 KiB, joined with \n...<truncated>...\n when longer. Enough to diagnose failures without blowing etcd object-size limits. |
| startedAt / finishedAt | Per-command timing. |
| timedOut | true when the overall cap fired before this command could run. |
Use kubectl get nccmd <name> -o yaml for full details or the printer columns (Phase, Target, Kind, Message) for a summary.
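
For orientation, the status of a finished single-command run might be shaped roughly as follows. The field names are the ones listed above; every value is made up for illustration (not captured output), the timestamp format is an assumption, and the real status may carry additional fields.

```yaml
status:
  phase: Succeeded
  results:
    - command: ["status", "--output=json"]
      exitCode: 0
      stdoutSnippet: '{"installed":true,"maintenance":false}'   # illustrative, abbreviated
      stderrSnippet: ""
      startedAt: "2025-01-01T12:00:00Z"    # illustrative timestamp
      finishedAt: "2025-01-01T12:00:02Z"   # illustrative timestamp
      timedOut: false
```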
RBAC
NextcloudCommand is powerful — it lets the holder run arbitrary occ subcommands, which can modify config, users, and app state. Treat creation permissions as equivalent to admin access on the target instance. Grant create/get/list/watch on nextcloudcommands only to subjects that should have operator-level control, typically cluster-admins or a dedicated automation ServiceAccount.
The operator itself already has pods/exec and NextcloudCommand CRUD permissions via the cluster role shipped with the chart.
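
A minimal sketch of such a grant for a dedicated automation ServiceAccount, using standard Kubernetes RBAC objects. The API group is taken from the apiVersion used throughout this page; the Role, RoleBinding, and ServiceAccount names are hypothetical, and the namespace should match where the commands will be created.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextcloudcommand-runner    # hypothetical name
  namespace: default
rules:
  - apiGroups: ["k8s.bnerd.com"]
    resources: ["nextcloudcommands"]
    verbs: ["create", "get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextcloudcommand-runner    # hypothetical name
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nc-automation            # hypothetical ServiceAccount
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nextcloudcommand-runner
```

A namespaced Role keeps the grant scoped to one namespace, which limits which instances the subject can reach through NextcloudInstance targets.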
Secret handling
The operator sanitizes argument values that follow flags matching password|secret|key|token|credential (case-insensitive) in its debug logs — both --password=value and --password value shapes are redacted. That covers the common occ flag conventions. Note: the spec.commands themselves are readable by anyone with get on the CR, so don't paste production secrets into the spec if you can avoid it — use config:system:set --value=@/path/to/file patterns or pre-populated Nextcloud secrets where possible.
Retry policy
Transient failures from the underlying pod-exec primitive (TemporaryError — pod not found, transport error, handshake failure) are retried automatically with exponential backoff:
- Max attempts: 10
- Backoff: 15 s → 30 s → 60 s → 120 s → 240 s, capped at 300 s thereafter
- Worst-case total wait: about 25 minutes of backoff before the CR is marked Failed
While retrying, .status.conditions[] carries a Progressing/Retrying condition whose message includes the attempt number and the underlying error, and an event is posted on the CR for every retry. When retries exhaust, the CR transitions to phase: Failed with reason RetriesExhausted — it never hangs at Running.
Permanent failures (validation errors, invalid targetRef) fail immediately with no retry.
Lifecycle hooks
Beyond the on-demand pattern shown above, NextcloudCommand can be gated on the target instance being Ready and combined with a small bash beforeScript for setup. Operator-managed lifecycle hooks declared on NextcloudInstance.spec.hooks (or upstream Nextcloud / Profile / Pool) materialize these NCCs automatically at well-defined moments.
Triggers
| Trigger | Fires when | Cardinality | Settable on |
|---|---|---|---|
| onFirstReady | Instance reaches Ready for the first time in its lifetime | Once per instance | Profile, Pool template, Nextcloud, NCI |
| onAssignmentReady | Pool instance reaches Ready after being assigned to a Nextcloud | Once per assignment event | Profile, Nextcloud, NCI |
| onEveryReady | Instance reaches Ready, every time (incl. after upgrades, recovery) | Re-fires (fresh NCC per edge) | Profile, Pool template, Nextcloud, NCI |
A single wall-clock Ready edge can fire multiple triggers. For example, a fresh NCI's first Ready fires onFirstReady and onEveryReady. A pool instance's first Ready after assignment fires onAssignmentReady and onEveryReady.
Declaring hooks

apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudInstance
metadata:
  name: my-tenant
spec:
  hooks:
    onFirstReady:
      - name: baseline-apps
        commands:
          - ["app:install", "richdocuments"]
          - ["app:enable", "richdocuments"]
    onAssignmentReady:
      - name: brand
        beforeScript: |
          set -e
          curl -fsSL https://assets.example.com/logo.png \
            -o /var/www/html/themes/custom/logo.png
        commands:
          - ["theming:config", "logo", "/themes/custom/logo.png"]
    onEveryReady:
      - name: warm-cache
        commands:
          - ["files:scan", "--all"]

Cascade
Hooks set on NextcloudProfile.spec.defaults, NextcloudPool.spec.template.spec, Nextcloud.spec, or NextcloudInstance.spec cascade through the standard order: Profile → Pool → Nextcloud → NCI. The last source wins (replace, not concatenate).
The pool template intentionally does not accept onAssignmentReady — pool instances don't yet know their tenant. Set it on the Nextcloud or NCI instead.
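
As a sketch, an onAssignmentReady hook declared on the Nextcloud might look like the fragment below. It assumes hooks sit under spec.hooks on the Nextcloud, mirroring the NextcloudInstance layout; the object and hook names are hypothetical and other Nextcloud fields are omitted.

```yaml
apiVersion: k8s.bnerd.com/v1alpha1
kind: Nextcloud
metadata:
  name: my-tenant-nextcloud        # hypothetical name
spec:
  # other Nextcloud fields omitted
  hooks:                           # assumed location, mirroring NextcloudInstance.spec.hooks
    onAssignmentReady:
      - name: tenant-branding      # hypothetical hook name
        commands:
          - ["theming:config", "name", "Example Corp"]
```

Under the last-source-wins rule, hooks declared here would replace, rather than extend, those supplied by a Profile.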
Generated NCC names
Operator-materialized NCCs are deterministically named:
- {instance}-hk-fr-{name}-{8char-hash}: onFirstReady
- {instance}-hk-ar-{name}-{8char-hash}: onAssignmentReady
- {instance}-hk-er-{name}-e{edge}-{8char-hash}: onEveryReady (edge counter ensures a fresh NCC per Ready edge)
Editing an entry → new hash → a new NCC fires. Old NCCs remain as audit records and are TTL-cleaned after 7 days (override per-NCC with ttlSecondsAfterFinished).
Operator-wide TTL cap
Set the operator env var NCC_GC_MAX_TTL_SECONDS to cap the effective GC TTL of every finished NextcloudCommand, overriding a larger per-NCC ttlSecondsAfterFinished downward. Unset or 0 means no cap. This is useful to drain a large backlog of finished commands (e.g. from a high-frequency onEveryReady hook) without patching or mass-deleting them — set it low, let the operator's own garbage collector clean up, then raise or remove it. A per-NCC TTL of 0 (keep forever) is never overridden.
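
One way to set the cap, sketched as a fragment of the operator Deployment's pod template. The container name is hypothetical, and how the chart exposes extra environment variables is not covered here, so treat the placement as an assumption.

```yaml
# Fragment of the operator Deployment (container name is hypothetical)
spec:
  template:
    spec:
      containers:
        - name: operator
          env:
            - name: NCC_GC_MAX_TTL_SECONDS
              value: "3600"   # cap the effective TTL at 1 hour while draining a backlog
```

Environment values in a pod spec are strings, hence the quotes. Removing the variable, or setting it to "0", lifts the cap again.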
Self-heal for stranded Pending OnReady commands
A Pending OnReady NCC whose on_create retry chain was abandoned (e.g.
after a long string of operator restarts) would otherwise stay stuck
forever — kopf treats its create handler as already-handled and won't
re-gate. The operator runs a periodic timer that re-invokes the gate for
such stranded commands; once the target is Ready, the hook runs and
transitions to Succeeded. Tunable via env vars:
- TIMER_COMMAND_REGATE_INTERVAL: timer interval in seconds (default 300).
- NCC_REGATE_STALE_SECONDS: a Pending OnReady command must have had no status transition within this window before the timer treats it as stranded. Default 600. Lower for faster recovery, higher to avoid racing on_create's own active retries.
Standalone OnReady NCCs
You can also create NCCs directly with spec.lifecycle: OnReady to gate a one-off command on Ready, without using spec.hooks. The same gate logic applies (wait for phase=Ready AND a Running+Ready pod). The retry budget for OnReady is generous (~200 minutes) so first reconciles that include slow database provisioning can still converge before the CR self-fails.
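
A minimal sketch of such a standalone command. Apart from spec.lifecycle, it uses only the fields from the quick start; the object name is hypothetical.

```yaml
apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudCommand
metadata:
  name: add-missing-indices        # hypothetical name
  namespace: default
spec:
  lifecycle: OnReady               # wait for the target to be Ready before running
  targetRef:
    kind: NextcloudInstance
    name: my-nextcloud
  commands:
    - ["db:add-missing-indices"]
```

This can be applied alongside the instance manifest itself; the command should stay Pending until the gate opens.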
Failure surface
If a hook NCC ends in phase: Failed, the NCI gains a LifecycleHookFailed=True condition with the failing NCC name in the message. The NCI's own phase stays Ready — the instance is serving; only the hook failed. Inspect the failing NCC's .status for the exit codes and snippets.
beforeScript notes
- Runs in the nextcloud container via /bin/bash -c <script> (same container as occ).
- No environment-variable injection in v1; reference values via filesystem mounts (e.g. existing chart-mounted secrets) or hard-code in the script.
- A non-zero exit code marks the NCC Failed with condition BeforeScriptFailed and skips the commands array. The exit code, stdout, and stderr land in .status.beforeScriptResult.
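
Because there is no environment-variable injection, a script that needs a credential has to read it from a mounted file. The hook below sketches that pattern; the mount path and hook name are hypothetical and must be replaced with a file your deployment actually mounts.

```yaml
apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudInstance
metadata:
  name: my-tenant
spec:
  hooks:
    onFirstReady:
      - name: fetch-branding       # hypothetical hook name
        beforeScript: |
          set -euo pipefail
          # Hypothetical mount path; substitute a file that is really mounted in the pod.
          token="$(cat /etc/example-secrets/assets-token)"
          curl -fsSL -H "Authorization: Bearer ${token}" \
            https://assets.example.com/logo.png \
            -o /var/www/html/themes/custom/logo.png
        commands:
          - ["theming:config", "logo", "/themes/custom/logo.png"]
```

With set -e in effect, a missing file or a failed download exits non-zero, so the commands array would be skipped as described above.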
Limitations (v1)
- Spec is immutable. To re-run, apply a new CR. This keeps .status a faithful audit trail of a single run.
- No live streaming. Output lands in .status after each command finishes. For very long commands, consider splitting them or polling .status with kubectl get -w.
- No per-instance serialization. Running two NextcloudCommand CRs concurrently against the same instance is the caller's responsibility. Nextcloud is not always safe for parallel occ execution (e.g. during upgrade), so serialize in the caller.
- No --no-interaction injected. You supply the literal argv. Add -n/--no-interaction yourself when running commands that might prompt; otherwise they will block and hit the per-command timeout.
- No webhook callbacks / synchronous HTTP API: deferred to a later iteration. Poll .status.phase to detect completion.
- Non-idempotent commands. If the operator retries the handler (e.g. the pod was transiently unavailable), commands run again from the start. Commands like app:install richdocuments are idempotent; commands like user:add are not. Prefer idempotent occ subcommands when possible.
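
Since every re-run needs a new object, one option is to let the API server pick a unique name through metadata.generateName, a standard Kubernetes feature. The sketch below also passes --no-interaction explicitly. It assumes the operator places no constraints on object names; the name prefix is hypothetical.

```yaml
apiVersion: k8s.bnerd.com/v1alpha1
kind: NextcloudCommand
metadata:
  generateName: maintenance-repair-   # the API server appends a random suffix
  namespace: default
spec:
  targetRef:
    kind: NextcloudInstance
    name: my-nextcloud
  commands:
    - ["maintenance:repair", "--no-interaction"]
```

Manifests that use generateName have to be submitted with kubectl create -f, because kubectl apply requires a fixed metadata.name.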
See also
- CRD Overview: NextcloudCommand Endpoints
- Operations & Annotations: the canned reconcile/run-maintenance annotations, which complement this CRD.
- examples/command-occ-status.yaml, examples/command-enable-richdocuments.yaml in the repo.