Spring Cleaning 2026 Day 6: kvestigate (investigate k*)

Kvestigate - Investigate Kubernetes Clusters

Ever find yourself given access to a piece of garbage Kubernetes cluster and need to audit it to figure out what the hell is going on?

Say hello to kvestigate, your Kubernetes investigation and auditing toolkit, with the ability to even run ad-hoc workloads beyond the reach of the kubernetes scheduler allocator substrate system by just doing shit on the hosts themselves, like computers are supposed to do.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Python                  141        34192        28551         1381         4260
 TOML                      3           72           57            8            7
 YAML                     23         1059          718          257           84
─────────────────────────────────────────────────────────────────────────────────
 Markdown                 18         3983            0         3083          900
 |- BASH                  13          500          288          141           71
 |- Python                 6          177          132           27           18
 |- YAML                   3          218          187           11           20
 (Total)                             4878          607         3262         1009
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                   185        40201        29933         4908         5360
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Features

kvestigate is a read-first Kubernetes toolkit for investigating clusters, running workloads directly on host machines (no Docker), and safely handing a constrained slice of your cluster resources to AI agents.

Three surfaces on one foundation (the official kubernetes client + a privileged node-access DaemonSet):

Investigate & plan — audit a cluster, stream logs, exec into pods, and bin-pack a workload onto available GPU/CPU. Read-only by default.
Run things host-native — create isolated workspaces on a node’s host OS (tmpfs / disk / hybrid storage, GPUs bound by UUID), sync code in, and launch processes. No container build, no pod scheduling.
Host constrained cyberspaces — expose an allow-listed, quota’d, audited slice of those workspace capabilities over HTTP/MCP, so agents and other programs can “upload and run things” without cluster-wide power.

Bonus Feature

Natively integrates with nvidia-smi to query all nodes for cluster-wide GPU usage and power metrics, so you can see how much of the world you are wasting with your “ai” “workloads”.

    node           running     SM%         VRAM Usage              Power Usage
  node-01              2/8     25.0 174.4/764.7 GiB  (22.8%)    0.49 kW (10.3%)  35°C  active   saturated
  node-02              2/8     12.5 157.2/764.7 GiB  (20.6%)    0.59 kW (12.4%)  59°C  active   saturated
  node-03              8/8     24.8 617.3/764.7 GiB  (80.7%)    1.16 kW (24.2%)  71°C  active   saturated
  node-04              8/8     12.5 701.9/764.7 GiB  (91.8%)    1.22 kW (25.3%)  79°C  active   saturated
  node-05              8/8     12.5 690.9/764.7 GiB  (90.3%)    1.01 kW (21.1%)  43°C  active   saturated
  node-06              8/8     12.5 695.5/764.7 GiB  (90.9%)    1.06 kW (22.0%)  68°C  active   saturated
  node-07              8/8     12.5 691.4/764.7 GiB  (90.4%)    1.00 kW (20.9%)  49°C  active   saturated
  node-08              4/8      0.0 357.0/764.7 GiB  (46.7%)    0.46 kW ( 9.6%)  38°C  ghost    reserved-idle

cluster aggregate  (only over reporting GPUs)
  GPUs:    48 allocated / 64 capacity (stated 75.0%)    reporting=64 GPU(s) on 8 node(s)
  compute: mean SM util  14.0%  (idle headroom: ~86.0%)
  memory:   3.99/ 5.97 TiB resident  ( 66.8% of fleet VRAM, 33.2% free)
  power:    7.00 kW (18.2%) drawn of 38.40 kW TDP  (thermal/power headroom: ~81.8%)
  activity tally (per GPU): idle=15  reserved-idle=40  saturated=9
  power per GPU (W):  min=29  max=596  mean=109  median=85  stddev=113  (N=64)
  temp per GPU (°C):  min=25  max=79  mean=37  median=35  stddev=11  (N=64)
  power split by activity: idle=456 W (6.5%)  reserved-idle=3.44 kW (49.1%)  saturated=3.10 kW (44.3%)
  wasted power:  2.22 kW across 40 reserved-idle GPU(s)  (observed idle baseline = 30 W (mean of 15 truly-idle GPU(s)); total reserved-idle draw = 3.44 kW)
  verdicts:  ghost=1  active=7  no-allocation=0  no-samples=0
  drain-readiness:  idle (safe to drain)=0  reserved-idle (loaded but no traffic; consolidate)=1  reserved-active (mem held + activity; coordinate)=0
    reserved-idle nodes (model loaded, no compute/mem-bw/power signal — strong consolidate/downsize candidates):
      node-08  stated=4  mem=357.0/764.7 GiB  (46.7%)  pods: prd-r6000-nest/model-multi-2-otel-llm-1, prd-r6000-nest/model-multi-3-otel-llm-0, prd-r6000-nest/model-multi-3-otel-llm-8, prd-r6000-nest/model-multi-4-otel-llm-1
    → recoverable capacity: 4 stated GPU(s),  357.0/ 764.7 GiB of held VRAM with zero observed work
  ghost nodes (single-shot — re-probe with --samples N to confirm):
    node-08  stated=4  pods=4  mean=0.0%  max=0.0% < threshold 5.0%

  energy forecast  (straight-line projection of current draw — assumes the fleet keeps doing exactly what it's doing now)
    TDP ceiling:    month= 28.05 MWh  quarter= 84.15 MWh  year=336.61 MWh  (every GPU at 100 % TDP, theoretical max)
    observed draw:  month=  5.11 MWh  quarter= 15.34 MWh  year= 61.36 MWh  (18.2% of TDP, 44.3% productive)
    productive:     month=  2.27 MWh  quarter=  6.80 MWh  year= 27.21 MWh  (LIGHT + ACTIVE + SATURATED — real compute)
    idle baseline:  month=  0.33 MWh  quarter=  1.00 MWh  year=  4.01 MWh  (IDLE class — unavoidable driver-on floor)
    reserved-idle:   month=  2.51 MWh  quarter=  7.54 MWh  year= 30.15 MWh  (total draw on loaded-but-no-traffic GPUs)
    → recoverable:  month=  1.62 MWh  quarter=  4.87 MWh  year= 19.46 MWh  (reserved-idle draw minus observed baseline — saved if you consolidate; 31.7% of total observed)
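The aggregate figures are plain arithmetic on the sampled draw. As a cross-check, the sketch below reproduces the wasted-power and forecast lines from the rounded inputs printed above. It is a reconstruction, not kvestigate's code. The 365.25-day year (730.5 h month, 2191.5 h quarter) is inferred from the printed MWh values, and the function names are hypothetical.

```python
# Reconstruction of the "wasted power" and "energy forecast" arithmetic.
# Assumption: a 365.25-day year split evenly into 12 months / 4 quarters.
HOURS_PER_YEAR = 365.25 * 24
HORIZONS_H = {
    "month": HOURS_PER_YEAR / 12,
    "quarter": HOURS_PER_YEAR / 4,
    "year": HOURS_PER_YEAR,
}


def wasted_kw(reserved_idle_w: float, n_reserved_idle: int, idle_baseline_w: float) -> float:
    """Draw on loaded-but-idle GPUs above what a truly idle GPU pulls anyway (kW)."""
    return (reserved_idle_w - n_reserved_idle * idle_baseline_w) / 1000


def forecast_mwh(kw: float) -> dict[str, float]:
    """Straight-line projection: constant draw (kW) x hours -> MWh."""
    return {name: kw * hours / 1000 for name, hours in HORIZONS_H.items()}


if __name__ == "__main__":
    baseline = 456 / 15                      # mean W of the 15 truly-idle GPUs
    waste = wasted_kw(3440, 40, baseline)    # 40 reserved-idle GPUs drawing 3.44 kW
    for label, kw in [("observed", 7.00), ("recoverable", waste)]:
        f = forecast_mwh(kw)
        print(f"{label:12s} " + "  ".join(f"{k}={v:6.2f} MWh" for k, v in f.items()))
```

With these rounded inputs the sketch gives about 2.22 kW of waste and roughly 1.62 / 4.87 / 19.5 MWh recoverable per month / quarter / year. The small gap to the report's 19.46 MWh comes from rounding in the displayed inputs.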

Use cases

Answer “what’s going on in this cluster right now?” without mutating anything.
Plan whether a multi-GPU job fits, and where, before launching it.
Give an ML workload a 500 GiB RAM-backed scratch space on a bare host, bypassing container/scheduler overhead.
Let an autonomous agent provision + run + fetch results from disposable workspaces, fenced in by policy and quotas it can’t exceed.

Surfaces at a glance

Command	What it does
`kvestigate audit`	Run every report (full cluster picture)
`kvestigate report list`	Show available individual reports
`kvestigate report run <name>`	Run one report by name
`kvestigate namespaces`	List namespaces + pod counts
`kvestigate pods [-n ns]`	List pods, optionally namespaced
`kvestigate describe <kind> <name> [-n ns]`	Inspect a pod / service / deployment / node
`kvestigate logs <pod> [-n ns] [-c container] [-f]`	Stream pod logs
`kvestigate exec <pod> [-n ns] -- cmd args`	Run a command inside a pod
`kvestigate plan ...`	Bin-pack a workload onto the cluster (capacity planning)
`kvestigate workspace status / gpus`	Discover workspaces / per-GPU availability
`kvestigate workspace create / destroy / sync`	Workspace lifecycle (writes gated)
`kvestigate workspace run / ps / logs / kill`	Host-process runs inside a workspace
`kvestigate workspace diff / apply / export`	IaC verbs over `workspace.yaml`
`kvestigate kubeconfig analyze / verify`	Diagnose a kubeconfig (read-only)
`kvestigate kubeconfig convert / diff`	Apply strategies; semantic diff
`kvestigate cybernet init / serve / mcp-server`	Generate cyberspace config; run HTTP / MCP frontend
`kvestigate cybernet emit-helper / status / call`	Generate agent helper script; health check; one-shot call
`kvestigate cybernet hash-token`	sha256-hash a bearer token for the config
`kvestigate fanout targets ...`	Resolve which nodes/pods a fan-out would target (no execution)
`kvestigate fanout exec ... -- cmd`	Run a command on every selected node/pod in parallel
`kvestigate fanout fetch ... PATH DIR`	Pull a file from every target into a local directory
`kvestigate fanout push LOCAL REMOTE`	(write) push a local file to every target
`kvestigate debug context`	Show resolved kubeconfig + how auth is being sent
`kvestigate debug api [path]`	GET a path through the python client (exercises auth)
`kvestigate debug exec-url <pod>`	Show HTTP/WS URLs for exec + attempt handshake
`kvestigate debug timing`	Time representative API calls — diagnose slowness
`kvestigate deploy <manifest>`	(write) apply a YAML manifest
`kvestigate delete <kind> <name>`	(write) delete a resource
`kvestigate node cordon / uncordon <node>`	(write) mark a node un/schedulable (reversible)
`kvestigate node drain <node>`	(write) cordon + evict a node’s pods (clear for new work)
`kvestigate node clear <node>`	(write) evict a node’s pods, leave it schedulable
`kvestigate evict --node/-n/--selector`	(write) evict a scoped set of pods (reclaim capacity)
`kvestigate scale <deploy> <replicas> [-n ns]`	(write) scale a Deployment (normal, or quiesce with 0)
`kvestigate optimize analyze / plan`	Diagnose GPU fragmentation; propose a consolidation plan
`kvestigate optimize probe-gpu`	Fan-out `nvidia-smi`; flag ghost allocations (stated vs actual)
`kvestigate optimize apply`	(write) execute a consolidation plan via cordon + evict
`kvestigate app render / plan`	Compile an App spec to workspace or k8s artifacts (dry-run)
`kvestigate app diff`	Show app-level spec_hash drift + planned backend diff (read)
`kvestigate app up / down`	(write) apply / tear down an App; auto-picks host vs pod
`kvestigate app status / list`	Read recorded apply state (spec_hash, phase) — local cache only

Audit and reports

Full audit (default for “what’s going on”)

kvestigate audit
kvestigate audit --json snapshots/2026-05-20.json     # also save a snapshot

Runs all reports in a sensible order:

nodes — capacity & requested per node, plus cluster totals. Shows CPU, memory, GPU, pod count. Uses allocations (what kube scheduler sees), not live usage.
metrics — live CPU/memory from metrics.k8s.io (metrics-server). Includes top 15 pods by CPU and by memory. Skipped with a notice if metrics-server is unavailable.
gpu — per-node GPU model, count, memory, driver, CUDA, MIG state, allocation; per-container consumer table.
workloads — every Deployment / StatefulSet / DaemonSet with ready vs desired and image.
network — Services (with endpoint counts so you can spot broken selectors), Ingresses, NetworkPolicies.
storage — PersistentVolumeClaims, PersistentVolumes, per-StorageClass bound capacity.

Single report

kvestigate report list
kvestigate report run nodes
kvestigate report run gpu --json snapshots/gpu-now.json

All reports share a single Snapshot so the same API calls aren’t repeated. JSON output is structured + carries a UTC captured_at, so two snapshots taken hours apart can be diffed by anything that reads JSON.
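Because snapshots are plain JSON, a small recursive diff is enough to compare two of them. The sketch below assumes nothing about the snapshot schema beyond the top-level captured_at field mentioned above, which it skips. Lists are compared by position, so reordered items show up as changes. The script name in the usage line is hypothetical.

```python
import json
import sys
from typing import Any, Iterator


def diff(old: Any, new: Any, path: str = "$") -> Iterator[str]:
    """Yield one line per leaf that differs between two JSON values."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            if path == "$" and key == "captured_at":
                continue  # the timestamp always differs; ignore it
            if key not in new:
                yield f"- {path}.{key}"
            elif key not in old:
                yield f"+ {path}.{key} = {json.dumps(new[key])}"
            else:
                yield from diff(old[key], new[key], f"{path}.{key}")
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):  # positional comparison
            if i >= len(new):
                yield f"- {path}[{i}]"
            elif i >= len(old):
                yield f"+ {path}[{i}] = {json.dumps(new[i])}"
            else:
                yield from diff(old[i], new[i], f"{path}[{i}]")
    elif old != new:
        yield f"~ {path}: {json.dumps(old)} -> {json.dumps(new)}"


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        for line in diff(json.load(a), json.load(b)):
            print(line)
```

Usage: `python snapdiff.py snapshots/2026-05-20.json snapshots/gpu-now.json`.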

Inspection

Namespaces & pods

kvestigate namespaces
kvestigate pods                          # all namespaces
kvestigate pods -n your-namespace        # one namespace

Describe a resource

kvestigate describe pod my-app-abc123-xyz -n your-namespace
kvestigate describe service my-app -n your-namespace
kvestigate describe deployment my-app -n your-namespace
kvestigate describe node node-04

describe node <name> is the source of truth for “what hardware is this really” — full label dump (NVIDIA labels in their own table, everything else separately), capacity, allocatable, taints, addresses, kernel/runtime/kubelet versions. Use this whenever the summary tables in report run gpu truncate something you care about.

Logs

kvestigate logs my-app-abc123-xyz -n your-namespace
kvestigate logs my-app-abc123-xyz -n your-namespace -f --tail 500
kvestigate logs some-pod -n some-ns -c sidecar           # specific container

Exec

# Read-only inspection inside a pod (this is fine on prod; do not write):
kvestigate exec my-app-abc123-xyz -n your-namespace -- ls /
kvestigate exec model-server-... -n your-namespace -- nvidia-smi

Default transport is the local kubectl binary (set up to use the project kubeconfig). The in-process websocket exec via the Python client is also implemented but currently has a URL-parsing bug through the API proxy — use --transport ws only when you’re debugging that path. See kvestigate debug exec-url <pod> for the diagnostic surface.

Capacity planning — `kvestigate plan`

Bin-packs a workload onto the cluster’s currently free capacity (allocatable minus current pod requests). By default three strategies run side by side so the operator can see the trade-offs (--strategy restricts the run to one):

bestfit — tightest pack, minimizes fragmentation.
spread — maximum HA distribution.
pack — fewest nodes used; consolidates load.
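To make the three names concrete, here is a greedy sketch over free GPU counts only. It illustrates what each strategy optimizes and is not kvestigate's planner, which also accounts for CPU, memory, labels and exclusions. The function name and node capacities are hypothetical.

```python
from typing import Optional


def place(free_gpus: dict[str, int], gpu_per_replica: int, replicas: int,
          strategy: str) -> Optional[list[str]]:
    """Assign replicas to nodes one at a time; return node names, or None if it can't fit."""
    remaining = dict(free_gpus)
    placed = {node: 0 for node in free_gpus}
    plan: list[str] = []
    for _ in range(replicas):
        fits = [n for n, free in remaining.items() if free >= gpu_per_replica]
        if not fits:
            return None  # e.g. one pod asking for more GPUs than any node has free
        if strategy == "bestfit":
            # tightest fit: the node that will have the least slack left
            node = min(fits, key=lambda n: (remaining[n], n))
        elif strategy == "spread":
            # fewest replicas so far, then the most free GPUs
            node = min(fits, key=lambda n: (placed[n], -remaining[n], n))
        elif strategy == "pack":
            # prefer nodes already in use, then the roomiest one
            node = min(fits, key=lambda n: (placed[n] == 0, -remaining[n], n))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        remaining[node] -= gpu_per_replica
        placed[node] += 1
        plan.append(node)
    return plan
```

With free GPUs `{"node-01": 8, "node-02": 4, "node-03": 2}` and three 1-GPU replicas, the strategies differ:

- bestfit fills the small node first: node-03, node-03, node-02.
- spread uses three different nodes.
- pack puts all three on node-01.

A single 12-GPU pod returns None, matching the "max 8 per node" limit.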

Semantics

--gpu / --cpu / --memory are per replica. Multiply by --replicas to get total demand. This matches PodSpec semantics — so “12 GPUs total, spreadable across nodes” is --gpu 1 --replicas 12, while “single pod with 12 GPUs” is --gpu 12 --replicas 1 (which today’s cluster cannot host — max 8 per node).

Memory accepts 64Gi, 1Ti, etc. CPU accepts 8, 8000m, 0.5, etc.
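These are Kubernetes resource quantities. A minimal parser covering the common suffixes is sketched below. It is an illustration, not kvestigate's own parser, and it skips rarer forms such as exponent notation.

```python
from decimal import Decimal

# Kubernetes quantity suffixes: binary (Ki..Ei) are checked before decimal (k..E).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def parse_memory(q: str) -> int:
    """'64Gi' -> bytes; a bare number is already bytes."""
    for table in (_BINARY, _DECIMAL):
        for suffix, mult in table.items():
            if q.endswith(suffix):
                return int(Decimal(q[: -len(suffix)]) * mult)
    return int(Decimal(q))


def parse_cpu(q: str) -> float:
    """'8000m' -> 8.0 cores (m = millicores); '0.5' -> 0.5 cores."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000
    return float(q)
```

Total demand is then per-replica times replicas. For example, `parse_memory("16Gi") * 12` is 192 GiB for the 12-replica example below.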

Examples

# 12 single-GPU replicas, spreadable
kvestigate plan --gpu 1 --replicas 12 --cpu 4 --memory 16Gi

# Tensor-parallel-ish: one pod that needs 12 GPUs (will fail clearly today)
kvestigate plan --gpu 12 --replicas 1

# 4 replicas × 2 GPUs each, HA spread (one replica per node)
kvestigate plan --gpu 2 --replicas 4 --cpu 16 --memory 64Gi --one-per-node

# All replicas must land on the same node (rare, but supported)
kvestigate plan --gpu 4 --replicas 1 --cpu 32 --memory 128Gi --single-node

# Restrict to Blackwell-family GPU nodes via node label
kvestigate plan --gpu 1 --replicas 4 \
    --label nvidia.com/gpu.family=blackwell

# Skip specific nodes
kvestigate plan --gpu 1 --replicas 6 --exclude node-04 --exclude node-08

# Only run one strategy
kvestigate plan --gpu 1 --replicas 8 --strategy spread

# Save the plan output (free capacity + placements) as JSON
kvestigate plan --gpu 1 --replicas 12 --json snapshots/plan-12gpu.json

Fan-out — `kvestigate fanout`

Run commands or transfer files across many nodes or pods in parallel.

Target model

A “target” is something to talk to. Three target kinds:

`--target`	Selection flags	Pod used for exec	Command wrapper
`nodes`	`--node N` (repeatable) / `--exclude-node N`	the `default/direct-node-access` DS pod on that node	`chroot /host …` for commands; raw access via `/host/…` for files
`pods`	`--namespace`, `--selector`, `--pod`, `--container`	the pod itself	none
`deployment`	`--deployment`, `--namespace`, `--container`	each pod of the Deployment	none

The node-access path uses the existing privileged DaemonSet (hostPID/hostIPC/hostNetwork: true, the host’s root filesystem mounted at /host, tolerates all taints), so no new infrastructure is created: no kubectl debug node/… pod spawns, no privileged container deploys.

Different DaemonSet on your cluster? Override with --node-ds-namespace / --node-ds-name.
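In practice, node targets become an exec into the DaemonSet pod on that node, with the command wrapped in chroot /host. Pod targets exec directly with no wrapper. The sketch below assumes the kubectl transport described under Exec and only builds argv lists. kvestigate's real argv may differ, and the DaemonSet pod name in the test is hypothetical.

```python
from typing import Optional


def node_exec_argv(cmd: list[str], ds_pod: str,
                   ds_namespace: str = "default") -> list[str]:
    """Run `cmd` against the host's root filesystem via the privileged DS pod."""
    return ["kubectl", "exec", "-n", ds_namespace, ds_pod, "--",
            "chroot", "/host", *cmd]


def pod_exec_argv(cmd: list[str], pod: str, namespace: str,
                  container: Optional[str] = None) -> list[str]:
    """Run `cmd` directly in a pod (no wrapper), optionally in one container."""
    argv = ["kubectl", "exec", "-n", namespace, pod]
    if container:
        argv += ["-c", container]
    return [*argv, "--", *cmd]
```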

`fanout targets` — preview without executing

kvestigate fanout targets --target nodes
kvestigate fanout targets --target pods -n your-namespace -l app=my-app
kvestigate fanout targets --target deployment --deployment my-app -n your-namespace

`fanout exec` — run a command everywhere

# nvidia-smi on every node
kvestigate fanout exec --target nodes -- nvidia-smi

# Kernel version per node (with a tighter timeout)
kvestigate fanout exec --target nodes --timeout 20 -- uname -r

# Subset of nodes
kvestigate fanout exec --target nodes --node node-04 --node node-05 -- df -h /

# Inside specific pods
kvestigate fanout exec --target pods -n your-namespace -l app=my-app -- ps -ef

# All pods of a deployment
kvestigate fanout exec --target deployment --deployment my-app -n your-namespace -- env

# Dry-run: print resolved per-target argv, do not execute
kvestigate fanout exec --target nodes --dry-run -- nvidia-smi

Per-target progress streams immediately (✓ node-08 exit=0 1.3s) so slow / stuck targets are visible in real time. Add --full to disable output truncation; --json PATH for structured results.
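The streaming behaviour amounts to bounded-parallel subprocesses, reported in completion order. This is a sketch of that pattern, not kvestigate's implementation. The defaults mirror --max-parallel 8 and --timeout 60 from the flags table, and each target's argv would be something like a kubectl exec command line.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Optional


def run_one(target: str, argv: list[str], timeout: float) -> tuple[str, Optional[int], float, str]:
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        code, out = proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:  # subprocess.run kills the child on timeout
        code, out = None, ""           # None marks a timeout
    return target, code, time.monotonic() - start, out


def fanout(argvs: dict[str, list[str]], max_parallel: int = 8,
           timeout: float = 60.0) -> dict[str, Optional[int]]:
    """Run one argv per target, at most `max_parallel` at once; print progress as each ends."""
    results: dict[str, Optional[int]] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run_one, t, argv, timeout) for t, argv in argvs.items()]
        for fut in as_completed(futures):  # completion order: stuck targets surface last
            target, code, elapsed, _ = fut.result()
            mark = "✓" if code == 0 else "✗"
            status = "timeout" if code is None else f"exit={code}"
            print(f"{mark} {target} {status} {elapsed:.1f}s", flush=True)
            results[target] = code
    return results
```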

Worked example: `nvidia-smi` — node view vs. pod view

nvidia-smi is the canonical case where the same command tells you different things depending on where you run it, so pick the target on purpose:

Question	Run against
What’s actually running on every physical GPU in the cluster?	`--target nodes`
Are any GPUs idle / hot / throttling?	`--target nodes`
Which OS processes (from any namespace) are bound to which GPU?	`--target nodes`
What’s the NVLink / PCIe topology on a host?	`--target nodes`
Is my service using its allocated GPU correctly?	`--target pods -l app=…` or `--target deployment …`
Is my container throttling?	`--target pods …` (hardware-side throttling: `--target nodes`)

Why the difference: a pod only sees the GPUs the NVIDIA device plugin gave it, renumbered to local indices starting at 0. So a pod with one allocated GPU always reports a single “GPU 0” — it has no way to know it’s physically GPU 5 on node-07. The host’s nvidia-smi sees every GPU and every process touching it regardless of pod ownership. Pick the view that matches the question.

Cluster-wide GPU snapshot, compact (recommended default)

kvestigate fanout exec --target nodes --full -- \
  nvidia-smi \
  --query-gpu=index,memory.used,memory.total,utilization.gpu,temperature.gpu,power.draw \
  --format=csv,noheader,nounits
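The compact format is easy to post-process. The sketch below parses one node's stdout from the command above, assuming you have already split the fan-out output per node. With nounits, memory is in MiB, utilization in %, temperature in °C and power in W. Unavailable readings appear as bracketed strings like "[N/A]".

```python
from dataclasses import dataclass
from typing import Optional

# Column order of the --query-gpu list in the command above.
FIELDS = ("index", "memory.used", "memory.total", "utilization.gpu",
          "temperature.gpu", "power.draw")


@dataclass
class GpuSample:
    index: int
    mem_used_mib: Optional[float]
    mem_total_mib: Optional[float]
    util_pct: Optional[float]
    temp_c: Optional[float]
    power_w: Optional[float]


def _num(raw: str) -> Optional[float]:
    raw = raw.strip()
    # nvidia-smi prints "[N/A]" or "[Not Supported]" for unavailable values
    return None if raw.startswith("[") else float(raw)


def parse_gpu_csv(text: str) -> list[GpuSample]:
    """Parse one node's `--format=csv,noheader,nounits` output."""
    samples = []
    for line in text.strip().splitlines():
        parts = line.split(",")
        if len(parts) != len(FIELDS):
            continue  # skip anything that isn't a data row
        samples.append(GpuSample(int(parts[0]), *(_num(p) for p in parts[1:])))
    return samples


def summarize(samples: list[GpuSample]) -> dict[str, float]:
    """Per-node totals; unavailable readings count as zero."""
    return {
        "gpus": len(samples),
        "vram_used_gib": sum(s.mem_used_mib or 0.0 for s in samples) / 1024,
        "vram_total_gib": sum(s.mem_total_mib or 0.0 for s in samples) / 1024,
        "power_kw": sum(s.power_w or 0.0 for s in samples) / 1000,
    }
```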

`fanout fetch` — pull a file from every target

# Get /etc/hosts from every node into ./out/<node>/hosts
kvestigate fanout fetch /etc/hosts ./out/ --target nodes

# Get a log file from every node
kvestigate fanout fetch /var/log/cloud-init.log ./logs/ --target nodes

# From a specific pod
kvestigate fanout fetch /app/config.yaml ./pod-configs/ \
    --target pods -n your-namespace --pod my-app-…-xyz

For node targets, paths are absolute on the host (the tool translates them to the /host/… mount internally — you write /etc/hosts, not /host/etc/hosts).
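The translation can be as simple as normalizing the absolute host path and re-rooting it under /host. Normalizing also stops `..` from climbing above the host root. In this sketch the ./out/<node>/<basename> local layout follows the /etc/hosts example above. Naming the local file after the remote basename is an assumption, and both helper names are hypothetical.

```python
import posixpath


def to_host_mount(host_path: str, mount: str = "/host") -> str:
    """Map an absolute host path (/etc/hosts) to its path inside the DS pod (/host/etc/hosts)."""
    if not host_path.startswith("/"):
        raise ValueError(f"node paths must be absolute: {host_path!r}")
    normalized = posixpath.normpath(host_path)  # '/../etc' -> '/etc'
    return posixpath.join(mount, normalized.lstrip("/"))


def local_dest(out_dir: str, target: str, host_path: str) -> str:
    """Where a fetched file lands locally: <out_dir>/<target>/<basename>."""
    return posixpath.join(out_dir, target, posixpath.basename(host_path))
```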

`fanout push` — write a file to every target (WRITE — gated)

# Dry run is always safe to use on prod
kvestigate fanout push ./hosts.allow /etc/hosts.allow --target nodes --dry-run

# Real push requires an explicit confirmation flag
kvestigate fanout push ./hosts.allow /etc/hosts.allow --target nodes \
    --i-understand-this-writes-to-hosts

Without --i-understand-this-writes-to-hosts the command refuses, even after resolving targets. This is the same safety posture as deploy / delete: writes only with explicit per-action authorization.

Common flags

Flag	Meaning
`--max-parallel N`	Concurrent subprocesses (default 8)
`--timeout SECS`	Per-target wall-clock timeout (default 60)
`--dry-run`	Resolve targets and print plan, do not execute
`--full`	(exec only) print full output, no truncation
`--json PATH`	Write structured results
`--node-ds-namespace` / `--node-ds-name`	Override the node-access DaemonSet location

Workspaces — host-native workloads without Docker

kvestigate workspace lets you run workloads directly on a node’s host OS (via the privileged direct-node-access DaemonSet) instead of through a K8s Deployment. Useful for ML research, fine-tuning, dataset prep — anywhere container indirection isn’t earning its keep and the host’s installed NVIDIA driver + rsync + Python is all the platform you need.

A workspace is a directory on a chosen node with a known subdir layout (code/, env/, datasets/, checkpoints/, logs/, pids/), optionally backed by tmpfs for speed. Cluster prerequisites (already met here): driver 580+, rsync on the host, host root mounted at /host inside the DS pod.

Storage modes

Pick one per workspace. The fast-cleanup property of all-tmpfs is deliberate — when the workload’s output is checkpointed elsewhere, destroy is one umount + one rmdir, no file walk at all.

Mode	What’s tmpfs-backed	Persistence across reboot	Cleanup
`disk`	nothing	everything	`rm -rf <workspace>` (file walk; slow for big trees)
`hybrid` (default)	only `datasets/`	code, env, checkpoints, logs	`umount <ws>/datasets` then `rm -rf <ws>`
`all-tmpfs`	the entire workspace root	none	`umount <ws>` + `rmdir <ws>` — instant

tmpfs_size defaults to 500G. The size is a cap, not a reservation (tmpfs only consumes RAM as files are written), and on these 1.5 TiB nodes even a full 500G tmpfs leaves ~900 GiB+ of RAM free.
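The storage-mode table maps onto a handful of shell steps. The sketch below assumes plain mount/umount/rm run on the host (for example via chroot /host). It is illustrative only: the commands kvestigate actually runs are whatever its dry-run output shows, and the function names are hypothetical.

```python
import shlex


def mount_cmd(path: str, size: str = "500G") -> str:
    """tmpfs mount for an all-tmpfs root or a hybrid datasets/ dir."""
    return f"mount -t tmpfs -o size={size} tmpfs {shlex.quote(path)}"


def destroy_plan(ws: str, mode: str, hybrid_tmpfs_path: str = "datasets") -> list[str]:
    """Shell steps to remove a workspace, following the storage-mode table."""
    q = shlex.quote(ws)
    if mode == "all-tmpfs":
        return [f"umount {q}", f"rmdir {q}"]       # no file walk at all
    if mode == "hybrid":
        sub = shlex.quote(f"{ws}/{hybrid_tmpfs_path}")
        return [f"umount {sub}", f"rm -rf {q}"]     # walks only the persistent part
    if mode == "disk":
        return [f"rm -rf {q}"]                      # full file walk
    raise ValueError(f"unknown storage mode: {mode}")
```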

Surface

kvestigate workspace status               # discover workspaces on all nodes
kvestigate workspace gpus  --node N       # real GPU availability (nvidia-smi × k8s)
kvestigate workspace create  --config foo.yaml [--apply]
kvestigate workspace destroy --config foo.yaml [--apply]
kvestigate workspace sync   <local>  --config foo.yaml [--apply] [--delete] [--exclude PAT]
kvestigate workspace run    --config foo.yaml --tag T --gpu N [--apply] -- <cmd...>
kvestigate workspace ps     --config foo.yaml
kvestigate workspace logs   TAG --config foo.yaml [-f] [--tail N]
kvestigate workspace kill   TAG --config foo.yaml [--signal TERM] --i-understand-this-modifies-hosts
kvestigate workspace diff   --config foo.yaml
kvestigate workspace apply  --config foo.yaml [--apply]
kvestigate workspace export NAME --node N -o foo.yaml

Every write subcommand defaults to a dry run. To actually execute, pass --apply (where the subcommand takes it) together with --i-understand-this-modifies-hosts, the same posture as fanout push: no write without an explicit confirmation flag.
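The double gate is easy to reproduce in any CLI. The sketch below uses stdlib argparse rather than kvestigate's jsonargparse setup, and the program and function names are hypothetical. It shows the behaviour described above: no --apply means a dry run, and --apply without the confirmation flag is refused outright.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="workspace-create")  # hypothetical stand-in
    p.add_argument("--config", required=True)
    p.add_argument("--apply", action="store_true",
                   help="execute instead of printing the plan")
    p.add_argument("--i-understand-this-modifies-hosts", action="store_true",
                   dest="confirmed")
    return p


def write_mode(argv: list[str]) -> str:
    """Return 'dry-run' or 'execute'; refuse a half-confirmed write."""
    args = build_parser().parse_args(argv)
    if not args.apply:
        return "dry-run"
    if not args.confirmed:
        raise SystemExit("refusing to write: add --i-understand-this-modifies-hosts")
    return "execute"
```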

Infrastructure-as-code spec

A workspace is fully described by a YAML file. jsonargparse does the loading; the schema is the WorkspaceSpec dataclass tree.

# workspace.yaml — the durable, versionable form
workspace:
  name: finetune-experiment
  node: node-02
  api_version: v1
  root: /opt/kvestigate-workspaces

  storage:
    mode: all-tmpfs # disk | hybrid | all-tmpfs
    tmpfs_size: 500G
    # hybrid_tmpfs_path: datasets   # only used in hybrid mode

  gpu:
    reserve: 4 # planner picks 4 currently-free GPUs
    # indices: [0, 1, 2, 3]  # OR pin specific physical indices
    cordon_node: true # `kubectl cordon` while workspace is active

  sources:
    - local: ./my-code
      subdir: code
      excludes: ["__pycache__", "*.pyc", ".git"]
    - local: ./small-fixtures
      subdir: datasets
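For orientation, a dataclass tree mirroring the YAML above might look like the sketch below. The real WorkspaceSpec classes are not shown in this document, so every class and field here is an assumption taken from the YAML keys. The hybrid, 500G and cordon defaults come from the surrounding text. Treating reserve and indices as mutually exclusive is inferred from the "OR" comment. A dict stands in for the parsed YAML to stay stdlib-only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorageSpec:
    mode: str = "hybrid"                 # disk | hybrid | all-tmpfs
    tmpfs_size: str = "500G"
    hybrid_tmpfs_path: str = "datasets"  # only used in hybrid mode


@dataclass
class GpuSpec:
    reserve: Optional[int] = None        # let the planner pick N free GPUs...
    indices: Optional[list[int]] = None  # ...or pin physical indices
    cordon_node: bool = True


@dataclass
class SourceSpec:
    local: str
    subdir: str
    excludes: list[str] = field(default_factory=list)


@dataclass
class WorkspaceSpec:
    name: str
    node: str
    api_version: str
    root: str
    storage: StorageSpec = field(default_factory=StorageSpec)
    gpu: GpuSpec = field(default_factory=GpuSpec)
    sources: list[SourceSpec] = field(default_factory=list)

    @classmethod
    def from_dict(cls, doc: dict) -> "WorkspaceSpec":
        ws = dict(doc["workspace"])
        ws["storage"] = StorageSpec(**ws.get("storage", {}))
        ws["gpu"] = GpuSpec(**ws.get("gpu", {}))
        ws["sources"] = [SourceSpec(**s) for s in ws.get("sources", [])]
        spec = cls(**ws)
        if spec.gpu.reserve is not None and spec.gpu.indices is not None:
            raise ValueError("gpu: set either reserve or indices, not both")
        return spec
```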

IaC workflow

# Read-only: what would apply do?
kvestigate workspace diff --config workspace.yaml

# Dry-run the whole reconciliation (create + every source rsync)
kvestigate workspace apply --config workspace.yaml

# Execute (requires the safety flag)
kvestigate workspace apply --config workspace.yaml --apply \
    --i-understand-this-modifies-hosts

# Save a snapshot of an existing workspace back to YAML
kvestigate workspace export finetune-experiment --node node-02 \
    -o snapshots/finetune-2026-05-20.yaml

Launching a run (no Docker, GPUs bound by UUID)

# Dry-run: shows the exact shell script that would execute on the host
kvestigate workspace run \
    --config workspace.yaml \
    --tag train-001 --gpu 0 --gpu 1 --gpu 2 --gpu 3 \
    --env WANDB_DISABLED=true \
    -- python train.py --batch-size 32 --epochs 10

# Real launch
kvestigate workspace run --config workspace.yaml --tag train-001 \
    --gpu 0 --gpu 1 --gpu 2 --gpu 3 \
    --apply --i-understand-this-modifies-hosts \
    -- python train.py --batch-size 32

# Watch progress
kvestigate workspace ps   --config workspace.yaml
kvestigate workspace logs train-001 --config workspace.yaml -f

# Stop it
kvestigate workspace kill train-001 --config workspace.yaml \
    --signal TERM --i-understand-this-modifies-hosts

GPU indices are resolved to UUIDs at launch time and bound via CUDA_VISIBLE_DEVICES=GPU-uuid-1,GPU-uuid-2,… — survives any device-plugin renumbering. Each run writes logs/<tag>.log, pids/<tag>.pid, and pids/<tag>.meta (cmd, gpus, started_at).
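The index-to-UUID step can be sketched from the host's own nvidia-smi listing (`nvidia-smi --query-gpu=index,uuid --format=csv,noheader`). CUDA accepts GPU-prefixed UUIDs in CUDA_VISIBLE_DEVICES. The helper names are hypothetical. Whether the .meta file records UUIDs or the original indices is an assumption.

```python
import json
from datetime import datetime, timezone


def index_to_uuid(host_csv: str) -> dict[int, str]:
    """Parse `nvidia-smi --query-gpu=index,uuid --format=csv,noheader` run on the host."""
    mapping = {}
    for line in host_csv.strip().splitlines():
        idx, uuid = (p.strip() for p in line.split(",", 1))
        mapping[int(idx)] = uuid
    return mapping


def run_env_and_meta(indices: list[int], host_csv: str,
                     cmd: list[str]) -> tuple[dict[str, str], str]:
    """Build the CUDA_VISIBLE_DEVICES binding and a JSON meta record for one run."""
    uuids = index_to_uuid(host_csv)
    missing = [i for i in indices if i not in uuids]
    if missing:
        raise ValueError(f"no such GPU index on this host: {missing}")
    bound = [uuids[i] for i in indices]  # order preserved as requested
    env = {"CUDA_VISIBLE_DEVICES": ",".join(bound)}
    meta = json.dumps({"cmd": cmd, "gpus": bound,
                       "started_at": datetime.now(timezone.utc).isoformat()})
    return env, meta
```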

Cleanup story (the all-tmpfs fast path)

# all-tmpfs cleanup is a single umount + rmdir, no file walk:
kvestigate workspace destroy --config workspace.yaml          # shows plan
kvestigate workspace destroy --config workspace.yaml --apply \
    --i-understand-this-modifies-hosts
#   [1] umount /opt/kvestigate-workspaces/finetune-experiment   ← entire workspace gone
#   [2] rmdir  /opt/kvestigate-workspaces/finetune-experiment

For hybrid mode, destroy umounts the datasets tmpfs and then does rm -rf for the small persistent layout. For disk mode it’s a full file walk; use all-tmpfs if cleanup speed matters.

GPU contention with K8s

K8s and the device plugin do not know about host processes you launch. The default cordon_node: true calls kubectl cordon on the workspace’s node when apply runs and uncordons it on destroy, so K8s won’t schedule new pods (GPU or otherwise) onto a node you’re using directly. Cordoning does not evict pods already running there, so use workspace gpus to see which GPUs are actually free. Set cordon_node: false if you’re sure no contention will happen (e.g. on a dedicated research node) or want to manage that separately.
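Cordoning sets spec.unschedulable on the Node object. The sketch below does the same through the official Python client's CoreV1Api (read_node / patch_node) rather than shelling out to kubectl. It only lifts a cordon it placed itself, so a node that was already cordoned stays cordoned. Any object with those two methods works, which keeps the sketch testable without a cluster. The helper names are hypothetical.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class NodeApi(Protocol):  # satisfied by kubernetes.client.CoreV1Api
    def read_node(self, name: str) -> object: ...
    def patch_node(self, name: str, body: dict) -> object: ...


def set_unschedulable(api: NodeApi, node: str, value: bool) -> None:
    # The same field `kubectl cordon` / `kubectl uncordon` flips.
    api.patch_node(node, {"spec": {"unschedulable": value}})


@contextmanager
def cordoned(api: NodeApi, node: str, enabled: bool = True) -> Iterator[None]:
    """Cordon `node` for the duration of the block; undo only our own cordon."""
    already = bool(api.read_node(node).spec.unschedulable)
    ours = enabled and not already
    if ours:
        set_unschedulable(api, node, True)
    try:
        yield
    finally:
        if ours:
            set_unschedulable(api, node, False)
```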

What `nvcc not in PATH` means

These hosts have the driver + runtime but no CUDA dev toolkit. Pre-built PyTorch / JAX / TensorRT wheels work fine (they ship their own CUDA libs). If your code calls nvcc directly (custom kernel JIT compile), install the toolkit per workspace via uv into env/ and adjust PATH in --env.

Cybernet — constrained server interface for other agents

kvestigate cybernet exposes a subset of kvestigate over HTTP REST and/or MCP, gated by a per-deployment YAML policy. The intended use case is “let other code agents launch host-native workloads on a curated host set, without giving them the keys to the cluster.”

Architecture

agent ──▶ HTTP/REST  ─┐
agent ──▶ MCP stdio  ─┼─▶ policy engine ──▶ kvestigate library
agent ──▶ HTTP/REST  ─┘        │
                               ▼
                         audit log (JSONL)

Two frontends share one policy engine. Adding a third (gRPC, websockets) only means writing the new transport; the engine and registry don’t change.
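The shape of that engine can be sketched in a few lines. Every frontend funnels into one call(identity, op, params). That call checks an allow-list, enforces a quota, appends a JSONL audit record and then dispatches to a registry of operations. This is an illustration, not kvestigate's engine. The per-identity call-count quota stands in for whatever quotas the real policy enforces, and the class names are hypothetical.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, TextIO


@dataclass
class Policy:
    allowed_ops: dict[str, set[str]]                          # identity -> allowed operations
    max_calls: dict[str, int] = field(default_factory=dict)   # identity -> call quota


class PolicyEngine:
    def __init__(self, policy: Policy, registry: dict[str, Callable[..., Any]], audit: TextIO):
        self.policy, self.registry, self.audit = policy, registry, audit
        self.calls: dict[str, int] = {}

    def _log(self, **record: Any) -> None:
        # One JSON object per line, so the audit log is append-only JSONL.
        self.audit.write(json.dumps({"ts": time.time(), **record}) + "\n")
        self.audit.flush()

    def call(self, identity: str, op: str, params: dict[str, Any]) -> Any:
        if op not in self.policy.allowed_ops.get(identity, set()):
            self._log(identity=identity, op=op, decision="deny", reason="not allowed")
            raise PermissionError(f"{identity} may not call {op}")
        used = self.calls.get(identity, 0)
        quota = self.policy.max_calls.get(identity)
        if quota is not None and used >= quota:
            self._log(identity=identity, op=op, decision="deny", reason="quota")
            raise PermissionError(f"{identity} exceeded its quota of {quota} calls")
        self.calls[identity] = used + 1
        self._log(identity=identity, op=op, decision="allow", params=params)
        return self.registry[op](**params)
```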

Managed workflow — three commands instead of fifteen

The full lifecycle (token generation → config → server → agent helper → distributed job → cleanup) is wrapped by three high-level surfaces so operators don’t hand-write YAML and agents don’t hand-write orchestration.

Step 1 — operator: `cybernet init`

kvestigate cybernet init \
    --name research-pool \
    --template distributed-cpu \
    --listen 0.0.0.0:8765 \
    --output research-pool.yaml

Generates a random bearer token, writes a cyberspace YAML from the named template (distributed-cpu, research-agents, observability, or blank), prints the token once and the next-step commands. The token’s sha256 hash is what lives in the YAML; the plaintext is shown in the terminal output for the operator to hand to the agent out-of-band.
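The token scheme is a standard one: generate a random secret, store only its SHA-256 digest, and on each request hash the presented bearer token and compare digests in constant time. A stdlib sketch follows. The token length and the exact bytes hashed (UTF-8, no trailing newline) are assumptions, so compare against `kvestigate cybernet hash-token` output before relying on it.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def new_token() -> tuple[str, str]:
    """Return (plaintext to hand to the agent, sha256 hex digest for the YAML)."""
    token = secrets.token_urlsafe(32)
    return token, hash_token(token)


def identity_for(bearer: str, tokens: list[dict[str, str]]) -> Optional[str]:
    """Match a presented bearer token against `auth.tokens` entries."""
    digest = hash_token(bearer)
    for entry in tokens:
        if hmac.compare_digest(digest, entry["sha256"]):  # constant-time compare
            return entry["identity"]
    return None
```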

Step 2 — operator: `cybernet serve`

kvestigate cybernet serve --config research-pool.yaml

Starts the HTTP REST frontend with the generated config.

Step 3 — operator: `cybernet emit-helper`

kvestigate cybernet emit-helper http://localhost:8765 \
    --output agent_helper.py

Reads /info from the running server and writes a Python module tailored to that cyberspace: one function per allowed operation, plus a new_job() factory if the cyberspace permits the operations needed for the DistributedJob pattern. The generated file is self-contained — hand it to the agent along with the bearer token.

Step 4 — agent: import and run

import os
os.environ["KVESTIGATE_CYBERNET_TOKEN"] = "<token operator gave you>"

from agent_helper import new_job

with new_job(prefix="job-finetune", shards=5) as job:
    job.discover_and_plan(cpu_per_shard=100, mem_per_shard="600Gi")
    job.provision(tmpfs_per_shard="600Gi")
    job.sync_code("./my-workload", excludes=["__pycache__", ".git"])
    job.run(["sh", "-c", "python worker.py"])
    job.wait_until_complete(poll_interval=10)
# Workspaces auto-destroyed on exit — instant when storage_mode is all-tmpfs.

That’s the whole agent program. The DistributedJob session:

discovers candidate nodes via report.run name=nodes
ranks the nodes that meet the per-shard requirements by the placement strategy (least-loaded by default) and picks the top N
creates one workspace per chosen node, rolling back on partial failure
rsync-pushes code into each workspace’s code/
launches the command in each shard with DIST_RANK / DIST_WORLD_SIZE / DIST_PEERS / DIST_COORDINATOR env vars injected automatically
polls workspace.ps to detect completion and tails rank-0 logs for progress
on context exit (success or exception) destroys every shard
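The create-with-rollback and destroy-on-exit steps are a classic context-manager pattern. The sketch below is a generic illustration with hypothetical create/destroy callables, not DistributedJob's code. A failure partway through provisioning destroys the shards already created and re-raises. Normal or exceptional exit from the block destroys every shard in reverse order.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def shards(nodes: list[str], create: Callable[[str, int], str],
           destroy: Callable[[str], None]) -> Iterator[list[str]]:
    """Create one workspace per node; roll back on partial failure; always clean up."""
    created: list[str] = []
    try:
        for rank, node in enumerate(nodes):
            created.append(create(node, rank))   # may raise partway through
        yield created
    finally:
        # Runs after a failed create, on normal exit, and on an exception in the block.
        for ws in reversed(created):
            try:
                destroy(ws)
            except Exception as exc:  # keep cleaning up the remaining shards
                print(f"warning: failed to destroy {ws}: {exc}")
```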

Three job patterns: symmetric, parallel, heterogeneous

DistributedJob distinguishes three workload shapes. Pick the matching method: the wrong one either runs the same code N times by accident or leaves you hand-building boilerplate the session class would emit for free.

1. Symmetric — same command on every shard (`job.run`)

Right for distributed training and any workload where every rank runs the same program and branches on DIST_RANK / DIST_WORLD_SIZE / DIST_PEERS / DIST_COORDINATOR (all injected automatically).

job.run(["sh", "-c", "python train.py"])
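On the worker side, train.py only has to read the injected variables. The formats of DIST_PEERS and DIST_COORDINATOR are not documented here. This sketch assumes a comma-separated peer list and an opaque coordinator string, so print the real values in a first run before relying on them. The class and function names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class DistInfo:
    rank: int
    world_size: int
    peers: list[str]
    coordinator: str

    @property
    def is_leader(self) -> bool:
        return self.rank == 0


def dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    info = DistInfo(
        rank=int(env["DIST_RANK"]),
        world_size=int(env["DIST_WORLD_SIZE"]),
        peers=[p for p in env.get("DIST_PEERS", "").split(",") if p],  # assumed format
        coordinator=env.get("DIST_COORDINATOR", ""),                     # assumed format
    )
    if not 0 <= info.rank < info.world_size:
        raise ValueError(f"rank {info.rank} outside world of {info.world_size}")
    return info


if __name__ == "__main__" and "DIST_RANK" in os.environ:
    me = dist_info()
    role = "leader" if me.is_leader else "follower"
    print(f"rank {me.rank}/{me.world_size} ({role}), coordinator={me.coordinator}")
```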

2. Embarrassingly parallel — same binary, different args (`job.run_parallel`)

Right for chunk-N-of-M-style workloads: every shard runs the same program but with a different argument list, no inter-shard coordination beyond initial dispatch.

job.run_parallel(
    cmd_base=["python", "process.py"],
    args_per_shard=[
        ["--input", "chunks/0.parquet"],
        ["--input", "chunks/1.parquet"],
        ["--input", "chunks/2.parquet"],
        ["--input", "chunks/3.parquet"],
        ["--input", "chunks/4.parquet"],
    ],
)

3. Heterogeneous — different command per shard (`job.run_tasks`)

Right for coordinator/worker setups, multi-stage pipelines, parameter servers, or any “rank 0 is special” workload. Each shard gets its own ShardTask (cmd, env, tag, and an optional rank).

from kvestigate_client import ShardTask

job.run_tasks([
    ShardTask(
        cmd=["python", "coordinator.py", "--world-size", "5"],
        env={"ROLE": "coordinator"},
        rank=0,
    ),
    *[
        ShardTask(
            cmd=["python", "worker.py", "--worker-id", str(i)],
            env={"ROLE": "worker"},
            rank=i,
        )
        for i in range(1, 5)
    ],
])

The rank field is optional — when omitted, list position dictates which shard gets the task. Mismatched lengths and duplicate ranks raise ValueError rather than silently misbehaving.
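The assignment rules can be expressed as a short validation pass. The sketch below uses a hypothetical Task stand-in rather than kvestigate_client.ShardTask. The document doesn't say how kvestigate handles a mix of explicit and positional ranks, so this sketch falls back to list position per task and lets the duplicate check catch collisions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:  # hypothetical stand-in for kvestigate_client.ShardTask
    cmd: list[str]
    env: dict[str, str] = field(default_factory=dict)
    rank: Optional[int] = None


def assign(tasks: list[Task], world_size: int) -> dict[int, Task]:
    """Map each task to a shard rank; reject length mismatches and duplicate ranks."""
    if len(tasks) != world_size:
        raise ValueError(f"{len(tasks)} tasks for {world_size} shards")
    by_rank: dict[int, Task] = {}
    for pos, task in enumerate(tasks):
        rank = pos if task.rank is None else task.rank  # list position is the fallback
        if not 0 <= rank < world_size:
            raise ValueError(f"rank {rank} out of range")
        if rank in by_rank:
            raise ValueError(f"duplicate rank {rank}")
        by_rank[rank] = task
    return by_rank
```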

Placement strategy — which node gets each shard?

discover_and_plan(..., strategy=...) controls how the planner orders the eligible nodes after the host filter passes:

Strategy	What it does	Use when
`"least-loaded"` (default)	Uses live metrics-server data for CPU + RAM; sorts ascending by combined live load. Falls back to alloc-side (`cpu_requested`/`mem_requested`) if metrics-server is unavailable.	You’re “double-using” a cluster where pods reserve more than they consume — pick the quietest nodes so you don’t disturb co-tenants.
`"most-free-capacity"`	Sort by free CPU (`cpu_allocatable - cpu_requested`) descending.	You want the biggest absolute slack regardless of relative load. Conservative; respects the K8s scheduler’s view.

job.discover_and_plan(
    cpu_per_shard=100,
    mem_per_shard="600Gi",
    gpu_per_shard=6,
    strategy="least-loaded",      # the default
)

The host filter (min_free_cpu_cores / min_free_memory / min_free_gpu) runs before sorting — so nodes that can’t fit a single shard are dropped regardless of strategy. The strategy only decides the order within the viable set.
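The filter-then-sort behaviour can be sketched as follows. Summing the CPU and memory utilization fractions as the "combined live load" is an assumption, as is falling back to requested values per node rather than cluster-wide. NodeStat and order_nodes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStat:
    name: str
    cpu_allocatable: float
    cpu_requested: float
    mem_allocatable: float            # bytes
    mem_requested: float
    gpu_free: int
    cpu_used: Optional[float] = None  # live, from metrics-server (None if unavailable)
    mem_used: Optional[float] = None


def free_cpu(n: NodeStat) -> float:
    return n.cpu_allocatable - n.cpu_requested


def order_nodes(nodes: list[NodeStat], cpu: float, mem: float, gpu: int,
                strategy: str = "least-loaded") -> list[str]:
    # 1. Host filter (allocation view): drop nodes that can't fit one shard.
    viable = [n for n in nodes
              if free_cpu(n) >= cpu and n.mem_allocatable - n.mem_requested >= mem
              and n.gpu_free >= gpu]
    # 2. Order only the survivors.
    if strategy == "most-free-capacity":
        viable.sort(key=lambda n: (-free_cpu(n), n.name))
    elif strategy == "least-loaded":
        def load(n: NodeStat) -> float:
            live = n.cpu_used is not None and n.mem_used is not None
            c = n.cpu_used if live else n.cpu_requested
            m = n.mem_used if live else n.mem_requested
            return c / n.cpu_allocatable + m / n.mem_allocatable  # assumed weighting
        viable.sort(key=lambda n: (load(n), n.name))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [n.name for n in viable]
```

The test case below is the "double-using" scenario. node-01 has heavy reservations but little live use, so least-loaded prefers it, while most-free-capacity prefers node-02.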

Per-shard data — `job.sync_per_shard`

Pairs with the heterogeneous and embarrassingly-parallel patterns: each shard needs its own input chunk. sync_code pushes the same directory to every shard; sync_per_shard pushes different directories per rank.

job.sync_code("./pipeline-code")                  # shared code → every shard

job.sync_per_shard({                              # per-shard data → only listed ranks
    1: "./inputs/chunk-0.jsonl",
    2: "./inputs/chunk-1.jsonl",
    3: "./inputs/chunk-2.jsonl",
    4: "./inputs/chunk-3.jsonl",
}, subdir="datasets/input")

Ranks not in the map are skipped — useful when rank 0 is a coordinator that doesn’t process data chunks itself.

See examples/agents/heterogeneous_pipeline.py for the full coordinator-plus-workers walkthrough.

Complete workflow kits

For end-to-end recipes — operator config + agent script + worker code + test-data helpers, all in one directory — see examples/workflows/. Each subdirectory is a runnable kit you can copy and modify:

Kit	Pattern
`parallel-batch-processing/`	Embarrassingly parallel — N workers each process a different input chunk
`coordinator-workers-http/`	Heterogeneous coordinator + workers over HTTP
`readonly-monitoring/`	Long-running polling agent against a read-only cyberspace
`gpu-single-node/`	One workspace on one node with 6+ GPUs bound (single-shard multi-GPU)
`cpu-bulk-dataset-gen/`	Distributed synthetic dataset generation using `least-loaded` placement

Each kit’s own README.md walks through operator setup → agent invocation → expected output → how to extend → what’s deliberately omitted.

Lower-level surfaces (still available)

For ad-hoc invocations or shell scripting, the older commands all work:

# Health check
kvestigate cybernet status http://localhost:8765

# One-shot operation invocation
kvestigate cybernet call workspace.gpus \
    --url http://localhost:8765 \
    --params '{"nodes": ["node-06"]}'

# From Python with no helper module:
#   from kvestigate_client import CybernetClient, DistributedJob
#   c = CybernetClient(base_url="...", token=os.environ["..."])
#   c.call("workspace.create", {...})

# MCP frontend for LLM-driven clients (Claude Desktop, etc.)
kvestigate cybernet mcp-server --config research-pool.yaml
# (point the MCP client at this command's stdio)

Cyberspace spec (the YAML shape)

apiVersion: kvestigate.cybernet/v1
kind: Cyberspace
metadata:
  name: research-agents
  description: "Constrained surface for ML research agents."

listen:
  http: { address: "0.0.0.0:8765" }
  mcp: { stdio: true } # both frontends are optional

auth:
  tokens:
    - identity: research-agent-1
      sha256: "<sha256 of bearer token>" # see `kvestigate cybernet hash-token`
      tags: [gpu, research]

operations:
  # Default-deny. Listed ops reach the policy engine.
  allow:
    - report.run
    - workspace.status
    - workspace.gpus
    - workspace.diff
    - workspace.create
    - workspace.sync
    - workspace.run
    - workspace.ps
    - workspace.kill
    - workspace.destroy

constraints:
  hosts:
    allow_names: ["node-06", "node-07", "node-08"]
    require:
      min_free_gpu: 4
      min_free_cpu_cores: 80
      min_free_memory: "800Gi"
  workspace:
    name_pattern: "^agent-[a-z0-9][a-z0-9-]{0,30}$"
    storage:
      allowed_modes: ["all-tmpfs", "hybrid"]
      max_tmpfs_size: "500Gi"
    gpu:
      max_per_workspace: 4
      cordon_node_forbidden: true

quotas:
  max_active_workspaces_per_identity: 3
  max_total_active_workspaces: 10
  max_total_active_gpu: 32
  max_total_active_memory: "5Ti"

audit:
  path: "/var/log/cybernet/research-agents.jsonl"
  include_denials: true

HTTP API surface

Route	Auth	Purpose
`GET /info`	none	Public metadata — operations, host constraints, quota usage
`POST /api/v1/op/<name>`	bearer	Dispatch an allowed operation with JSON body params

Error responses use a consistent envelope: {"error": {"code": "...", "message": "...", "reason": "..."}} with status codes 401 (unauthenticated), 403 (op not in allow-list), 422 (host or parameter constraint), 409 (quota), 500 (execution error).
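Callers without the Python client can use the two routes as plain HTTP. Here is a stdlib-only sketch. The route, bearer auth and error envelope come from the table above. `call_op`, `build_op_request`, `parse_error` and `CybernetError` are hypothetical names. The sketch also assumes a successful reply body is JSON, which is not spelled out here.

```python
import json
import urllib.error
import urllib.request


class CybernetError(Exception):
    """A non-2xx reply carrying the {"error": {...}} envelope."""

    def __init__(self, status: int, code: str, message: str, reason: str):
        super().__init__(f"{status} {code}: {message} ({reason})")
        self.status, self.code, self.message, self.reason = status, code, message, reason


def build_op_request(base_url: str, op: str, params: dict, token: str) -> urllib.request.Request:
    # POST /api/v1/op/<name>: params as the JSON body, bearer token for auth.
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/op/{op}",
        data=json.dumps(params).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_error(status: int, body: bytes) -> CybernetError:
    try:
        err = json.loads(body or b"{}").get("error", {})
    except ValueError:  # e.g. a proxy's HTML error page instead of the envelope
        err = {}
    return CybernetError(status, err.get("code", ""), err.get("message", ""), err.get("reason", ""))


def call_op(base_url: str, op: str, params: dict, token: str, timeout: float = 30.0):
    req = build_op_request(base_url, op, params, token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())  # ASSUMPTION: JSON success body
    except urllib.error.HTTPError as exc:
        # 401, 403, 409, 422 and 500 all share the same envelope.
        raise parse_error(exc.code, exc.read()) from None
```

The CLI example `cybernet call workspace.gpus` corresponds to `call_op("http://localhost:8765", "workspace.gpus", {"nodes": ["node-06"]}, token)`.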

Operation registry (the universe of namable operations)

Read operations: report.run, workspace.status, workspace.gpus, workspace.diff, workspace.ps.

Write operations: workspace.create, workspace.destroy, workspace.sync, workspace.run, workspace.kill.

Every operation is opt-in via operations.allow; nothing else is reachable through cybernet even though kvestigate itself can do more.

Audit log

Every request — allowed or denied — appends one JSONL record to audit.path. Each record carries timestamp, identity, remote address, operation, parameters (with token/password/secret keys masked), decision, reason, outcome, duration, and post-call quota snapshot.
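Because the log is JSONL, ordinary tooling can answer questions like "which identity keeps hitting denials?" The sketch below assumes the JSON keys are literally `identity`, `operation` and `decision`, and that refusals are recorded as `"deny"`. Those exact key names and values are not documented here, so check a real record first.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_denials(lines: Iterable[str]) -> Counter:
    """Count denied requests per (identity, operation) in audit JSONL lines."""
    denials: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # ASSUMPTION: keys "identity"/"operation"/"decision" and the value "deny";
        # adjust to match what your audit.path actually contains.
        if record.get("decision") == "deny":
            denials[(record.get("identity"), record.get("operation"))] += 1
    return denials
```

For example: `with open("/var/log/cybernet/research-agents.jsonl", encoding="utf-8") as f: print(summarize_denials(f).most_common(5))`.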

Growing cybernet

New operation: add an Operation(name, kind, description, handler) to cybernet/registry.py:_REGISTRY. The HTTP and MCP frontends pick it up automatically. List it in your cyberspace’s operations.allow to make it reachable.
New constraint dimension: add a field to the relevant *ConstraintsSpec dataclass in cybernet/spec.py and handle it in cybernet/policy.py:_check_workspace_params. Tests in tests/cybernet/test_policy.py show the pattern.
New frontend: wrap the same policy.authorize → registry.get(...).handler flow used by server.py and mcp_adapter.py. The engine doesn’t care how the request arrived.

Node capacity management — `node` / `evict` / `scale`

The emergency + capacity surface: cordon a node, clear or drain it for new work, evict a scoped set of pods, or scale a Deployment (normal use, or quiesce to zero). The node subgroup is cordon / uncordon / drain / clear; evict and scale are top-level.

The pod-removing commands (drain, clear, evict) have the highest blast radius in kvestigate, so they all:

default to dry-run — they print the exact plan (what would be evicted, what is skipped and why) and do nothing until you pass --apply and the matching confirmation flag;
use the eviction API, so PodDisruptionBudgets are honored (a PDB block is reported, not force-killed) and termination is graceful;
protect infrastructure by default — DaemonSet, mirror/static, already-terminating, and completed pods are never evicted, and pods in system namespaces (kube-system, kube-public, kube-node-lease) are skipped unless you pass --include-system.

They are operator-only: not exposed as cybernet operations, and the policy engine hard-denies their names so an agent can never reach them (cybernet.policy.DANGEROUS_OPERATIONS).
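The infrastructure protections in the list above can be written as a pure function over pod objects. This sketch works on pods in their Kubernetes JSON shape (for example from `kubectl get pods -o json`) and is not kvestigate's code. It assumes "completed" means the terminal phases `Succeeded` and `Failed`. It detects mirror pods of static pods by the `kubernetes.io/config.mirror` annotation.

```python
from typing import Tuple

SYSTEM_NAMESPACES = frozenset({"kube-system", "kube-public", "kube-node-lease"})
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"  # present on mirror pods of static pods


def eviction_decision(pod: dict, include_system: bool = False) -> Tuple[bool, str]:
    """Return (evictable, reason) for one pod in Kubernetes JSON shape."""
    meta = pod.get("metadata") or {}
    status = pod.get("status") or {}
    if any(ref.get("kind") == "DaemonSet" for ref in meta.get("ownerReferences") or []):
        return False, "DaemonSet pod"
    if MIRROR_ANNOTATION in (meta.get("annotations") or {}):
        return False, "mirror/static pod"
    if meta.get("deletionTimestamp"):
        return False, "already terminating"
    if status.get("phase") in ("Succeeded", "Failed"):  # ASSUMPTION: completed = terminal
        return False, "completed"
    # Infra checks run first, so --include-system never unlocks DaemonSet/mirror pods.
    if meta.get("namespace") in SYSTEM_NAMESPACES and not include_system:
        return False, "system namespace (needs --include-system)"
    return True, "evictable"
```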

`node cordon` / `node uncordon` — toggle schedulability

Reversible and non-evicting — cordon just stops new pods from landing on a node; uncordon reverses it. (These also exist as workspace cordon/uncordon for the GPU-protection workflow; the node forms are the general operator verbs.)

kvestigate node cordon gpu-node-7   --i-understand-this-modifies-cluster
kvestigate node uncordon gpu-node-7 --i-understand-this-modifies-cluster

`node drain` — clear a node for new workloads

Cordons the node (so nothing reschedules back) and evicts its evictable pods.

# Preview what draining gpu-node-7 would evict / skip (safe; no changes)
kvestigate node drain gpu-node-7

# Actually drain it (cordon + evict). Both flags are required.
kvestigate node drain gpu-node-7 --apply --i-understand-this-disrupts-workloads

# Also evict system-namespace pods (use with care), with a grace period
kvestigate node drain gpu-node-7 --include-system --grace-period 30 \
    --apply --i-understand-this-disrupts-workloads

Pairs naturally with the rest of the tool: kvestigate plan to see what fits, node drain to make room, then workspace create on the freed node.
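At the API level, a drain has two steps. First, patch `spec.unschedulable` to cordon the node. Then post a `policy/v1` Eviction for each pod. The sketch below makes three assumptions:

- `core_v1` behaves like `kubernetes.client.CoreV1Api` (its `patch_node` and `create_namespaced_pod_eviction` methods).
- The pods were already filtered by the protection rules.
- The API server answers HTTP 429 when an eviction would violate a PodDisruptionBudget. That is how a PDB block gets reported instead of forced.

```python
from typing import List, Optional, Tuple


def drain(core_v1, node_name: str, pods: List[dict],
          grace_period: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Cordon node_name, then evict each pod through the Eviction API."""
    # Cordon first so evicted pods cannot be scheduled straight back here.
    core_v1.patch_node(node_name, {"spec": {"unschedulable": True}})
    evicted, pdb_blocked = [], []
    for pod in pods:
        name, ns = pod["metadata"]["name"], pod["metadata"]["namespace"]
        body = {"apiVersion": "policy/v1", "kind": "Eviction",
                "metadata": {"name": name, "namespace": ns}}
        if grace_period is not None:
            body["deleteOptions"] = {"gracePeriodSeconds": grace_period}
        try:
            core_v1.create_namespaced_pod_eviction(name=name, namespace=ns, body=body)
            evicted.append(f"{ns}/{name}")
        except Exception as exc:  # the kubernetes client raises ApiException with .status
            if getattr(exc, "status", None) == 429:  # PDB would be violated: report, never force
                pdb_blocked.append(f"{ns}/{name}")
            else:
                raise
    return evicted, pdb_blocked
```

With the real client, you would call `kubernetes.config.load_kube_config()` first and then `drain(kubernetes.client.CoreV1Api(), "gpu-node-7", pods)`.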

`node clear` — evict a node’s pods but leave it schedulable

Like drain, but it does not cordon — use it to reclaim a node’s capacity while keeping it open for new scheduling. Same protections and flags as drain.

# Preview
kvestigate node clear gpu-node-7

# Evict the node's pods, node stays schedulable
kvestigate node clear gpu-node-7 --apply --i-understand-this-disrupts-workloads

(node clear gpu-node-7 is the node-scoped shorthand for evict --node gpu-node-7.)

`evict` — reclaim capacity by scope (no cordon)

Evict a set of pods chosen by node, namespace, and/or label selector. Unlike drain, it does not cordon — use it to reclaim capacity while leaving the node schedulable.

# Preview: every (non-system, non-infra) pod on a node
kvestigate evict --node gpu-node-7

# All pods in a namespace
kvestigate evict --namespace batch-jobs --apply --i-understand-this-disrupts-workloads

# By label selector, optionally scoped to one namespace
kvestigate evict -l app=stale-trainer -n research \
    --apply --i-understand-this-disrupts-workloads

# Selector across the whole cluster
kvestigate evict --selector 'tier=preemptible' --apply --i-understand-this-disrupts-workloads

At least one of --node / --namespace / --selector is required.
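Here is a sketch of the selection logic over pod JSON. It assumes the given scopes combine as an AND, which is how the `-l app=stale-trainer -n research` example reads. It is illustrative only. It handles equality-based selectors (`=`, `==`, `!=`) but not the set-based forms such as `env in (a,b)` that Kubernetes also accepts.

```python
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, str, str]  # (key, op, value) with op in {"=", "!="}


def parse_selector(selector: str) -> List[Term]:
    """Parse an equality-based selector such as 'app=x,tier!=y'."""
    terms = []
    for raw in (t.strip() for t in selector.split(",")):
        if not raw:
            continue
        for op in ("!=", "==", "="):  # check two-character operators first
            if op in raw:
                key, value = (s.strip() for s in raw.split(op, 1))
                terms.append((key, "!=" if op == "!=" else "=", value))
                break
        else:
            raise ValueError(f"unsupported selector term: {raw!r}")
    return terms


def matches(labels: Dict[str, str], terms: List[Term]) -> bool:
    for key, op, value in terms:
        if op == "=" and labels.get(key) != value:
            return False
        if op == "!=" and labels.get(key) == value:  # an absent key counts as "not equal"
            return False
    return True


def select_pods(pods: List[dict], node: Optional[str] = None,
                namespace: Optional[str] = None, selector: Optional[str] = None) -> List[dict]:
    if not (node or namespace or selector):
        raise ValueError("at least one of --node / --namespace / --selector is required")
    terms = parse_selector(selector) if selector else []
    picked = []
    for pod in pods:
        meta, spec = pod.get("metadata") or {}, pod.get("spec") or {}
        if node and spec.get("nodeName") != node:
            continue
        if namespace and meta.get("namespace") != namespace:
            continue
        if matches(meta.get("labels") or {}, terms):
            picked.append(pod)
    return picked
```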

`scale` — resize a Deployment (normal use or quiesce)

# Preview a scale change (shows current → target)
kvestigate scale my-app 5 -n my-ns

# Apply it
kvestigate scale my-app 5 -n my-ns --apply --i-understand-this-modifies-cluster

# Quiesce to zero (stop the workload without deleting the Deployment)
kvestigate scale my-app 0 -n my-ns --apply --i-understand-this-modifies-cluster

--namespace/-n defaults to the current kubeconfig context’s namespace.
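The preview-then-apply pattern maps onto the Deployment's `scale` subresource. This sketch assumes `apps_v1` behaves like `kubernetes.client.AppsV1Api`. `scale_deployment` is a hypothetical helper, and `apply=True` stands in for `--apply` plus the confirmation flag.

```python
from typing import Tuple


def scale_deployment(apps_v1, name: str, namespace: str, replicas: int,
                     apply: bool = False) -> Tuple[int, int]:
    """Return (current, target) replicas; patch only when apply is True."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    # Read the scale subresource: the same current -> target view a preview shows.
    current = apps_v1.read_namespaced_deployment_scale(name, namespace).spec.replicas
    if apply and current != replicas:
        # Patching the scale subresource changes only the replica count;
        # replicas=0 quiesces the workload but keeps the Deployment object.
        apps_v1.patch_namespaced_deployment_scale(name, namespace, {"spec": {"replicas": replicas}})
    return current, replicas
```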

Cluster optimization — `optimize`

A diagnose / plan / apply surface for consolidating thin GPU workloads onto denser nodes so the cluster keeps fully-used nodes alongside fully-empty ones for isolated runs and experiments. Strictly operator-only — never exposed as a cybernet operation, hard-denied in policy.DANGEROUS_OPERATIONS.

The pipeline:

audit --json — capture a cluster snapshot.
optimize analyze --audit <path> — quantify fragmentation.
optimize plan --audit <path> (with tunables) — get a concrete move list, donor nodes to drain, target nodes to densify.
(Optional) optimize probe-gpu --audit <path> — fan-out nvidia-smi read-only and flag ghost allocations (stated GPUs without live work).
optimize apply --audit <path> --apply --i-understand-this-disrupts-workloads — execute the plan via the existing nodeops cordon + wave-evict primitives (PodDisruptionBudgets respected, infra/system pods protected).

How the planner works

Best-fit-decreasing bin-pack with disjoint donor / target sets so pods never bounce through a node that’s being emptied:

A donor is any node whose stated GPU utilization is strictly below --donor-max-util (default 0.3). Donors get drained.
A target is any other partial node with free GPUs. Targets get denser.
Pods on donors are sorted largest-first and placed onto the target with the smallest sufficient remaining capacity.
A pod that won’t fit anywhere is reported as unplaceable — never moved.
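Here are the same four rules as a small, self-contained sketch. It is not the kvestigate planner:

- `Node`, `Plan` and `plan_consolidation` are hypothetical.
- Each node is reduced to GPU counts.
- Donors are assumed to be non-empty nodes, since an already-empty node has nothing to drain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    gpu_total: int
    pods: Dict[str, int] = field(default_factory=dict)  # pod name -> GPUs requested

    @property
    def gpu_used(self) -> int:
        return sum(self.pods.values())

    @property
    def util(self) -> float:
        return self.gpu_used / self.gpu_total if self.gpu_total else 0.0


@dataclass
class Plan:
    moves: List[Tuple[str, str, str]]   # (pod, donor, target)
    unplaceable: List[Tuple[str, str]]  # (pod, reason)
    drained: List[str]                  # donors left with no pods


def plan_consolidation(nodes: List[Node], donor_max_util: float = 0.3) -> Plan:
    # Donors: non-empty nodes strictly below the utilization threshold.
    donors = [n for n in nodes if n.gpu_used > 0 and n.util < donor_max_util]
    donor_names = {n.name for n in donors}
    # Targets: every other partial node, tracked by its remaining free GPUs.
    # Disjoint from donors, so no pod lands on a node that is being emptied.
    free = {
        n.name: n.gpu_total - n.gpu_used
        for n in nodes
        if n.name not in donor_names and 0 < n.gpu_used < n.gpu_total
    }
    # Decreasing: largest pods first (donor and pod names keep the order stable).
    pods = sorted(
        ((gpus, d.name, pod) for d in donors for pod, gpus in d.pods.items()),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    left_on = {d.name: len(d.pods) for d in donors}
    moves, unplaceable = [], []
    for gpus, src, pod in pods:
        fits = [t for t, slack in free.items() if slack >= gpus]
        if not fits:
            unplaceable.append((pod, f"needs {gpus} GPU(s); no target has room"))
            continue  # reported, never moved
        # Best fit: the target with the smallest sufficient remaining capacity.
        dst = min(fits, key=lambda t: (free[t], t))
        free[dst] -= gpus
        left_on[src] -= 1
        moves.append((pod, src, dst))
    drained = sorted(d for d, left in left_on.items() if left == 0)
    return Plan(moves, unplaceable, drained)
```

Consider a toy layout that assumes 8-GPU nodes: two at 1/8, one at 4/8 and five full, roughly the shape of the worked example below. The sketch moves the two single-GPU pods onto the half-full node and reports both donors as drained.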

Tunables

Flag	Default	What it does
`--donor-max-util`	`0.3`	A node below this stated util is a donor candidate. Raise it to be more aggressive about emptying nodes.
`--min-free-nodes`	`1`	Target number of empty nodes to keep available for isolated runs. Reported, not enforced.
`--exclude-namespace`	(none)	Repeatable. Pods in these namespaces are never moved.
`--exclude-node`	(none)	Repeatable. These nodes are never donors and never targets.
`--pin`	(none)	Repeatable `namespace/pod-pattern` rules (`*` glob). Matching pods are never moved.
`--wave-size`	`4`	Pods evicted per wave during `apply`.
`--grace-period`	(none)	Seconds passed to the eviction subresource.
`--ghost-threshold`	`0.05`	(probe-gpu) Mean live GPU util below this flags the node’s allocation as a ghost.
`--samples`	`1`	(probe-gpu) Number of nvidia-smi rounds per GPU. Use 3–5 for confident ghost detection.
`--interval`	`1.0`	(probe-gpu) Seconds between sampling rounds when `--samples > 1`.
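Here is a hedged sketch of how `--pin`, `--exclude-namespace` and `--exclude-node` might gate a single pod on the donor side. It assumes the namespace half of a pin is matched exactly and only the pod half is a glob. It uses `fnmatch.fnmatchcase`, which understands `?` and `[...]` as well as `*`. The target-side half of `--exclude-node` (never a target) would live in the planner itself.

```python
from fnmatch import fnmatchcase
from typing import Collection, Iterable, Tuple


def is_movable(namespace: str, pod: str, node: str,
               pins: Iterable[str] = (),
               exclude_namespaces: Collection[str] = (),
               exclude_nodes: Collection[str] = ()) -> Tuple[bool, str]:
    """Return (movable, reason) for one pod sitting on a candidate donor node."""
    if node in exclude_nodes:
        return False, f"node {node} is excluded"
    if namespace in exclude_namespaces:
        return False, f"namespace {namespace} is excluded"
    for rule in pins:
        # ASSUMPTION: 'namespace/pod-pattern', namespace exact, pod part globbed.
        pin_ns, _, pattern = rule.partition("/")
        if pin_ns == namespace and fnmatchcase(pod, pattern):
            return False, f"pinned by {rule}"
    return True, "movable"
```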

Data source: `--audit` or `--live`

Every optimize subcommand accepts either --audit <path> (a captured kvestigate audit --json <path> file — reproducible, shareable) or --live (query the cluster right now via the configured kubeconfig). Both go through the same reports.gpu collector, so a saved audit and a live read produce identical typed views — only the time of capture differs.

# Against a saved snapshot (reproducible):
kvestigate optimize analyze --audit cluster.audit.json

# Against the cluster right now (same output shape):
kvestigate optimize analyze --live

Worked example (the snapshot in this repo)

cluster.audit.json is a real audit: 8 GPU nodes, 5 fully packed (100%), 1 half-full (50%), 2 nearly empty (12%). 18 free GPUs are stranded across 3 partial nodes — no single node has a contiguous block bigger than 7.

# Diagnose:
kvestigate optimize analyze --audit cluster.audit.json   # or --live

# Plan with a tunable:
kvestigate optimize plan --audit cluster.audit.json --min-free-nodes 2
# → 2 moves; drains node-01 and node-02; densifies node-08
#   (50% → 75%); empty nodes 0 → 2.

# Cross-check stated vs actual GPU usage (read-only):
kvestigate optimize probe-gpu --live --ghost-threshold 0.05

# Execute (gated):
kvestigate optimize apply --live --min-free-nodes 2 \
    --apply --i-understand-this-disrupts-workloads

Operator transparency in the output

analyze, plan, and probe-gpu all render with no hidden math:

analyze labels each metric with its definition (stranded = free GPUs on partial nodes, largest single-node free block, etc.) and prints the consolidation-opportunity rule explicitly (total_free > largest_block AND partial > 1).
plan prints every tunable in effect, the donor/target candidacy rule, the moves, the unplaceable list with reasons, and a projected per-node allocation table with the explicit deltas per node.
probe-gpu is fully detailed: it prints the threshold, the verdict rule (mean_util_pct = sum(sample.util)/len(samples); ghost if mean / 100 < threshold), and for every probed node the per-GPU samples (index / util% / memory MiB), the pods accounted on the node, the computed mean, and the verdict. The summary at the bottom lists ghost / active / no-allocation / no-samples counts.
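The analyze definitions are simple enough to restate as code. The sketch below works over a hypothetical `{node: (allocated_gpus, total_gpus)}` map. It assumes that "total free" counts every node and that "partial" means some but not all GPUs are allocated.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Fragmentation:
    total_free: int
    stranded: int       # free GPUs sitting on partial nodes
    largest_block: int  # largest single-node free block
    partial_nodes: int
    opportunity: bool


def analyze(nodes: Dict[str, Tuple[int, int]]) -> Fragmentation:
    """nodes maps name -> (gpu_allocated, gpu_total)."""
    free = {n: total - used for n, (used, total) in nodes.items()}
    partial = [n for n, (used, total) in nodes.items() if 0 < used < total]
    total_free = sum(free.values())
    largest = max(free.values(), default=0)
    return Fragmentation(
        total_free=total_free,
        stranded=sum(free[n] for n in partial),
        largest_block=largest,
        partial_nodes=len(partial),
        # Consolidation-opportunity rule: total_free > largest_block AND partial > 1.
        opportunity=total_free > largest and len(partial) > 1,
    )
```

Take 8-GPU nodes laid out like the worked example (two at 1/8, one at 4/8, five full). The sketch reports 18 stranded GPUs, a largest block of 7, three partial nodes and an opportunity. After the planned moves the flag turns off, because only one partial node remains.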

Ghost allocations (windowed nvidia-smi probe)

The k8s scheduler accounts pods at their request level: a pod that asked for 8 GPUs but only really uses 1 is still “8 allocated”. optimize probe-gpu fans out nvidia-smi (read-only) and lets you see, per GPU, both compute utilization and memory used — so you can distinguish:

truly empty (0% util, 0 MiB) — safe consolidation target;
model warm but idle at probe instant (0% util, ~tens of GB memory loaded) — the model is hot, the workload just isn’t currently serving;
actively running (non-zero util).

A single sample can mislead you if a workload bursts between requests. The probe supports a sampling window so the verdict is based on multiple nvidia-smi reads spaced over time:

# Windowed: 5 samples per GPU spaced 2s apart (~10s total wall-clock).
kvestigate optimize probe-gpu --live --samples 5 --interval 2

What the windowed output gives you on each GPU:

per-GPU mean / max / min util%, and mean memory used across the window;
a bursty tag when a GPU went idle and active during the window (max - mean > 20pp) — that’s exactly the workload you’d misjudge with a single sample;
node-level verdict still based on mean across all readings — but you can see node_max alongside it, so a node with a brief 100% burst is visibly not a ghost even if the mean is low.

A “confident ghost” in this scheme is one where every sample on every GPU is below threshold across the window — the summary line explicitly says confident (N=… over ~Xs) so the operator can see the signal strength behind the call.
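Here are the verdict rules above, written out. This is a sketch rather than kvestigate's probe code:

- `GpuWindow` and `node_verdict` are hypothetical.
- Utilization is in percent.
- The caller supplies how many GPUs the node has allocated, so "no-allocation" can be told apart from "no-samples".

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class GpuWindow:
    index: int
    utils: List[float]                                  # util % per sample
    mem_mib: List[float] = field(default_factory=list)  # memory used per sample

    def summary(self) -> dict:
        if not self.utils:
            return {"index": self.index, "samples": 0}
        m = mean(self.utils)
        return {
            "index": self.index,
            "mean": m, "max": max(self.utils), "min": min(self.utils),
            "mean_mem_mib": mean(self.mem_mib) if self.mem_mib else None,
            # Went idle and active inside the window: max - mean > 20 percentage points.
            "bursty": max(self.utils) - m > 20,
        }


def node_verdict(gpus: List[GpuWindow], allocated_gpus: int, threshold: float = 0.05) -> dict:
    if allocated_gpus == 0:
        return {"verdict": "no-allocation"}
    readings = [u for g in gpus for u in g.utils]
    if not readings:
        return {"verdict": "no-samples"}
    mean_util_pct = sum(readings) / len(readings)  # node-level mean over all readings
    ghost = mean_util_pct / 100 < threshold
    return {
        "verdict": "ghost" if ghost else "active",
        # Confident only when every sample on every GPU is below threshold.
        "confident": ghost and all(u / 100 < threshold for u in readings),
        "mean_util_pct": mean_util_pct,
        "node_max": max(readings),
        "per_gpu": [g.summary() for g in gpus],
    }
```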

Apps — `kvestigate app`

A single declarative YAML kind (kvestigate App) that compiles deterministically to either a host-native workspace or a small set of Kubernetes manifests, chosen by an explicit target: field. No templating engine, no in-cluster controller — the spec is a target format the user generates with whatever upstream tool, then kvestigate app up compiles + applies through the existing cybernet audit/quota/policy gate.

Full design in docs/APP_SPEC.md; summary here.

Commands

kvestigate app render <spec.yaml>           # print compiled artifacts (yaml | json)
kvestigate app plan   <spec.yaml>           # dry-run plan (no side effects)
kvestigate app up     <spec.yaml> --apply --i-understand-this-modifies-cluster
kvestigate app down   <spec.yaml> --apply --i-understand-this-modifies-cluster
kvestigate app diff   <spec.yaml>           # spec_hash drift + backend diff
kvestigate app status <spec.yaml>           # last-recorded apply state for this App
kvestigate app list                         # every App with a persisted status

All commands default to dry-run; --apply plus the safety flag is required to mutate (same pattern as workspace create / optimize apply).

A minimal App spec

app:
  metadata: { name: trainer-01, namespace: research }
  target: auto # workspace | k8s | auto
  command: [python, -m, train]
  env: { OMP_NUM_THREADS: "16" }
  lifecycle: interactive # service | batch | interactive
  resources: { gpu: 2, memory: 64Gi }
  workspace: # used when target resolves to workspace
    node: gpu-node-7
    storage: { mode: hybrid, tmpfs_size: 500G }
    sources: [{ local: ./code, subdir: code }]

That spec compiles to a WorkspaceSpec + LaunchPlan (host-native path) because the target: auto policy sees an explicit workspace.node opt-in and at least one supporting signal — any one of {GPU ≥ 1, lifecycle: batch | interactive, tmpfs ≥ 64 GiB} is enough.

`target:` — host or pod, and how `auto` decides

Field value	Behavior
`workspace`	Always compiles to a host-native workspace (via `lifecycle.create` + `workspace run`).
`k8s`	Always compiles to a Deployment (+ Service when `k8s.ports` is set).
`auto`	Runs the policy below. Falls back to `k8s` when nothing decides.

The auto policy (see app/target.py):

Refuse on hard caveats (e.g. storage.mode=all-tmpfs + restart: always is a contradiction — node reboot wipes the workspace).
Workspace requires explicit opt-in: workspace.node must be set. Without it, defaults in the workspace block do not count as operator intent and never promote the App to workspace.
With workspace.node set, any one of {big tmpfs ≥ 64 GiB, GPU ≥ 1, lifecycle: batch | interactive} flips to workspace.
Otherwise: k8s — the safe default until Roadmap A1 (cgroup-capped runs) lands; pods get cgroups via the kubelet for free.
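Here is the same decision order as a plain-dict sketch. It is not `app/target.py`; for the real thing, use `pick_target` from `kvestigate.app`. The sketch:

- takes the mapping under the top-level `app:` key;
- assumes the restart policy sits in a top-level `restart` field;
- follows Kubernetes quantity suffixes (`G` decimal, `Gi` binary);
- applies the policy only for `target: auto`, passing explicit targets straight through.

```python
from enum import Enum


class Target(str, Enum):
    WORKSPACE = "workspace"
    K8S = "k8s"


_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_bytes(quantity: str) -> int:
    """Parse sizes like '64Gi' or '500G'."""
    for suffix in sorted(_UNITS, key=len, reverse=True):  # try 'Gi' before 'G'
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)


def pick_target_sketch(app: dict) -> Target:
    target = app.get("target", "auto")
    if target != "auto":
        return Target(target)  # explicit choice: no policy applied
    ws = app.get("workspace") or {}
    storage = ws.get("storage") or {}
    # 1. Hard caveat: a node reboot wipes tmpfs, so all-tmpfs cannot honour restart: always.
    if storage.get("mode") == "all-tmpfs" and app.get("restart") == "always":
        raise ValueError("storage.mode=all-tmpfs contradicts restart: always")
    # 2. A workspace needs explicit opt-in through workspace.node.
    if not ws.get("node"):
        return Target.K8S
    # 3. With the opt-in, any single supporting signal flips to workspace.
    gpu = int((app.get("resources") or {}).get("gpu", 0))
    tmpfs = to_bytes(str(storage.get("tmpfs_size", "0")))
    if gpu >= 1 or app.get("lifecycle") in ("batch", "interactive") or tmpfs >= 64 * 2**30:
        return Target.WORKSPACE
    # 4. Otherwise the safe default: pods get cgroups from the kubelet.
    return Target.K8S
```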

Using AppSpec from Python

Everything load / compile / status / diff related is importable from kvestigate.app. The shape mirrors the CLI exactly:

from pathlib import Path
from kvestigate.app import (
    # Specs + enums (closed-domain types are all StrEnums)
    AppSpec, Target, Lifecycle, RestartPolicy, WorkloadKind, ExposeMode, AppPhase,
    # Load / dump (YAML <-> dataclass)
    load_app, load_app_friendly, dump_app, AppLoadError, resolve_shared_config_files,
    # Compile (pure-functional; no I/O)
    compile_to_workspace, compile_to_k8s, compile_to_distributed, spec_hash,
    # Target policy
    pick_target, ResolvedTarget,
    # Status persistence + drift
    write_status, read_status, status_path_for, APP_STATUS_DIR, RecordedStatus,
    diff_app, AppDiff, DriftStatus,
    # Presentation
    format_manifest_summary, format_apply_results,
)

# Run from the repo root, or pass an absolute path here:
spec = load_app(Path("examples/apps/web-service.yaml"))
manifests = compile_to_k8s(spec)         # list[dict] ready for kubectl apply
drift = diff_app(spec, status_path_for(spec.metadata.namespace, spec.metadata.name))

Submodule layout is an implementation detail for production callers, with one exception: tests that need to redirect status persistence (monkeypatch.setattr("kvestigate.app.status.APP_STATUS_DIR", tmp_path)) must reference the submodule directly. If you need a name that isn’t in __all__, file an issue.

Internals — how to grow each surface

Adding a new report

Create src/kvestigate/reports/<name>.py with collect(snap) returning a dataclass + render(report, console) printing it.
If you need a new resource list, add a cached_property on Snapshot in reports/snapshot.py.
Register the module in the REPORTS mapping:

REPORTS = { ..., "<name>": (mymodule.collect, mymodule.render), ... }

It now shows up in kvestigate report list, is runnable via kvestigate report run <name>, and is part of kvestigate audit.
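A report module in the collect/render shape described above could look like this. The report itself is invented for illustration. It assumes `snap.pods` yields pod dicts in Kubernetes JSON shape; the real `Snapshot` attributes may differ, so check `reports/snapshot.py`. `render` only calls `console.print`, so something like a `rich.console.Console` works, as does any object with a `print` method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass
class PodPhaseReport:
    counts: Dict[str, int]
    total: int


def collect(snap) -> PodPhaseReport:
    # ASSUMPTION: snap.pods yields pod dicts in Kubernetes JSON shape.
    phases = Counter((p.get("status") or {}).get("phase", "Unknown") for p in snap.pods)
    return PodPhaseReport(counts=dict(sorted(phases.items())), total=sum(phases.values()))


def render(report: PodPhaseReport, console) -> None:
    # Only console.print is used, so rich's Console or any stand-in works.
    console.print(f"Pods by phase ({report.total} total)")
    for phase, n in report.counts.items():
        console.print(f"  {phase:<10} {n}")
```

If this lived in a hypothetical `reports/pod_phases.py`, its REPORTS entry would be `"pod-phases": (pod_phases.collect, pod_phases.render)`.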

Adding a new planning strategy

Add a function in src/kvestigate/planning/solver.py that orders candidate nodes the way you want; reuse _consume / _node_score_after / _node_headroom as helpers.
Register it in STRATEGIES (or branch in _solve_general).
Pass it via --strategy your-name.

Adding a new planning constraint

Add a field to PlacementRequest in planning/request.py.
Apply the filter in planning/capacity.eligible_nodes.
Surface it as a CLI flag in cli.py:plan.

Adding a new diagnostic

Add a function in src/kvestigate/debug.py returning a structured dataclass + a render helper.
Wire it as @debug_app.command("name") in cli.py.

conclusion

happy kvestigating

Metric	Min	Max	Mean	Median	Total
Humor	0	6	0.48	0.0	63
Helpfulness	1	9	7.58	8.0	985
Aggression	0	3	0.09	0.0	12
Spiciness	0	4	0.10	0.0	13