Kvestigate - Investigate Kubernetes Clusters
Ever find yourself given access to a piece of garbage kubernetes cluster that you need to audit to figure out what the hell is going on?
Say hello to kvestigate, your kubernetes investigation and auditing toolkit. It can even run ad-hoc workloads beyond the reach of the kubernetes scheduler allocator substrate system by just doing shit on the hosts themselves, like computers are supposed to do.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language Files Lines Code Comments Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python 141 34192 28551 1381 4260
TOML 3 72 57 8 7
YAML 23 1059 718 257 84
─────────────────────────────────────────────────────────────────────────────────
Markdown 18 3983 0 3083 900
|- BASH 13 500 288 141 71
|- Python 6 177 132 27 18
|- YAML 3 218 187 11 20
(Total) 4878 607 3262 1009
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total 185 40201 29933 4908 5360
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━Features
kvestigate is a read-first Kubernetes toolkit for investigating clusters, running workloads directly on host machines (no Docker), and safely handing a constrained slice of your cluster resources to AI agents.
Three surfaces on one foundation (the official kubernetes client + a privileged node-access DaemonSet):
- Investigate & plan — audit a cluster, stream logs, exec into pods, and bin-pack a workload onto available GPU/CPU. Read-only by default.
- Run things host-native — create isolated workspaces on a node’s host OS (tmpfs / disk / hybrid storage, GPUs bound by UUID), sync code in, and launch processes. No container build, no pod scheduling.
- Host constrained cyberspaces — expose an allow-listed, quota’d, audited slice of those workspace capabilities over HTTP/MCP, so agents and other programs can “upload and run things” without cluster-wide power.
Bonus Feature
Natively integrates with nvidia-smi to query all nodes
for cluster-wide gpu usage and power metrics to see how much of the
world you are wasting with your “ai” “workloads”
node running SM% VRAM Usage Power Usage
node-01 2/8 25.0 174.4/764.7 GiB (22.8%) 0.49 kW (10.3%) 35°C active saturated
node-02 2/8 12.5 157.2/764.7 GiB (20.6%) 0.59 kW (12.4%) 59°C active saturated
node-03 8/8 24.8 617.3/764.7 GiB (80.7%) 1.16 kW (24.2%) 71°C active saturated
node-04 8/8 12.5 701.9/764.7 GiB (91.8%) 1.22 kW (25.3%) 79°C active saturated
node-05 8/8 12.5 690.9/764.7 GiB (90.3%) 1.01 kW (21.1%) 43°C active saturated
node-06 8/8 12.5 695.5/764.7 GiB (90.9%) 1.06 kW (22.0%) 68°C active saturated
node-07 8/8 12.5 691.4/764.7 GiB (90.4%) 1.00 kW (20.9%) 49°C active saturated
node-08 4/8 0.0 357.0/764.7 GiB (46.7%) 0.46 kW ( 9.6%) 38°C ghost reserved-idle
cluster aggregate (only over reporting GPUs)
GPUs: 48 allocated / 64 capacity (stated 75.0%) reporting=64 GPU(s) on 8 node(s)
compute: mean SM util 14.0% (idle headroom: ~86.0%)
memory: 3.99/ 5.97 TiB resident ( 66.8% of fleet VRAM, 33.2% free)
power: 7.00 kW (18.2%) drawn of 38.40 kW TDP (thermal/power headroom: ~81.8%)
activity tally (per GPU): idle=15 reserved-idle=40 saturated=9
power per GPU (W): min=29 max=596 mean=109 median=85 stddev=113 (N=64)
temp per GPU (°C): min=25 max=79 mean=37 median=35 stddev=11 (N=64)
power split by activity: idle=456 W (6.5%) reserved-idle=3.44 kW (49.1%) saturated=3.10 kW (44.3%)
wasted power: 2.22 kW across 40 reserved-idle GPU(s) (observed idle baseline = 30 W (mean of 15 truly-idle GPU(s)); total reserved-idle draw = 3.44 kW)
verdicts: ghost=1 active=7 no-allocation=0 no-samples=0
drain-readiness: idle (safe to drain)=0 reserved-idle (loaded but no traffic; consolidate)=1 reserved-active (mem held + activity; coordinate)=0
reserved-idle nodes (model loaded, no compute/mem-bw/power signal — strong consolidate/downsize candidates):
node-08 stated=4 mem=357.0/764.7 GiB (46.7%) pods: prd-r6000-nest/model-multi-2-otel-llm-1, prd-r6000-nest/model-multi-3-otel-llm-0,
prd-r6000-nest/model-multi-3-otel-llm-8, prd-r6000-nest/model-multi-4-otel-llm-1
→ recoverable capacity: 4 stated GPU(s), 357.0/ 764.7 GiB of held VRAM with zero observed work
ghost nodes (single-shot — re-probe with --samples N to confirm):
node-08 stated=4 pods=4 mean=0.0% max=0.0% < threshold 5.0%
energy forecast (straight-line projection of current draw — assumes the fleet keeps doing exactly what it's doing now)
TDP ceiling: month= 28.05 MWh quarter= 84.15 MWh year=336.61 MWh (every GPU at 100 % TDP, theoretical max)
observed draw: month= 5.11 MWh quarter= 15.34 MWh year= 61.36 MWh (18.2% of TDP, 44.3% productive)
productive: month= 2.27 MWh quarter= 6.80 MWh year= 27.21 MWh (LIGHT + ACTIVE + SATURATED — real compute)
idle baseline: month= 0.33 MWh quarter= 1.00 MWh year= 4.01 MWh (IDLE class — unavoidable driver-on floor)
reserved-idle: month= 2.51 MWh quarter= 7.54 MWh year= 30.15 MWh (total draw on loaded-but-no-traffic GPUs)
→ recoverable: month= 1.62 MWh quarter= 4.87 MWh year= 19.46 MWh (reserved-idle draw minus observed baseline; saved if you consolidate; 31.7% of total observed)
Use cases
- Answer “what’s going on in this cluster right now?” without mutating anything.
- Plan whether a multi-GPU job fits, and where, before launching it.
- Give an ML workload a 500 GiB RAM-backed scratch space on a bare host, bypassing container/scheduler overhead.
- Let an autonomous agent provision + run + fetch results from disposable workspaces, fenced in by policy and quotas it can’t exceed.
Surfaces at a glance
| Command | What it does |
|---|---|
| kvestigate audit | Run every report (full cluster picture) |
| kvestigate report list | Show available individual reports |
| kvestigate report run <name> | Run one report by name |
| kvestigate namespaces | List namespaces + pod counts |
| kvestigate pods [-n ns] | List pods, optionally namespaced |
| kvestigate describe <kind> <name> [-n ns] | Inspect a pod / service / deployment / node |
| kvestigate logs <pod> [-n ns] [-c container] [-f] | Stream pod logs |
| kvestigate exec <pod> [-n ns] -- cmd args | Run a command inside a pod |
| kvestigate plan ... | Bin-pack a workload onto the cluster (capacity planning) |
| kvestigate workspace status / gpus | Discover workspaces / per-GPU availability |
| kvestigate workspace create / destroy / sync | Workspace lifecycle (writes gated) |
| kvestigate workspace run / ps / logs / kill | Host-process runs inside a workspace |
| kvestigate workspace diff / apply / export | IaC verbs over workspace.yaml |
| kvestigate kubeconfig analyze / verify | Diagnose a kubeconfig (read-only) |
| kvestigate kubeconfig convert / diff | Apply strategies; semantic diff |
| kvestigate cybernet init / serve / mcp-server | Generate cyberspace config; run HTTP / MCP frontend |
| kvestigate cybernet emit-helper / status / call | Generate agent helper script; health check; one-shot call |
| kvestigate cybernet hash-token | sha256-hash a bearer token for the config |
| kvestigate fanout targets ... | Resolve which nodes/pods a fan-out would target (no execution) |
| kvestigate fanout exec ... -- cmd | Run a command on every selected node/pod in parallel |
| kvestigate fanout fetch ... PATH DIR | Pull a file from every target into a local directory |
| kvestigate fanout push LOCAL REMOTE | (write) push a local file to every target |
| kvestigate debug context | Show resolved kubeconfig + how auth is being sent |
| kvestigate debug api [path] | GET a path through the python client (exercises auth) |
| kvestigate debug exec-url <pod> | Show HTTP/WS URLs for exec + attempt handshake |
| kvestigate debug timing | Time representative API calls to diagnose slowness |
| kvestigate deploy <manifest> | (write) apply a YAML manifest |
| kvestigate delete <kind> <name> | (write) delete a resource |
| kvestigate node cordon / uncordon <node> | (write) mark a node un/schedulable (reversible) |
| kvestigate node drain <node> | (write) cordon + evict a node’s pods (clear for new work) |
| kvestigate node clear <node> | (write) evict a node’s pods, leave it schedulable |
| kvestigate evict --node/-n/--selector | (write) evict a scoped set of pods (reclaim capacity) |
| kvestigate scale <deploy> <replicas> [-n ns] | (write) scale a Deployment (normal, or quiesce with 0) |
| kvestigate optimize analyze / plan | Diagnose GPU fragmentation; propose a consolidation plan |
| kvestigate optimize probe-gpu | Fan-out nvidia-smi; flag ghost allocations (stated vs actual) |
| kvestigate optimize apply | (write) execute a consolidation plan via cordon + evict |
| kvestigate app render / plan | Compile an App spec to workspace or k8s artifacts (dry-run) |
| kvestigate app diff | Show app-level spec_hash drift + planned backend diff (read) |
| kvestigate app up / down | (write) apply / tear down an App; auto-picks host vs pod |
| kvestigate app status / list | Read recorded apply state (spec_hash, phase), local cache only |
Audit and reports
Full audit (default for “what’s going on”)
kvestigate audit
kvestigate audit --json snapshots/2026-05-20.json # also save a snapshot
Runs all reports in a sensible order:
- nodes — capacity & requested per node, plus cluster totals. Shows CPU, memory, GPU, pod count. Uses allocations (what kube scheduler sees), not live usage.
- metrics — live CPU/memory from metrics.k8s.io (metrics-server). Includes top 15 pods by CPU and by memory. Skipped with a notice if metrics-server is unavailable.
- gpu — per-node GPU model, count, memory, driver, CUDA, MIG state, allocation; per-container consumer table.
- workloads — every Deployment / StatefulSet / DaemonSet with ready vs desired and image.
- network — Services (with endpoint counts so you can spot broken selectors), Ingresses, NetworkPolicies.
- storage — PersistentVolumeClaims, PersistentVolumes, per-StorageClass bound capacity.
Single report
kvestigate report list
kvestigate report run nodes
kvestigate report run gpu --json snapshots/gpu-now.json
All reports share a single Snapshot so the same API calls aren’t repeated. JSON output is structured + carries a UTC captured_at, so two snapshots taken hours apart can be diffed by anything that reads JSON.
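As a rough illustration of "diffed by anything that reads JSON", the sketch below flattens two snapshot documents into dotted paths and reports the leaves that changed. Only captured_at comes from the text above; the other field names in the usage note (nodes, gpu_requested) are hypothetical stand-ins, since the real snapshot schema is not shown here.

```python
from typing import Any, Dict, Iterator, Tuple

ABSENT = "<absent>"  # marker for a path present in only one snapshot


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs for nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


def diff_snapshots(old: Dict[str, Any], new: Dict[str, Any],
                   ignore: Tuple[str, ...] = ("captured_at",)) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (old_value, new_value)} for every leaf that differs."""
    old_flat = dict(flatten(old))
    new_flat = dict(flatten(new))
    changes = {}
    for path in sorted(set(old_flat) | set(new_flat)):
        if path in ignore:
            continue  # the timestamp always differs, so it is noise
        before = old_flat.get(path, ABSENT)
        after = new_flat.get(path, ABSENT)
        if before != after:
            changes[path] = (before, after)
    return changes
```

Load two files with json.load and pass the resulting dicts to diff_snapshots; a changed GPU request on the first node would show up under a path such as nodes[0].gpu_requested.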
Inspection
Namespaces & pods
kvestigate namespaces
kvestigate pods # all namespaces
kvestigate pods -n your-namespace # one namespace
Describe a resource
kvestigate describe pod my-app-abc123-xyz -n your-namespace
kvestigate describe service my-app -n your-namespace
kvestigate describe deployment my-app -n your-namespace
kvestigate describe node node-04
describe node <name> is the source of truth for “what hardware is this really”: full label dump (NVIDIA labels in their own table, everything else separately), capacity, allocatable, taints, addresses, kernel/runtime/kubelet versions. Use this whenever the summary tables in report run gpu truncate something you care about.
Logs
kvestigate logs my-app-abc123-xyz -n your-namespace
kvestigate logs my-app-abc123-xyz -n your-namespace -f --tail 500
kvestigate logs some-pod -n some-ns -c sidecar # specific container
Exec
# Read-only inspection inside a pod (this is fine on prod; do not write):
kvestigate exec my-app-abc123-xyz -n your-namespace -- ls /
kvestigate exec model-server-... -n your-namespace -- nvidia-smi
Default transport is the local kubectl binary (set up to use the project kubeconfig). The in-process websocket exec via the Python client is also implemented but currently has a URL-parsing bug through the API proxy, so use --transport ws only when you’re debugging that path. See kvestigate debug exec-url <pod> for the diagnostic surface.
Capacity planning — kvestigate plan
Bin-packs a workload onto the cluster’s currently-free capacity (allocatable minus current pod requests). Three strategies always run side by side so the operator can see trade-offs:
- bestfit — tightest pack, minimizes fragmentation.
- spread — maximum HA distribution.
- pack — fewest nodes used; consolidates load.
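To make the trade-off between the three strategies concrete, here is a toy greedy placer over free GPU counts only. It is a sketch under simplifying assumptions (one resource dimension, greedy choice per replica, hypothetical function name) and not the tool's actual planner.

```python
from typing import Dict, List, Optional


def place(free_gpus: Dict[str, int], gpu_per_replica: int, replicas: int,
          strategy: str) -> Optional[List[str]]:
    """Return the node chosen for each replica, or None if the job does not fit."""
    free = dict(free_gpus)                      # work on a copy
    used = {name: 0 for name in free}           # replicas placed per node so far
    placement: List[str] = []
    for _ in range(replicas):
        candidates = [n for n in free if free[n] >= gpu_per_replica]
        if not candidates:
            return None
        if strategy == "bestfit":
            # tightest fit: the node with the least free capacity that still fits
            node = min(candidates, key=lambda n: (free[n], n))
        elif strategy == "spread":
            # fewest replicas so far, then the roomiest node
            node = min(candidates, key=lambda n: (used[n], -free[n], n))
        elif strategy == "pack":
            # reuse a node that already hosts a replica before opening a new one
            node = min(candidates, key=lambda n: (used[n] == 0, -free[n], n))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        free[node] -= gpu_per_replica
        used[node] += 1
        placement.append(node)
    return placement
```

With free capacity {"node-01": 6, "node-02": 6, "node-08": 4} and three 2-GPU replicas, bestfit fills node-08 first, spread lands one replica on each node, and pack puts all three on node-01.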
Semantics
--gpu / --cpu / --memory are per
replica. Multiply by --replicas to get total
demand. This matches PodSpec semantics — so “12 GPUs total, spreadable
across nodes” is --gpu 1 --replicas 12, while “single pod
with 12 GPUs” is --gpu 12 --replicas 1 (which today’s
cluster cannot host — max 8 per node).
Memory accepts 64Gi, 1Ti, etc. CPU accepts
8, 8000m, 0.5, etc.
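The per-replica arithmetic can be sketched as follows. The parsers cover only the forms named above (plain cores, millicores, and binary memory suffixes); the function names are illustrative and not part of kvestigate.

```python
from decimal import Decimal
from typing import Dict

_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(text: str) -> Decimal:
    """Convert '8', '8000m' or '0.5' into a number of cores."""
    if text.endswith("m"):
        return Decimal(text[:-1]) / 1000   # millicores
    return Decimal(text)


def parse_memory(text: str) -> int:
    """Convert '64Gi' or '1Ti' into bytes (binary suffixes only in this sketch)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(Decimal(text[: -len(suffix)]) * factor)
    return int(text)                        # plain byte count


def total_demand(gpu: int, cpu: str, memory: str, replicas: int) -> Dict[str, object]:
    """Multiply per-replica requests by the replica count."""
    return {
        "gpu": gpu * replicas,
        "cpu_cores": parse_cpu(cpu) * replicas,
        "memory_bytes": parse_memory(memory) * replicas,
    }
```

For example, --gpu 1 --replicas 12 --cpu 4 --memory 16Gi asks for 12 GPUs, 48 cores and 192 GiB in total, spread over 12 placeable units.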
Examples
# 12 single-GPU replicas, spreadable
kvestigate plan --gpu 1 --replicas 12 --cpu 4 --memory 16Gi
# Tensor-parallel-ish: one pod that needs 12 GPUs (will fail clearly today)
kvestigate plan --gpu 12 --replicas 1
# 4 replicas × 2 GPUs each, HA spread (one replica per node)
kvestigate plan --gpu 2 --replicas 4 --cpu 16 --memory 64Gi --one-per-node
# All replicas must land on the same node (rare, but supported)
kvestigate plan --gpu 4 --replicas 1 --cpu 32 --memory 128Gi --single-node
# Restrict to Blackwell-family GPU nodes via node label
kvestigate plan --gpu 1 --replicas 4 \
--label nvidia.com/gpu.family=blackwell
# Skip specific nodes
kvestigate plan --gpu 1 --replicas 6 --exclude node-04 --exclude node-08
# Only run one strategy
kvestigate plan --gpu 1 --replicas 8 --strategy spread
# Save the plan output (free capacity + placements) as JSON
kvestigate plan --gpu 1 --replicas 12 --json snapshots/plan-12gpu.json
Fan-out — kvestigate fanout
Run commands or transfer files across many nodes or pods in parallel.
Target model
A “target” is something to talk to. Three target kinds:
| --target | Selection flags | Pod used for exec | Command wrapper |
|---|---|---|---|
| nodes | --node N (repeatable) / --exclude-node N | the default/direct-node-access DS pod on that node | chroot /host … for commands; raw access via /host/… for files |
| pods | --namespace, --selector, --pod, --container | the pod itself | none |
| deployment | --deployment, --namespace, --container | each pod of the Deployment | none |
The node-access path uses the existing privileged DaemonSet
(hostPID/hostIPC/hostNetwork: true, host /
mounted at /host, tolerates all taints), which means
no new infrastructure is created — no
kubectl debug node/… pod spawns, no privileged container
deploys.
Different DaemonSet on your cluster? Override with
--node-ds-namespace / --node-ds-name.
fanout targets — preview without executing
kvestigate fanout targets --target nodes
kvestigate fanout targets --target pods -n your-namespace -l app=my-app
kvestigate fanout targets --target deployment --deployment my-app -n your-namespace
fanout exec — run a command everywhere
# nvidia-smi on every node
kvestigate fanout exec --target nodes -- nvidia-smi
# Kernel version per node (with a tighter timeout)
kvestigate fanout exec --target nodes --timeout 20 -- uname -r
# Subset of nodes
kvestigate fanout exec --target nodes --node node-04 --node node-05 -- df -h /
# Inside specific pods
kvestigate fanout exec --target pods -n your-namespace -l app=my-app -- ps -ef
# All pods of a deployment
kvestigate fanout exec --target deployment --deployment my-app -n your-namespace -- env
# Dry-run: print resolved per-target argv, do not execute
kvestigate fanout exec --target nodes --dry-run -- nvidia-smi
Per-target progress streams immediately (✓ node-08 exit=0 1.3s) so slow / stuck targets are visible in real time. Add --full to disable output truncation; --json PATH for structured results.
Worked example: nvidia-smi — node view vs. pod view
nvidia-smi is the canonical case where the same
command tells you different things depending on where you run
it, so pick the target on purpose:
| Question | Run against |
|---|---|
| What’s actually running on every physical GPU in the cluster? | --target nodes |
| Are any GPUs idle / hot / throttling? | --target nodes |
| Which OS processes (from any namespace) are bound to which GPU? | --target nodes |
| What’s the NVLink / PCIe topology on a host? | --target nodes |
| Is my service using its allocated GPU correctly? | --target pods -l app=… or --target deployment … |
| Is my container throttling? | --target pods … (hardware-side throttling: --target nodes) |
Why the difference: a pod only sees the GPUs the
NVIDIA device plugin gave it, renumbered to local indices
starting at 0. So a pod with one allocated GPU always reports a single
“GPU 0” — it has no way to know it’s physically GPU 5 on
node-07. The host’s nvidia-smi sees every GPU
and every process touching it regardless of pod ownership. Pick the view
that matches the question.
Cluster-wide GPU snapshot, compact (recommended default)
kvestigate fanout exec --target nodes --full -- \
nvidia-smi \
--query-gpu=index,memory.used,memory.total,utilization.gpu,temperature.gpu,power.draw \
--format=csv,noheader,nounits
fanout fetch — pull a file from every target
# Get /etc/hosts from every node into ./out/<node>/hosts
kvestigate fanout fetch /etc/hosts ./out/ --target nodes
# Get a single log file
kvestigate fanout fetch /var/log/cloud-init.log ./logs/ --target nodes
# From a specific pod
kvestigate fanout fetch /app/config.yaml ./pod-configs/ \
--target pods -n your-namespace --pod my-app-…-xyz
For node targets, paths are absolute on the host (the tool translates them to the /host/… mount internally: you write /etc/hosts, not /host/etc/hosts).
fanout push — write a file to every target (WRITE — gated)
# Dry run is always safe to use on prod
kvestigate fanout push ./hosts.allow /etc/hosts.allow --target nodes --dry-run
# Real push requires an explicit confirmation flag
kvestigate fanout push ./hosts.allow /etc/hosts.allow --target nodes \
--i-understand-this-writes-to-hosts
Without --i-understand-this-writes-to-hosts the command refuses, even after resolving targets. This is the same safety posture as deploy / delete: writes only with explicit per-action authorization.
Common flags
| Flag | Meaning |
|---|---|
| --max-parallel N | Concurrent subprocesses (default 8) |
| --timeout SECS | Per-target wall-clock timeout (default 60) |
| --dry-run | Resolve targets and print plan, do not execute |
| --full | (exec only) print full output, no truncation |
| --json PATH | Write structured results |
| --node-ds-namespace / --node-ds-name | Override the node-access DaemonSet location |
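The interplay of --max-parallel and --timeout can be pictured with a small sketch: a bounded thread pool runs one subprocess per target, and each subprocess has its own wall-clock limit. This is an illustration under assumptions (the function names and result shape are hypothetical), not the tool's implementation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_one(argv: List[str], timeout: float) -> Dict[str, object]:
    """Run one command, reporting a timeout instead of raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return {"exit": proc.returncode, "stdout": proc.stdout, "timed_out": False}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers
        return {"exit": None, "stdout": "", "timed_out": True}


def fan_out(commands: Dict[str, List[str]], max_parallel: int = 8,
            timeout: float = 60.0) -> Dict[str, Dict[str, object]]:
    """Run each target's argv concurrently, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {target: pool.submit(run_one, argv, timeout)
                   for target, argv in commands.items()}
        return {target: future.result() for target, future in futures.items()}
```

Because the timeout applies per target, one stuck node costs at most the timeout value and does not block results from the others.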
Workspaces — host-native workloads without Docker
kvestigate workspace lets you run workloads directly on
a node’s host OS (via the privileged direct-node-access
DaemonSet) instead of through a K8s Deployment. Useful for ML research,
fine-tuning, dataset prep — anywhere container indirection isn’t earning
its keep and the host’s installed NVIDIA driver + rsync +
Python is all the platform you need.
A workspace is a directory on a chosen node with a
known subdir layout (code/, env/,
datasets/, checkpoints/, logs/,
pids/), optionally backed by tmpfs for speed. Cluster
prerequisites (already met here): driver 580+, rsync on the
host, host root mounted at /host inside the DS pod.
Storage modes
Pick one per workspace. The fast-cleanup property of
all-tmpfs is deliberate — when the workload’s output is
checkpointed elsewhere, destroy is one umount + one
rmdir, no file walk at all.
| Mode | What’s tmpfs-backed | Persistence across reboot | Cleanup |
|---|---|---|---|
| disk | nothing | everything | rm -rf <workspace> (file walk; slow for big trees) |
| hybrid (default) | only datasets/ | code, env, checkpoints, logs | umount <ws>/datasets then rm -rf <ws> |
| all-tmpfs | the entire workspace root | none | umount <ws> + rmdir <ws> (instant) |
tmpfs_size defaults to 500G; on these 1.5
TiB nodes that leaves ~900 GiB+ free RAM after the mount.
Surface
kvestigate workspace status # discover workspaces on all nodes
kvestigate workspace gpus --node N # real GPU availability (nvidia-smi × k8s)
kvestigate workspace create --config foo.yaml [--apply]
kvestigate workspace destroy --config foo.yaml [--apply]
kvestigate workspace sync <local> --config foo.yaml [--apply] [--delete] [--exclude PAT]
kvestigate workspace run --config foo.yaml --tag T --gpu N [--apply] -- <cmd...>
kvestigate workspace ps --config foo.yaml
kvestigate workspace logs TAG --config foo.yaml [-f] [--tail N]
kvestigate workspace kill TAG --config foo.yaml [--signal TERM] --i-understand-this-modifies-hosts
kvestigate workspace diff --config foo.yaml
kvestigate workspace apply --config foo.yaml [--apply]
kvestigate workspace export NAME --node N -o foo.yaml
Every write subcommand defaults to a dry run and additionally requires --i-understand-this-modifies-hosts to actually execute (same posture as fanout push).
Infrastructure-as-code spec
A workspace is fully described by a YAML file.
jsonargparse does the loading; the schema is the
WorkspaceSpec dataclass tree.
# workspace.yaml — the durable, versionable form
workspace:
name: finetune-experiment
node: node-02
api_version: v1
root: /opt/kvestigate-workspaces
storage:
mode: all-tmpfs # disk | hybrid | all-tmpfs
tmpfs_size: 500G
# hybrid_tmpfs_path: datasets # only used in hybrid mode
gpu:
reserve: 4 # planner picks 4 currently-free GPUs
# indices: [0, 1, 2, 3] # OR pin specific physical indices
cordon_node: true # `kubectl cordon` while workspace is active
sources:
- local: ./my-code
subdir: code
excludes: ["__pycache__", "*.pyc", ".git"]
- local: ./small-fixtures
subdir: datasets
IaC workflow
# Read-only: what would apply do?
kvestigate workspace diff --config workspace.yaml
# Dry-run the whole reconciliation (create + every source rsync)
kvestigate workspace apply --config workspace.yaml
# Execute (requires the safety flag)
kvestigate workspace apply --config workspace.yaml --apply \
--i-understand-this-modifies-hosts
# Save a snapshot of an existing workspace back to YAML
kvestigate workspace export finetune-experiment --node node-02 \
-o snapshots/finetune-2026-05-20.yaml
Launching a run (no Docker, GPUs bound by UUID)
# Dry-run: shows the exact shell script that would execute on the host
kvestigate workspace run \
--config workspace.yaml \
--tag train-001 --gpu 0 --gpu 1 --gpu 2 --gpu 3 \
--env WANDB_DISABLED=true \
-- python train.py --batch-size 32 --epochs 10
# Real launch
kvestigate workspace run --config workspace.yaml --tag train-001 \
--gpu 0 --gpu 1 --gpu 2 --gpu 3 \
--apply --i-understand-this-modifies-hosts \
-- python train.py --batch-size 32
# Watch progress
kvestigate workspace ps --config workspace.yaml
kvestigate workspace logs train-001 --config workspace.yaml -f
# Stop it
kvestigate workspace kill train-001 --config workspace.yaml \
--signal TERM --i-understand-this-modifies-hosts
GPU indices are resolved to UUIDs at launch time and bound via CUDA_VISIBLE_DEVICES=GPU-uuid-1,GPU-uuid-2,… so the binding survives any device-plugin renumbering. Each run writes logs/<tag>.log, pids/<tag>.pid, and pids/<tag>.meta (cmd, gpus, started_at).
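A minimal sketch of the index-to-UUID step, assuming the mapping is read from `nvidia-smi --query-gpu=index,uuid --format=csv,noheader` on the host. The helper names are hypothetical and the UUIDs in the usage note are placeholders.

```python
from typing import Dict, List


def parse_index_uuid(csv_text: str) -> Dict[int, str]:
    """Parse 'index, uuid' lines as printed by nvidia-smi in csv,noheader format."""
    mapping: Dict[int, str] = {}
    for line in csv_text.strip().splitlines():
        index, uuid = (field.strip() for field in line.split(","))
        mapping[int(index)] = uuid
    return mapping


def cuda_visible_devices(indices: List[int], mapping: Dict[int, str]) -> str:
    """Build the CUDA_VISIBLE_DEVICES value for the requested physical indices."""
    missing = [i for i in indices if i not in mapping]
    if missing:
        raise ValueError(f"no such GPU index on this host: {missing}")
    return ",".join(mapping[i] for i in indices)
```

Given the lines "0, GPU-aaaa" and "1, GPU-bbbb", requesting indices [1, 0] yields "GPU-bbbb,GPU-aaaa", which stays valid even if the indices are later renumbered.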
Cleanup story (the all-tmpfs fast path)
# all-tmpfs cleanup is a single umount + rmdir, no file walk:
kvestigate workspace destroy --config workspace.yaml # shows plan
kvestigate workspace destroy --config workspace.yaml --apply \
--i-understand-this-modifies-hosts
# [1] umount /opt/kvestigate-workspaces/finetune-experiment ← entire workspace gone
# [2] rmdir /opt/kvestigate-workspaces/finetune-experiment
For hybrid mode, destroy umounts the datasets tmpfs and then does rm -rf for the small persistent layout. For disk mode it’s a full file walk; use all-tmpfs if cleanup speed matters.
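The per-mode cleanup described above can be written down as a tiny planner that returns the commands in order. This is a sketch with a hypothetical function name; it only builds the argv lists and does not execute anything.

```python
from typing import List


def destroy_plan(mode: str, workspace: str, tmpfs_subdir: str = "datasets") -> List[List[str]]:
    """Return the ordered cleanup commands for a workspace in the given storage mode."""
    if mode == "all-tmpfs":
        # the whole root is one mount: unmounting discards every file at once
        return [["umount", workspace], ["rmdir", workspace]]
    if mode == "hybrid":
        # drop the tmpfs-backed subdir first, then walk the small on-disk remainder
        return [["umount", f"{workspace}/{tmpfs_subdir}"], ["rm", "-rf", workspace]]
    if mode == "disk":
        # nothing is mounted, so the only option is a full file walk
        return [["rm", "-rf", workspace]]
    raise ValueError(f"unknown storage mode: {mode}")
```

Only the all-tmpfs plan avoids rm -rf entirely, which is why its cost does not grow with the number of files in the workspace.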
GPU contention with K8s
K8s and the device plugin do not know about host processes
you launch. The default cordon_node: true calls
kubectl cordon on the workspace’s node when
apply runs and uncordons on destroy, so K8s
won’t schedule new GPU pods onto a node you’re using directly. Set
cordon_node: false if you’re sure no contention will happen
(e.g. on a dedicated research node) or want to manage that
separately.
What nvcc not in PATH means
These hosts have the driver + runtime but no CUDA dev toolkit.
Pre-built PyTorch / JAX / TensorRT wheels work fine (they ship their own
CUDA libs). If your code calls nvcc directly (custom kernel
JIT compile), install the toolkit per workspace via uv into
env/ and adjust PATH in --env.
Cybernet — constrained server interface for other agents
kvestigate cybernet exposes a subset of
kvestigate over HTTP REST and/or MCP, gated by a per-deployment YAML
policy. The intended use case is “let other code agents launch
host-native workloads on a curated host set, without giving them the
keys to the cluster.”
Architecture
agent ──▶ HTTP/REST ─┐
agent ──▶ MCP stdio ─┼─▶ policy engine ──▶ kvestigate library
agent ──▶ HTTP/REST ─┘ │
▼
audit log (JSONL)
Two frontends share one policy engine. Adding a third (gRPC, websockets) requires no changes to the engine or the operation registry; only the new frontend itself has to be written.
Managed workflow — three commands instead of fifteen
The full lifecycle (token generation → config → server → agent helper → distributed job → cleanup) is wrapped by three high-level surfaces so operators don’t hand-write YAML and agents don’t hand-write orchestration.
Step 1 — operator: cybernet init
kvestigate cybernet init \
--name research-pool \
--template distributed-cpu \
--listen 0.0.0.0:8765 \
--output research-pool.yaml
Generates a random bearer token, writes a cyberspace YAML from the named template (distributed-cpu, research-agents, observability, or blank), prints the token once and the next-step commands. The token’s sha256 hash is what lives in the YAML; the plaintext is shown in the terminal output for the operator to hand to the agent out-of-band.
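The store-the-hash, hand-out-the-plaintext pattern looks roughly like this in Python. It assumes the digest is the hex SHA-256 of the UTF-8 token; the exact encoding used by kvestigate cybernet hash-token is not stated here, so treat that as an assumption.

```python
import hashlib
import hmac
import secrets


def new_token() -> str:
    """Generate a random URL-safe bearer token."""
    return secrets.token_urlsafe(32)


def hash_token(token: str) -> str:
    """Hex SHA-256 of the token; this is the value that would go in the config."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def token_matches(presented: str, stored_sha256: str) -> bool:
    """Compare in constant time so the check does not leak how many characters match."""
    return hmac.compare_digest(hash_token(presented), stored_sha256)
```

A leaked config file then exposes only the digest, which cannot be replayed as a bearer token.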
Step 2 — operator: cybernet serve
kvestigate cybernet serve --config research-pool.yaml
Starts the HTTP REST frontend described by the config.
Step 3 — operator: cybernet emit-helper
kvestigate cybernet emit-helper http://localhost:8765 \
--output agent_helper.py
Reads /info from the running server and writes a Python module tailored to that cyberspace: one function per allowed operation, plus a new_job() factory if the cyberspace permits the operations needed for the DistributedJob pattern. The generated file is self-contained, so hand it to the agent along with the bearer token.
Step 4 — agent: import and run
import os
os.environ["KVESTIGATE_CYBERNET_TOKEN"] = "<token operator gave you>"
from agent_helper import new_job
with new_job(prefix="job-finetune", shards=5) as job:
    job.discover_and_plan(cpu_per_shard=100, mem_per_shard="600Gi")
    job.provision(tmpfs_per_shard="600Gi")
    job.sync_code("./my-workload", excludes=["__pycache__", ".git"])
    job.run(["sh", "-c", "python worker.py"])
    job.wait_until_complete(poll_interval=10)
# Workspaces auto-destroyed on exit: instant when storage_mode is all-tmpfs.
That’s the whole agent program. The DistributedJob session:
- discovers candidate nodes via report.run name=nodes
- picks the top-N by free CPU among those meeting per-shard requirements
- creates one workspace per chosen node, rolling back on partial failure
- rsync-pushes code into each workspace’s code/
- launches the command in each shard with DIST_RANK / DIST_WORLD_SIZE / DIST_PEERS / DIST_COORDINATOR env vars injected automatically
- polls workspace.ps to detect completion and tails rank-0 logs for progress
- on context exit (success or exception) destroys every shard
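On the worker side, a shard might pick up the injected variables like this. The variable names come from the list above; their value formats are assumptions (ranks counted from 0, DIST_PEERS as a comma-separated host list), and the helper names are hypothetical.

```python
import os
from typing import List, Mapping, NamedTuple, Sequence


class DistEnv(NamedTuple):
    rank: int
    world_size: int
    peers: List[str]
    coordinator: str


def read_dist_env(environ: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the DIST_* variables; the formats are assumed, not documented here."""
    rank = int(environ["DIST_RANK"])
    world_size = int(environ["DIST_WORLD_SIZE"])
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of size {world_size}")
    peers = [p for p in environ.get("DIST_PEERS", "").split(",") if p]
    return DistEnv(rank, world_size, peers, environ.get("DIST_COORDINATOR", ""))


def my_share(items: Sequence, env: DistEnv) -> list:
    """Round-robin split: this rank takes every world_size-th item."""
    return list(items[env.rank::env.world_size])
```

A symmetric worker.py can then branch on env.rank, for example letting rank 0 write the final summary while every rank processes my_share(all_inputs, env).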
Three job patterns: symmetric, parallel, heterogeneous
DistributedJob distinguishes three workload shapes. Pick
the matching method — the wrong one either runs the same code N times by
accident, or hand-builds boilerplate the session class would emit for
free.
1. Symmetric — same command on every shard (job.run)
Right for distributed training and any workload where every rank runs
the same program and branches on DIST_RANK
/ DIST_WORLD_SIZE / DIST_PEERS /
DIST_COORDINATOR (all injected automatically).
job.run(["sh", "-c", "python train.py"])
2. Embarrassingly parallel — same binary, different args (job.run_parallel)
Right for chunk-N-of-M-style workloads: every shard runs the same program but with a different argument list, no inter-shard coordination beyond initial dispatch.
job.run_parallel(
    cmd_base=["python", "process.py"],
    args_per_shard=[
        ["--input", "chunks/0.parquet"],
        ["--input", "chunks/1.parquet"],
        ["--input", "chunks/2.parquet"],
        ["--input", "chunks/3.parquet"],
        ["--input", "chunks/4.parquet"],
    ],
)
3. Heterogeneous — different command per shard (job.run_tasks)
Right for coordinator/worker setups, multi-stage pipelines, parameter
servers, or any “rank 0 is special” workload. Each shard gets its own
ShardTask (cmd, env, tag).
from kvestigate_client import ShardTask
job.run_tasks([
    ShardTask(
        cmd=["python", "coordinator.py", "--world-size", "5"],
        env={"ROLE": "coordinator"},
        rank=0,
    ),
    *[
        ShardTask(
            cmd=["python", "worker.py", "--worker-id", str(i)],
            env={"ROLE": "worker"},
            rank=i,
        )
        for i in range(1, 5)
    ],
])
The rank field is optional. When omitted, list position dictates which shard gets the task. Mismatched lengths and duplicate ranks raise ValueError rather than silently misbehaving.
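The rank rules above (position as the default, errors on length mismatch or duplicates) can be expressed as a short resolver. This is a sketch of the stated behaviour with a hypothetical function name, not the library's code; the out-of-range check is an extra assumption.

```python
from typing import List, Optional, Sequence


def assign_ranks(ranks: Sequence[Optional[int]], shards: int) -> List[int]:
    """Resolve one rank per task; None means 'use my position in the list'."""
    if len(ranks) != shards:
        raise ValueError(f"{len(ranks)} tasks supplied for {shards} shards")
    resolved = [position if rank is None else rank
                for position, rank in enumerate(ranks)]
    if len(set(resolved)) != len(resolved):
        raise ValueError(f"duplicate ranks: {resolved}")
    if any(not 0 <= rank < shards for rank in resolved):
        raise ValueError(f"rank out of range for {shards} shards: {resolved}")  # assumed check
    return resolved
```

Failing early here means a typo in a task list surfaces before any workspace runs a command on the wrong shard.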
Placement strategy — which node gets each shard?
discover_and_plan(..., strategy=...) controls how the
planner orders the eligible nodes after the host filter passes:
| Strategy | What it does | Use when |
|---|---|---|
| "least-loaded" (default) | Uses live metrics-server data for CPU + RAM; sorts ascending by combined live load. Falls back to alloc-side (cpu_requested/mem_requested) if metrics-server is unavailable. | You’re “double-using” a cluster where pods reserve more than they consume, so pick the quietest nodes to avoid disturbing co-tenants. |
| "most-free-capacity" | Sort by free CPU (cpu_allocatable - cpu_requested) descending. | You want the biggest absolute slack regardless of relative load. Conservative; respects the K8s scheduler’s view. |
job.discover_and_plan(
    cpu_per_shard=100,
    mem_per_shard="600Gi",
    gpu_per_shard=6,
    strategy="least-loaded",  # the default
)
The host filter (min_free_cpu_cores / min_free_memory / min_free_gpu) runs before sorting, so nodes that can’t fit a single shard are dropped regardless of strategy. The strategy only decides the order within the viable set.
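The filter-then-sort order can be sketched as below, using CPU only for brevity. cpu_allocatable and cpu_requested are the fields named in the table; live_load is a hypothetical stand-in for the combined metrics-server figure, and the function name is illustrative.

```python
from typing import Dict, List


def plan_nodes(nodes: List[Dict], shards: int, min_free_cpu: float,
               strategy: str = "least-loaded") -> List[str]:
    """Drop nodes that cannot fit a shard, then order the rest by strategy."""
    def free_cpu(node: Dict) -> float:
        return node["cpu_allocatable"] - node["cpu_requested"]

    # Step 1: the host filter runs first, independent of strategy.
    viable = [n for n in nodes if free_cpu(n) >= min_free_cpu]

    # Step 2: the strategy only orders the survivors.
    if strategy == "least-loaded":
        viable.sort(key=lambda n: n["live_load"])          # hypothetical field
    elif strategy == "most-free-capacity":
        viable.sort(key=free_cpu, reverse=True)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    if len(viable) < shards:
        raise ValueError(f"only {len(viable)} viable nodes for {shards} shards")
    return [n["name"] for n in viable[:shards]]
```

Note that a node with the lowest live load is still excluded if it fails the filter, so a quiet but heavily reserved node never receives a shard.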
Per-shard data — job.sync_per_shard
Pairs with the heterogeneous and embarrassingly-parallel patterns:
each shard needs its own input chunk. sync_code pushes the
same directory to every shard; sync_per_shard
pushes different directories per rank.
job.sync_code("./pipeline-code")  # shared code → every shard
job.sync_per_shard({  # per-shard data → only listed ranks
    1: "./inputs/chunk-0.jsonl",
    2: "./inputs/chunk-1.jsonl",
    3: "./inputs/chunk-2.jsonl",
    4: "./inputs/chunk-3.jsonl",
}, subdir="datasets/input")
Ranks not in the map are skipped, which is useful when rank 0 is a coordinator that doesn’t process data chunks itself.
See examples/agents/heterogeneous_pipeline.py for the
full coordinator-plus-workers walkthrough.
Complete workflow kits
For end-to-end recipes — operator config + agent script + worker code
+ test-data helpers, all in one directory — see examples/workflows/. Each
subdirectory is a runnable kit you can copy and modify:
| Kit | Pattern |
|---|---|
| parallel-batch-processing/ | Embarrassingly parallel: N workers each process a different input chunk |
| coordinator-workers-http/ | Heterogeneous coordinator + workers over HTTP |
| readonly-monitoring/ | Long-running polling agent against a read-only cyberspace |
| gpu-single-node/ | One workspace on one node with 6+ GPUs bound (single-shard multi-GPU) |
| cpu-bulk-dataset-gen/ | Distributed synthetic dataset generation using least-loaded placement |
Each kit’s own README.md walks through operator setup →
agent invocation → expected output → how to extend → what’s deliberately
omitted.
Lower-level surfaces (still available)
For ad-hoc invocations or shell scripting, the older commands all work:
# Health check
kvestigate cybernet status http://localhost:8765
# One-shot operation invocation
kvestigate cybernet call workspace.gpus \
--url http://localhost:8765 \
--params '{"nodes": ["node-06"]}'
# From Python with no helper module:
# from kvestigate_client import CybernetClient, DistributedJob
# c = CybernetClient(base_url="...", token=os.environ["..."])
# c.call("workspace.create", {...})
# MCP frontend (for LLM-driven clients)
kvestigate cybernet mcp-server --config research-pool.yaml
# 6. For LLM-driven clients (Claude Desktop, etc.):
kvestigate cybernet mcp-server --config my-cyberspace.yaml
# (point the MCP client at this command's stdio)
Cyberspace spec (the YAML shape)
apiVersion: kvestigate.cybernet/v1
kind: Cyberspace
metadata:
  name: research-agents
  description: "Constrained surface for ML research agents."
listen:
  http: { address: "0.0.0.0:8765" }
  mcp: { stdio: true } # both frontends are optional
auth:
  tokens:
    - identity: research-agent-1
      sha256: "<sha256 of bearer token>" # see `kvestigate cybernet hash-token`
      tags: [gpu, research]
operations:
  # Default-deny. Listed ops reach the policy engine.
  allow:
    - report.run
    - workspace.status
    - workspace.gpus
    - workspace.diff
    - workspace.create
    - workspace.sync
    - workspace.run
    - workspace.ps
    - workspace.kill
    - workspace.destroy
constraints:
  hosts:
    allow_names: ["node-06", "node-07", "node-08"]
    require:
      min_free_gpu: 4
      min_free_cpu_cores: 80
      min_free_memory: "800Gi"
  workspace:
    name_pattern: "^agent-[a-z0-9][a-z0-9-]{0,30}$"
    storage:
      allowed_modes: ["all-tmpfs", "hybrid"]
      max_tmpfs_size: "500Gi"
    gpu:
      max_per_workspace: 4
    cordon_node_forbidden: true
quotas:
  max_active_workspaces_per_identity: 3
  max_total_active_workspaces: 10
  max_total_active_gpu: 32
  max_total_active_memory: "5Ti"
audit:
  path: "/var/log/cybernet/research-agents.jsonl"
  include_denials: true
HTTP API surface
| Route | Auth | Purpose |
|---|---|---|
| GET /info | none | Public metadata — operations, host constraints, quota usage |
| POST /api/v1/op/<name> | bearer | Dispatch an allowed operation with JSON body params |
Error responses use a consistent envelope:
{"error": {"code": "...", "message": "...", "reason": "..."}}
with status codes 401 (unauth), 403 (op not in
allow-list), 422 (host or parameter constraint),
409 (quota), 500 (execution error).
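A client can turn that envelope into a typed exception. The sketch below is illustrative only: `CybernetError` and `parse_error` are hypothetical names, and the sample body is invented to show the shape. The status-code meanings are the ones listed above.

```python
import json

# Status-code meanings as documented above.
STATUS_MEANING = {
    401: "unauthenticated",
    403: "operation not in allow-list",
    409: "quota exceeded",
    422: "host or parameter constraint",
    500: "execution error",
}


class CybernetError(Exception):
    """Hypothetical client-side exception built from the error envelope."""

    def __init__(self, status, code, message, reason):
        super().__init__(f"{status} {code}: {message} ({reason})")
        self.status = status
        self.code = code
        self.message = message
        self.reason = reason
        self.category = STATUS_MEANING.get(status, "unknown")


def parse_error(status, body_text):
    """Turn a non-2xx response body into a CybernetError."""
    err = json.loads(body_text).get("error", {})
    return CybernetError(
        status, err.get("code", ""), err.get("message", ""), err.get("reason", "")
    )


# Invented sample body, used only to exercise the parser.
example = parse_error(
    409,
    '{"error": {"code": "quota", "message": "too many workspaces", '
    '"reason": "per-identity limit"}}',
)
print(example.category, "-", example)
```

Branching on `category` rather than on the message text keeps callers independent of the exact wording the server returns.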
Operation registry (the universe of namable operations)
Read operations: report.run,
workspace.status, workspace.gpus,
workspace.diff, workspace.ps.
Write operations: workspace.create,
workspace.destroy, workspace.sync,
workspace.run, workspace.kill.
Every operation is opt-in via operations.allow; nothing
else is reachable through cybernet even though kvestigate itself can do
more.
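The default-deny rule can be pictured as two set lookups. This is a sketch of the idea, not the policy engine's code: `check_operation` and its reason strings are hypothetical, and only the operation names come from the registry listed above.

```python
READ_OPS = {
    "report.run", "workspace.status", "workspace.gpus",
    "workspace.diff", "workspace.ps",
}
WRITE_OPS = {
    "workspace.create", "workspace.destroy", "workspace.sync",
    "workspace.run", "workspace.kill",
}
REGISTRY = READ_OPS | WRITE_OPS


def check_operation(name, allow):
    """Return (allowed, detail) under default-deny semantics."""
    if name not in REGISTRY:
        return (False, "unknown operation")
    if name not in allow:
        return (False, "operation not in allow-list")
    kind = "read" if name in READ_OPS else "write"
    return (True, kind)


# A read-only cyberspace: write operations stay unreachable.
readonly_allow = {"report.run", "workspace.status", "workspace.gpus"}
print(check_operation("workspace.gpus", readonly_allow))
print(check_operation("workspace.run", readonly_allow))
```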
Audit log
Every request — allowed or denied — appends one JSONL record to
audit.path. Each record carries timestamp, identity, remote
address, operation, parameters (with
token/password/secret keys
masked), decision, reason, outcome, duration, and post-call quota
snapshot.
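As a rough illustration of the masking step, the sketch below walks the parameters recursively and hides any value whose key contains one of the sensitive words. Substring matching, the `***` placeholder, and the record field names are assumptions of this sketch; the real audit writer may differ.

```python
import json
from datetime import datetime, timezone

SENSITIVE_FRAGMENTS = ("token", "password", "secret")


def mask_params(value):
    """Recursively replace values whose key contains a sensitive fragment."""
    if isinstance(value, dict):
        return {
            key: "***"
            if any(frag in str(key).lower() for frag in SENSITIVE_FRAGMENTS)
            else mask_params(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_params(item) for item in value]
    return value


def audit_line(identity, operation, params, decision, reason):
    """Build one JSONL record (a subset of the fields described above)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "identity": identity,
        "operation": operation,
        "params": mask_params(params),
        "decision": decision,
        "reason": reason,
    }
    return json.dumps(record, sort_keys=True)


line = audit_line(
    "research-agent-1",
    "workspace.run",
    {"name": "agent-a", "env": {"HF_TOKEN": "abc123"}},
    "allow",
    "ok",
)
print(line)
```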
Growing cybernet
- New operation: add an Operation(name, kind, description, handler) to cybernet/registry.py:_REGISTRY. The HTTP and MCP frontends pick it up automatically. List it in your cyberspace’s operations.allow to make it reachable.
- New constraint dimension: add a field to the relevant *ConstraintsSpec dataclass in cybernet/spec.py, handle it in cybernet/policy.py:_check_workspace_params. Tests in tests/cybernet/test_policy.py show the pattern.
- New frontend: wrap the same policy.authorize → registry.get(...).handler flow used by server.py and mcp_adapter.py. The engine doesn’t care how the request arrived.
Node capacity management — node / evict / scale
The emergency + capacity surface: cordon a node, clear or drain it
for new work, evict a scoped set of pods, or scale a Deployment (normal
use, or quiesce to zero). The node subgroup is
cordon / uncordon / drain /
clear; evict and scale are
top-level.
The pod-removing commands (drain, clear,
evict) are the highest-blast-radius in kvestigate, so they
all:
- default to dry-run — they print the exact plan (what would be evicted, what is skipped and why) and do nothing until you pass --apply and the matching confirmation flag;
- use the eviction API, so PodDisruptionBudgets are honored (a PDB block is reported, not force-killed) and termination is graceful;
- protect infrastructure by default — DaemonSet, mirror/static, already-terminating, and completed pods are never evicted, and pods in system namespaces (kube-system, kube-public, kube-node-lease) are skipped unless you pass --include-system.
They are operator-only: not exposed as cybernet
operations, and the policy engine hard-denies their names so an agent
can never reach them
(cybernet.policy.DANGEROUS_OPERATIONS).
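The protection rules amount to a per-pod filter. The sketch below works on plain dicts shaped like Kubernetes Pod JSON and is not kvestigate's code: `skip_reason` is a hypothetical name, and treating both `Succeeded` and `Failed` as "completed" is an assumption. The owner-reference kind, the mirror-pod annotation, and `deletionTimestamp` are standard Kubernetes fields.

```python
SYSTEM_NAMESPACES = {"kube-system", "kube-public", "kube-node-lease"}
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"  # set by the kubelet on static pods


def skip_reason(pod, include_system=False):
    """Return why a pod would be skipped, or None if it is evictable."""
    meta = pod.get("metadata", {})
    owners = meta.get("ownerReferences") or []
    if any(owner.get("kind") == "DaemonSet" for owner in owners):
        return "daemonset"
    if MIRROR_ANNOTATION in (meta.get("annotations") or {}):
        return "mirror/static"
    if meta.get("deletionTimestamp"):
        return "already terminating"
    if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
        return "completed"
    if meta.get("namespace") in SYSTEM_NAMESPACES and not include_system:
        return "system namespace"
    return None


pods = [
    {"metadata": {"name": "trainer", "namespace": "research"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "node-exporter", "namespace": "monitoring",
                  "ownerReferences": [{"kind": "DaemonSet"}]},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "coredns", "namespace": "kube-system"},
     "status": {"phase": "Running"}},
]
for pod in pods:
    print(pod["metadata"]["name"], "->", skip_reason(pod) or "evict")
```

Returning a reason string instead of a boolean is what lets a dry-run plan report what is skipped and why.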
node cordon / node uncordon — toggle schedulability
Reversible and non-evicting — cordon just stops
new pods from landing on a node; uncordon reverses
it. (These also exist as workspace cordon/uncordon for the
GPU-protection workflow; the node forms are the general
operator verbs.)
kvestigate node cordon gpu-node-7 --i-understand-this-modifies-cluster
kvestigate node uncordon gpu-node-7 --i-understand-this-modifies-cluster
node drain — clear a node for new workloads
Cordons the node (so nothing reschedules back) and evicts its evictable pods.
# Preview what draining gpu-node-7 would evict / skip (safe; no changes)
kvestigate node drain gpu-node-7
# Actually drain it (cordon + evict). Both flags are required.
kvestigate node drain gpu-node-7 --apply --i-understand-this-disrupts-workloads
# Also evict system-namespace pods (use with care), with a grace period
kvestigate node drain gpu-node-7 --include-system --grace-period 30 \
--apply --i-understand-this-disrupts-workloads
Pairs naturally with the rest of the tool: kvestigate plan to see what fits, node drain to make room, then workspace create on the freed node.
node clear — evict a node’s pods but leave it schedulable
Like drain, but it does not cordon —
use it to reclaim a node’s capacity while keeping it open for new
scheduling. Same protections and flags as drain.
# Preview
kvestigate node clear gpu-node-7
# Evict the node's pods, node stays schedulable
kvestigate node clear gpu-node-7 --apply --i-understand-this-disrupts-workloads
(node clear gpu-node-7 is the node-scoped shorthand for evict --node gpu-node-7.)
evict — reclaim capacity by scope (no cordon)
Evict a set of pods chosen by node, namespace, and/or label selector.
Unlike drain, it does not cordon — use it
to reclaim capacity while leaving the node schedulable.
# Preview: every (non-system, non-infra) pod on a node
kvestigate evict --node gpu-node-7
# All pods in a namespace
kvestigate evict --namespace batch-jobs --apply --i-understand-this-disrupts-workloads
# By label selector, optionally scoped to one namespace
kvestigate evict -l app=stale-trainer -n research \
--apply --i-understand-this-disrupts-workloads
# Selector across the whole cluster
kvestigate evict --selector 'tier=preemptible' --apply --i-understand-this-disrupts-workloads
At least one of --node / --namespace / --selector is required.
scale — resize a Deployment (normal use or quiesce)
# Preview a scale change (shows current → target)
kvestigate scale my-app 5 -n my-ns
# Apply it
kvestigate scale my-app 5 -n my-ns --apply --i-understand-this-modifies-cluster
# Quiesce to zero (stop the workload without deleting the Deployment)
kvestigate scale my-app 0 -n my-ns --apply --i-understand-this-modifies-cluster
--namespace/-n defaults to the current kubeconfig context’s namespace.
Cluster optimization — optimize
A diagnose / plan / apply surface for consolidating thin GPU
workloads onto denser nodes so the cluster keeps fully-used
nodes alongside fully-empty ones for isolated runs and experiments.
Strictly operator-only — never exposed as a cybernet operation,
hard-denied in policy.DANGEROUS_OPERATIONS.
The pipeline:
- audit --json — capture a cluster snapshot.
- optimize analyze --audit <path> — quantify fragmentation.
- optimize plan --audit <path> (with tunables) — get a concrete move list, donor nodes to drain, target nodes to densify.
- (Optional) optimize probe-gpu --audit <path> — fan-out nvidia-smi read-only and flag ghost allocations (stated GPUs without live work).
- optimize apply --audit <path> --apply --i-understand-this-disrupts-workloads — execute the plan via the existing nodeops cordon + wave-evict primitives (PodDisruptionBudgets respected, infra/system pods protected).
How the planner works
Best-fit-decreasing bin-pack with disjoint donor / target sets so pods never bounce through a node that’s being emptied:
- A donor is any node whose stated GPU utilization is strictly below --donor-max-util (default 0.3). Donors get drained.
- A target is any other partial node with free GPUs. Targets get denser.
- Pods on donors are sorted largest-first and placed onto the target with the smallest sufficient remaining capacity.
- A pod that won’t fit anywhere is reported as unplaceable — never moved.
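The four rules above fit in a few lines of Python. This is a simplified sketch, not the planner's source: the `Node` dataclass and `plan_moves` are hypothetical, pods are reduced to a GPU count, and tunables other than the donor threshold are ignored. The sample cluster assumes 8 GPUs per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total_gpu: int
    pods: dict = field(default_factory=dict)  # pod name -> GPUs requested

    @property
    def used(self):
        return sum(self.pods.values())

    @property
    def free(self):
        return self.total_gpu - self.used

    @property
    def util(self):
        return self.used / self.total_gpu


def plan_moves(nodes, donor_max_util=0.3):
    """Best-fit-decreasing placement with disjoint donor and target sets."""
    donors = [n for n in nodes if n.util < donor_max_util]
    donor_names = {n.name for n in donors}
    # Targets: any other partially used node that still has free GPUs.
    remaining = {
        n.name: n.free
        for n in nodes
        if n.name not in donor_names and n.used > 0 and n.free > 0
    }
    # Largest pods first; pod name breaks ties so the plan is deterministic.
    pods = sorted(
        ((gpus, pod, n.name) for n in donors for pod, gpus in n.pods.items()),
        key=lambda item: (-item[0], item[1]),
    )
    moves, unplaceable = [], []
    for gpus, pod, source in pods:
        fits = [(free, name) for name, free in remaining.items() if free >= gpus]
        if not fits:
            unplaceable.append(pod)  # reported, never moved
            continue
        _, target = min(fits)  # smallest sufficient remaining capacity
        remaining[target] -= gpus
        moves.append((pod, source, target))
    return moves, unplaceable


cluster = [
    Node("node-01", 8, {"pod-a": 1}),
    Node("node-02", 8, {"pod-b": 1}),
    Node("node-03", 8, {"pod-c": 8}),
    Node("node-08", 8, {"pod-d": 4}),
]
moves, unplaceable = plan_moves(cluster)
print(moves, unplaceable)
```

Because donors are excluded from the target set before any placement happens, a pod can never be moved onto a node that is itself being emptied.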
Tunables
| Flag | Default | What it does |
|---|---|---|
| --donor-max-util | 0.3 | A node below this stated util is a donor candidate. Raise it to be more aggressive about emptying nodes. |
| --min-free-nodes | 1 | Target number of empty nodes to keep available for isolated runs. Reported, not enforced. |
| --exclude-namespace | (none) | Repeatable. Pods in these namespaces are never moved. |
| --exclude-node | (none) | Repeatable. These nodes are never donors and never targets. |
| --pin | (none) | Repeatable namespace/pod-pattern rules (* glob). Matching pods are never moved. |
| --wave-size | 4 | Pods evicted per wave during apply. |
| --grace-period | (none) | Seconds passed to the eviction subresource. |
| --ghost-threshold | 0.05 | (probe-gpu) Mean live GPU util below this flags the node’s allocation as a ghost. |
| --samples | 1 | (probe-gpu) Number of nvidia-smi rounds per GPU. Use 3–5 for confident ghost detection. |
| --interval | 1.0 | (probe-gpu) Seconds between sampling rounds when --samples > 1. |
Data source: --audit or --live
Every optimize subcommand accepts
either --audit <path> (a captured
kvestigate audit --json <path> file — reproducible,
shareable) or --live (query the cluster
right now via the configured kubeconfig). Both go through the same
reports.gpu collector, so a saved audit and a live read
produce identical typed views — only the time of capture differs.
# Against a saved snapshot (reproducible):
kvestigate optimize analyze --audit cluster.audit.json
# Against the cluster right now (same output shape):
kvestigate optimize analyze --live
Worked example (the snapshot in this repo)
cluster.audit.json is a real audit: 8 GPU nodes, 5 fully
packed (100%), 1 half-full (50%), 2 nearly empty (12%). 18 free GPUs are
stranded across 3 partial nodes — no single node has a contiguous block
bigger than 7.
# Diagnose:
kvestigate optimize analyze --audit cluster.audit.json # or --live
# Plan with a tunable:
kvestigate optimize plan --audit cluster.audit.json --min-free-nodes 2
# → 2 moves; drains node-01 and node-02; densifies node-08
# (50% → 75%); empty nodes 0 → 2.
# Cross-check stated vs actual GPU usage (read-only):
kvestigate optimize probe-gpu --live --ghost-threshold 0.05
# Execute (gated):
kvestigate optimize apply --live --min-free-nodes 2 \
--apply --i-understand-this-disrupts-workloads
Operator transparency in the output
analyze, plan, and probe-gpu all render with no hidden math:
- analyze labels each metric with its definition (stranded = free GPUs on partial nodes, largest single-node free block, etc.) and prints the consolidation-opportunity rule explicitly (total_free > largest_block AND partial > 1).
- plan prints every tunable in effect, the donor/target candidacy rule, the moves, the unplaceable list with reasons, and a projected per-node allocation table with the explicit deltas per node.
- probe-gpu is fully detailed: it prints the threshold, the verdict rule (mean_util_pct = sum(sample.util)/len(samples); ghost if mean / 100 < threshold), and for every probed node the per-GPU samples (index / util% / memory MiB), the pods accounted on the node, the computed mean, and the verdict. The summary at the bottom lists ghost / active / no-allocation / no-samples counts.
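The analyze metrics are simple enough to recompute by hand. The sketch below applies the quoted definitions to a cluster shaped like the worked example; `fragmentation` is a hypothetical function, and the 8-GPUs-per-node figure is an assumption chosen because it reproduces the quoted percentages (1/8 ≈ 12%, 4/8 = 50%).

```python
def fragmentation(nodes):
    """nodes: mapping of node name -> (used_gpus, total_gpus)."""
    free = {name: total - used for name, (used, total) in nodes.items()}
    partial = [name for name, (used, total) in nodes.items() if 0 < used < total]
    total_free = sum(free.values())
    largest_block = max(free.values(), default=0)
    return {
        "total_free": total_free,
        "largest_block": largest_block,
        "stranded": sum(free[name] for name in partial),  # free GPUs on partial nodes
        "partial_nodes": len(partial),
        "empty_nodes": sum(1 for used, _ in nodes.values() if used == 0),
        # Consolidation-opportunity rule as printed by analyze.
        "opportunity": total_free > largest_block and len(partial) > 1,
    }


# Five fully packed nodes, two nearly empty, one half-full (8 GPUs each, assumed).
cluster = {f"node-{i:02d}": (8, 8) for i in range(3, 8)}
cluster.update({"node-01": (1, 8), "node-02": (1, 8), "node-08": (4, 8)})
report = fragmentation(cluster)
print(report)
```

Under those assumptions the sketch reports 18 stranded GPUs, a largest free block of 7, and no empty nodes, matching the figures quoted for the snapshot.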
Ghost allocations (windowed nvidia-smi probe)
The k8s scheduler accounts pods at their request level: a
pod that asked for 8 GPUs but only really uses 1 is still “8 allocated”.
optimize probe-gpu fans out nvidia-smi
(read-only) and lets you see, per GPU, both compute utilization
and memory used — so you can distinguish:
- truly empty (0% util, 0 MiB) — safe consolidation target;
- model warm but idle at probe instant (0% util, ~tens of GB memory loaded) — the model is hot, the workload just isn’t currently serving;
- actively running (non-zero util).
A single sample can mislead you if a workload bursts between requests. The probe supports a sampling window so the verdict is based on multiple nvidia-smi reads spaced over time:
# Windowed: 5 samples per GPU spaced 2s apart (~10s total wall-clock).
kvestigate optimize probe-gpu --live --samples 5 --interval 2
What the windowed output gives you on each GPU:
- per-GPU mean / max / min util%, and mean memory used across the window;
- a bursty tag when a GPU went idle and active during the window (max - mean > 20pp) — that’s exactly the workload you’d misjudge with a single sample;
- node-level verdict still based on mean across all readings — but you can see node_max alongside it, so a node with a brief 100% burst is visibly not a ghost even if the mean is low.
A “confident ghost” in this scheme is one where
every sample on every GPU is below
threshold across the window — the summary line explicitly says
confident (N=… over ~Xs) so the operator can see the signal
strength behind the call.
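The verdict rules can be restated as a short sketch. The function names and return shapes below are hypothetical. The formulas are the ones quoted above: the mean over all readings decides ghost versus active, `max - mean > 20` percentage points marks a GPU as bursty, and a ghost is confident only if every single reading is below the threshold.

```python
def gpu_summary(samples, bursty_gap_pp=20.0):
    """Summarize util% readings for one GPU across the sampling window."""
    avg = sum(samples) / len(samples)
    return {
        "mean": avg,
        "max": max(samples),
        "min": min(samples),
        "bursty": max(samples) - avg > bursty_gap_pp,
    }


def node_verdict(per_gpu_samples, threshold=0.05):
    """per_gpu_samples: {gpu_index: [util%, ...]} for a node with pods accounted."""
    readings = [u for samples in per_gpu_samples.values() for u in samples]
    if not readings:
        return {"verdict": "no-samples"}
    mean_util_pct = sum(readings) / len(readings)
    ghost = mean_util_pct / 100 < threshold
    return {
        "verdict": "ghost" if ghost else "active",
        "mean_util_pct": mean_util_pct,
        "node_max": max(readings),
        # Confident only when every reading on every GPU is below threshold.
        "confident": ghost and all(u / 100 < threshold for u in readings),
    }


idle = node_verdict({0: [0, 0, 0, 0, 0], 1: [0, 1, 0, 0, 0]})
burst = node_verdict({0: [0, 0, 30, 0, 0], 1: [0, 0, 0, 0, 0]})
print(idle)
print(burst)
print(gpu_summary([0, 0, 30, 0, 0]))
```

The second node shows why `node_max` is surfaced: its mean of 3% falls under the default threshold, but the single 30% reading keeps the call from being confident.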
Apps — kvestigate app
A single declarative YAML kind (kvestigate App) that
compiles deterministically to either a host-native
workspace or a small set of Kubernetes manifests,
chosen by an explicit target: field. No templating engine,
no in-cluster controller — the spec is a target format the user
generates with whatever upstream tool, then
kvestigate app up compiles + applies through the existing
cybernet audit/quota/policy gate.
Full design in docs/APP_SPEC.md; summary
here.
Commands
kvestigate app render <spec.yaml> # print compiled artifacts (yaml | json)
kvestigate app plan <spec.yaml> # dry-run plan (no side effects)
kvestigate app up <spec.yaml> --apply --i-understand-this-modifies-cluster
kvestigate app down <spec.yaml> --apply --i-understand-this-modifies-cluster
kvestigate app diff <spec.yaml> # spec_hash drift + backend diff
kvestigate app status <spec.yaml> # last-recorded apply state for this App
kvestigate app list # every App with a persisted status
All commands default to dry-run; --apply plus the safety flag is required to mutate (same pattern as workspace create / optimize apply).
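The two-flag gate is easy to reproduce in your own wrappers. The sketch uses `argparse` from the standard library and is not kvestigate's CLI code; falling back to dry-run when only one of the two flags is given is an assumption (the real command may report an error instead).

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="gate-sketch")
    parser.add_argument("spec")
    parser.add_argument("--apply", action="store_true")
    parser.add_argument(
        "--i-understand-this-modifies-cluster",
        dest="confirmed",
        action="store_true",
    )
    return parser


def mode(argv):
    """Return 'apply' only when both flags are present, else 'dry-run'."""
    args = build_parser().parse_args(argv)
    return "apply" if args.apply and args.confirmed else "dry-run"


print(mode(["spec.yaml"]))
print(mode(["spec.yaml", "--apply"]))
print(mode(["spec.yaml", "--apply", "--i-understand-this-modifies-cluster"]))
```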
A minimal App spec
app:
  metadata: { name: trainer-01, namespace: research }
  target: auto # workspace | k8s | auto
  command: [python, -m, train]
  env: { OMP_NUM_THREADS: "16" }
  lifecycle: interactive # service | batch | interactive
  resources: { gpu: 2, memory: 64Gi }
  workspace: # used when target resolves to workspace
    node: gpu-node-7
    storage: { mode: hybrid, tmpfs_size: 500G }
    sources: [{ local: ./code, subdir: code }]
That spec compiles to a WorkspaceSpec + LaunchPlan (host-native path) because the target: auto policy sees an explicit workspace.node opt-in and at least one supporting signal — any one of {GPU ≥ 1, lifecycle: batch | interactive, tmpfs ≥ 64 GiB} is enough.
target: — host or pod, and how auto decides
| Field value | Behavior |
|---|---|
| workspace | Always compiles to a host-native workspace (via lifecycle.create + workspace run). |
| k8s | Always compiles to a Deployment (+ Service when k8s.ports is set). |
| auto | Runs the policy below. Falls back to k8s when nothing decides. |
The auto policy (see app/target.py):
- Refuse on hard caveats (e.g. storage.mode=all-tmpfs + restart: always is a contradiction — node reboot wipes the workspace).
- Workspace requires explicit opt-in: workspace.node must be set. Without that, the workspace block defaults aren’t operator intent and should not promote to workspace.
- With workspace.node set, any one of {big tmpfs ≥ 64 GiB, GPU ≥ 1, lifecycle: batch | interactive} flips to workspace.
- Otherwise: k8s — the safe default until Roadmap A1 (cgroup-capped runs) lands; pods get cgroups via the kubelet for free.
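Read top to bottom, the policy is a short decision function. The sketch below mirrors the four rules but is not the code in app/target.py: `SpecView`, its field names and defaults, and `pick_target_sketch` are hypothetical stand-ins for whatever the real `pick_target` reads from an `AppSpec`.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class SpecView:
    """Hypothetical flattened view of the fields the policy looks at."""
    workspace_node: Optional[str] = None
    gpu: int = 0
    lifecycle: str = "service"
    tmpfs_bytes: int = 0
    storage_mode: str = "hybrid"
    restart: str = "never"


def pick_target_sketch(spec):
    # 1. Hard caveats: refuse contradictory specs outright.
    if spec.storage_mode == "all-tmpfs" and spec.restart == "always":
        raise ValueError("all-tmpfs storage cannot satisfy restart: always")
    # 2. Without an explicit node, workspace defaults are not operator intent.
    if not spec.workspace_node:
        return "k8s"
    # 3. With the opt-in, any single supporting signal is enough.
    signals = (
        spec.tmpfs_bytes >= 64 * GIB,
        spec.gpu >= 1,
        spec.lifecycle in ("batch", "interactive"),
    )
    if any(signals):
        return "workspace"
    # 4. Nothing decided: fall back to k8s.
    return "k8s"


trainer = SpecView(
    workspace_node="gpu-node-7", gpu=2, lifecycle="interactive",
    tmpfs_bytes=500 * 10 ** 9,
)
print(pick_target_sketch(trainer))
print(pick_target_sketch(SpecView(gpu=2)))
```

The first call corresponds to the trainer-01 spec above and resolves to the workspace path. The second has a GPU signal but no workspace.node, so it stays on k8s.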
Using AppSpec from Python
Everything load / compile / status / diff related is importable from
kvestigate.app. The shape mirrors the CLI exactly:
from pathlib import Path
from kvestigate.app import (
# Specs + enums (closed-domain types are all StrEnums)
AppSpec, Target, Lifecycle, RestartPolicy, WorkloadKind, ExposeMode, AppPhase,
# Load / dump (YAML <-> dataclass)
load_app, load_app_friendly, dump_app, AppLoadError, resolve_shared_config_files,
# Compile (pure-functional; no I/O)
compile_to_workspace, compile_to_k8s, compile_to_distributed, spec_hash,
# Target policy
pick_target, ResolvedTarget,
# Status persistence + drift
write_status, read_status, status_path_for, APP_STATUS_DIR, RecordedStatus,
diff_app, AppDiff, DriftStatus,
# Presentation
format_manifest_summary, format_apply_results,
)
# Run from the repo root, or pass an absolute path here:
spec = load_app(Path("examples/apps/web-service.yaml"))
manifests = compile_to_k8s(spec) # list[dict] ready for kubectl apply
drift = diff_app(spec, status_path_for(spec.metadata.namespace, spec.metadata.name))
Submodule layout is an implementation detail for production callers, with one exception: tests that need to redirect status persistence (monkeypatch.setattr("kvestigate.app.status.APP_STATUS_DIR", tmp_path)) must reference the submodule directly. If you need a name that isn’t in __all__, file an issue.
Internals — how to grow each surface
Adding a new report
1. Create src/kvestigate/reports/<name>.py with collect(snap) returning a dataclass + render(report, console) printing it.
2. If you need a new resource list, add a cached_property on Snapshot in reports/snapshot.py.
3. Register the report in reports/__init__.py:
   REPORTS = { ..., "<name>": (mymodule.collect, mymodule.render), ... }

It now shows up in kvestigate report list, is runnable via kvestigate report run <name>, and is part of kvestigate audit.
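A report module following that shape might look like the sketch below. Everything beyond the collect/render pair is assumed: the report itself is invented, `snap.pods` is a guess at the Snapshot attribute, pods are treated as dicts shaped like Kubernetes Pod JSON, and `console` is only assumed to have a rich-style `print()` method. The two stand-in classes exist so the example runs without a cluster.

```python
from dataclasses import dataclass


@dataclass
class PendingPodsReport:
    """Hypothetical report: count of Pending pods per namespace."""
    by_namespace: dict


def collect(snap):
    # Assumes snap.pods is an iterable of Pod-shaped dicts.
    counts = {}
    for pod in snap.pods:
        if pod.get("status", {}).get("phase") == "Pending":
            ns = pod.get("metadata", {}).get("namespace", "default")
            counts[ns] = counts.get(ns, 0) + 1
    return PendingPodsReport(by_namespace=counts)


def render(report, console):
    # Assumes console.print(text) exists, as on a rich Console.
    if not report.by_namespace:
        console.print("No pending pods.")
        return
    for ns, count in sorted(report.by_namespace.items()):
        console.print(f"{ns}: {count} pending")


class FakeSnapshot:
    """Stand-in for the real Snapshot, for local experimentation only."""
    pods = [
        {"metadata": {"namespace": "research"}, "status": {"phase": "Pending"}},
        {"metadata": {"namespace": "research"}, "status": {"phase": "Running"}},
        {"metadata": {"namespace": "batch"}, "status": {"phase": "Pending"}},
    ]


class ListConsole:
    """Stand-in console that records lines instead of printing them."""
    def __init__(self):
        self.lines = []

    def print(self, text):
        self.lines.append(text)


console = ListConsole()
render(collect(FakeSnapshot()), console)
print(console.lines)
```

Keeping `collect` free of printing and `render` free of cluster access is what allows one registry entry to serve both the interactive report and the audit.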
Adding a new planning strategy
- Add a function in src/kvestigate/planning/solver.py that orders candidate nodes the way you want; reuse _consume / _node_score_after / _node_headroom as helpers.
- Register it in STRATEGIES (or branch in _solve_general).
- Pass it via --strategy your-name.
Adding a new planning constraint
- Add a field to PlacementRequest in planning/request.py.
- Apply the filter in planning/capacity.eligible_nodes.
- Surface it as a CLI flag in cli.py:plan.
Adding a new diagnostic
- Add a function in src/kvestigate/debug.py returning a structured dataclass + a render helper.
- Wire it as @debug_app.command("name") in cli.py.
conclusion
happy kvestigating