HyperDjango Telemetry¶

The hyperdjango.telemetry package gives you native Prometheus metrics + OpenTelemetry-style distributed tracing in one module, backed by a lock-free Zig ring buffer and a runtime-dynamic metric registry. Zero cost when disabled; ≤3% overhead at production sampling rates; no protobuf anywhere.

Why a unified telemetry layer?
Quick start (5 lines)
Architecture
Enabling + configuring
Metrics — Counter, Gauge, Histogram
Tracing — Tracer, Span, context
Sampling policies
Sinks — Prometheus / Stdout / InMemory / custom
Middleware + background drain
W3C trace-context propagation
Tests — assertions API
Performance
Example: bookstore_api
FAQ

Why a unified telemetry layer?¶

Most Python frameworks split observability into two packages: one for Prometheus (counters + histograms) and one for OpenTelemetry (spans + traces). Both end up doing the same work — allocating a Python object per event, taking a lock to update state, then serializing. HyperDjango pushes both into a single Zig-backed pipeline:

Counter / Gauge / Histogram increments go through atomic RMW instructions on pre-registered native handles. ~117 ns per inc under ReleaseFast.
Spans claim a slot in a 16384-entry MPSC ring buffer via a single CAS on the slot state. Slot layout is compile-asserted to 384 bytes (64B header + 64B name + 128B attrs + 128B events). ~107 ns per span_start + end cycle.
Drain to Prometheus text + OpenTelemetry JSON dicts happens in a dedicated background thread at a configurable interval (default 1 s) — never on the request path.

One import, one enable(), one middleware, one sink list.

Quick start (5 lines)¶

from hyperdjango import HyperApp
from hyperdjango.telemetry import configure_from_settings

app = HyperApp()
telemetry = configure_from_settings(app)
if telemetry is not None and telemetry.prometheus_sink is not None:
    app.get("/metrics")(telemetry.prometheus_sink.handler)

Then set HYPER_TELEMETRY_ENABLED=1 in your environment and run curl http://localhost:8000/metrics. You'll see per-request HTTP counters, DB query histograms, and anything else the platform instruments automatically.

Architecture¶

┌───────────────────────────────┐
│ Application code              │
│   tracer.start_span("work")   │
│   counter.inc()               │
│   histogram.observe(0.037)    │
└──────────┬────────────────────┘
           │
┌──────────▼─────────────────┐   ┌─────────────────────────┐
│ hyperdjango.telemetry      │   │ TelemetryMiddleware     │
│   Counter / Histogram ─────┼──►│   per-request span      │
│   Tracer / Span            │   │   HTTP metric emit      │
└──────────┬─────────────────┘   │   W3C propagation       │
           │                     └──────────┬──────────────┘
           │ FFI                            │
┌──────────▼─────────────────┐               │ every 1s
│ zig/src/metrics_py.zig     │               ▼
│   atomic counter registry  │   ┌─────────────────────────┐
│   span_ring.zig (16384)    │◄──┤ _DrainWorker (daemon)   │
│   UTF-8 safe truncation    │   │   _span_drain() →       │
└──────────┬─────────────────┘   │   list[dict]            │
           │                     └──────────┬──────────────┘
           │ Prometheus text     │ span batch + metrics text
           └──────────┬──────────┴────────┐
                      ▼                   ▼
           ┌───────────────┐   ┌──────────────────┐
           │ PrometheusSink│   │ StdoutSink /     │
           │   /metrics    │   │ InMemorySink /   │
           │   handler     │   │ user adapter     │
           └───────────────┘   └──────────────────┘

Enabling + configuring¶

Every telemetry knob is a first-class HyperDjango setting in hyperdjango.conf, NOT a hardcoded env-var-only flag. They flow through the same 4-tier resolution that every other framework setting uses:

Django settings.py  →  HYPER_*  env var  →  .env file  →  DEFAULTS
   (highest)                                                (lowest)

So you can set them via whichever layer fits your deployment:

Source	When to use
`DEFAULTS` dict	Framework default — shipped value, never edit by hand
`.env` file	Local dev / CI — checked in or per-environment overlay
`HYPER_*` env var	Container deployments (k8s, ECS, Heroku, systemd unit)
Django `settings.py`	Apps using the Django integration — `TELEMETRY_ENABLED = True`
`patch.dict(DEFAULTS)`	Tests — `unittest.mock.patch.dict` for fixture setup

The full setting list (also visible in hyperdjango.conf.SETTING_DEFINITIONS):

Setting	Default	Type	Range / choices
`TELEMETRY_ENABLED`	`False`	bool	master switch
`TELEMETRY_SERVICE_NAME`	`hyperdjango`	str	tracer name
`TELEMETRY_SAMPLE_RATIO`	`0.01`	float	0.0 - 1.0
`TELEMETRY_DRAIN_INTERVAL`	`1.0`	float	0.01 - 300.0 seconds
`TELEMETRY_EXTRACT_TRACEPARENT`	`True`	bool	honor inbound W3C header
`TELEMETRY_SINKS`	`["prometheus"]`	list[str]	prometheus / stdout / memory
`TELEMETRY_SPAN_RING_CAPACITY`	`16384`	int	power of 2, 256 - 16777216
`TELEMETRY_AUTO_LOG_CORRELATION`	`True`	bool	auto-inject trace_id in logs

configure_from_settings() reads ALL of these via hyperdjango.conf.get_setting(), which means whichever layer you choose to set them in, the rest of the framework sees the same value. There is no separate "telemetry env var subsystem".
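The precedence order can be sketched in a few lines, assuming each layer is modeled as a plain mapping. The `resolve_setting` helper and its arguments are hypothetical and exist only to illustrate the ordering; the real lookup is `hyperdjango.conf.get_setting()`, whose internals are not shown on this page. Type coercion (for example `"1"` to `True`) is deliberately left out.

```python
from collections.abc import Mapping
from typing import Any

ENV_PREFIX = "HYPER_"  # env-var alias prefix used throughout this page


def resolve_setting(
    name: str,
    django_settings: Mapping[str, Any],
    environ: Mapping[str, str],
    dotenv: Mapping[str, str],
    defaults: Mapping[str, Any],
) -> tuple[Any, str]:
    """Return (value, layer) from the highest-priority layer that defines `name`."""
    if name in django_settings:
        return django_settings[name], "settings.py"
    env_key = ENV_PREFIX + name
    if env_key in environ:
        return environ[env_key], "env"
    if env_key in dotenv:
        return dotenv[env_key], ".env"
    return defaults[name], "DEFAULTS"


# Hypothetical layer contents for illustration only.
defaults = {"TELEMETRY_ENABLED": False, "TELEMETRY_SAMPLE_RATIO": 0.01}
environ = {"HYPER_TELEMETRY_SAMPLE_RATIO": "0.05"}
dotenv = {"HYPER_TELEMETRY_SAMPLE_RATIO": "0.5", "HYPER_TELEMETRY_ENABLED": "1"}

ratio = resolve_setting("TELEMETRY_SAMPLE_RATIO", {}, environ, dotenv, defaults)
enabled = resolve_setting("TELEMETRY_ENABLED", {}, environ, dotenv, defaults)
print(ratio, enabled)
```

Here the process environment wins over the `.env` file for the sample ratio, while the enabled flag falls through to `.env` because no higher layer defines it.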

1. Bootstrap from settings (recommended)¶

from hyperdjango import HyperApp
from hyperdjango.telemetry import configure_from_settings

app = HyperApp()
telemetry = configure_from_settings(app)
if telemetry is not None and telemetry.prometheus_sink is not None:
    app.get("/metrics")(telemetry.prometheus_sink.handler)

Then set the values from any layer you want:

# Option A: Django integration
# settings.py
TELEMETRY_ENABLED = True
TELEMETRY_SAMPLE_RATIO = 0.05
TELEMETRY_SPAN_RING_CAPACITY = 65536

# Option B: env vars / .env file
HYPER_TELEMETRY_ENABLED=1
HYPER_TELEMETRY_SAMPLE_RATIO=0.05
HYPER_TELEMETRY_SPAN_RING_CAPACITY=65536

# Option C: tests
from unittest.mock import patch
from hyperdjango.conf import DEFAULTS
with patch.dict(DEFAULTS, {"TELEMETRY_ENABLED": True, ...}):
    bootstrap = configure_from_settings(app)

configure_from_settings auto-wires app.use(middleware) and app.on_shutdown(middleware.shutdown), and returns a TelemetryBootstrap dataclass holding the sinks so you can mount their handlers.

2. Programmatic (full control, no settings)¶

from hyperdjango.telemetry import (
    Tracer, TelemetryMiddleware, PrometheusSink, StdoutSink,
    enable, ParentBased, RatioSample,
)

enable()
tracer = Tracer("myapp", sampler=ParentBased(root=RatioSample(0.05)))
prom = PrometheusSink()
middleware = TelemetryMiddleware(
    tracer=tracer,
    sinks=[prom, StdoutSink()],
    drain_interval_seconds=0.5,
)
app.use(middleware)
app.on_shutdown(middleware.shutdown)
app.get("/metrics")(prom.handler)

3. Tests (enable + InMemorySink)¶

from hyperdjango.telemetry import (
    InMemorySink, TelemetryMiddleware, Tracer, AlwaysSample, enable,
)

enable()
sink = InMemorySink()
tracer = Tracer("t", sampler=AlwaysSample())
mw = TelemetryMiddleware(tracer=tracer, sinks=[sink])
app.use(mw)
# ...drive test requests...
mw.drain_now()
assert len(sink.spans) >= 1

Metrics — Counter, Gauge, Histogram¶

Five classes cover the full Prometheus type surface:

from hyperdjango.telemetry import Counter, CounterVec, Gauge, Histogram, HistogramVec

# Non-labeled counter — one atomic u64 per process
requests = Counter("myapp_requests_total", "Total requests")
requests.inc()          # +1
requests.inc(5)         # +5

# Labeled counter (CounterVec) — one series per label combo
http = CounterVec(
    "myapp_http_total",
    "HTTP requests by method/status",
    label_names=("method", "status"),
)
http.inc({"method": "GET", "status": "200"})
http.inc_tuple(("POST", "500"))  # fast-path when labels are already ordered

# Gauge — goes up and down
in_flight = Gauge("myapp_in_flight", "In-flight requests")
in_flight.inc()
in_flight.dec()
in_flight.set(42)

# Histogram — default buckets tuned for request durations (ms-scale)
duration = Histogram("myapp_request_duration_seconds", "Request duration")
duration.observe(0.037)

Zero cost when disabled: every method starts with an _enabled bool check that short-circuits to a return. That is one LOAD_GLOBAL plus a branch, roughly 20-30 ns when telemetry is off.

Thread-safe: increments use atomic RMW on the native side, so you can call them from any thread (including under free-threading) without locks.

Label cardinality: keep the product of possible label values below ~1000. Each unique combination creates a new time series in memory. Prometheus best-practice applies: never use unbounded values (user IDs, request IDs, URLs with query strings) as labels.
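To check a labeled metric against that guideline before shipping it, the worst-case series count is simply the product of the per-label value counts. The label sets and the `series_count` helper below are hypothetical; this is a back-of-the-envelope sketch, not part of the telemetry API.

```python
import math

# Hypothetical label sets for one CounterVec; every value list is bounded.
label_values = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["200", "201", "400", "404", "500"],
}

CARDINALITY_BUDGET = 1000  # the ~1000 guideline from the paragraph above


def series_count(labels: dict[str, list[str]]) -> int:
    """Worst-case number of time series: product of per-label value counts."""
    return math.prod(len(values) for values in labels.values())


count = series_count(label_values)
within_budget = count <= CARDINALITY_BUDGET
print(count, within_budget)
```

Adding a third label with 100 possible values would push the same metric to 2000 series, which is why unbounded values are ruled out.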

Tracing — Tracer, Span, context¶

from hyperdjango.telemetry import Tracer, STATUS_OK, STATUS_ERROR

tracer = Tracer("myapp")

# Sync context manager
with tracer.start_span("compute_recommendations") as span:
    span.set_attr("user_id", user.id)
    span.set_attr("batch_size", 100)
    result = heavy_work()
    # span.end() is called automatically on exit

# Async context manager
async with tracer.start_span_async("fetch_remote") as span:
    span.set_attr("url", url)
    response = await client.get(url)
    span.set_attr("http.status_code", response.status)

# Decorator (detects async vs sync automatically)
@tracer.trace("list_books")
async def list_books(request):
    ...

@tracer.trace()  # defaults to fn.__qualname__
def compute_totals():
    ...

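Every span started inside another span inherits the parent's trace_id via contextvars.ContextVar. No manual plumbing is required: nested spans in async handlers and asyncio.gather see the correct parent automatically, as do threads that run in a copy of the caller's context (asyncio.to_thread copies it; a plain threading.Thread inherits it only where thread context inheritance is enabled, which is the default on free-threaded 3.14 builds).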

Status codes¶

STATUS_UNSET = 0    # default
STATUS_OK    = 1
STATUS_ERROR = 2

An exception raised inside start_span() auto-sets status=ERROR with error.type + error.message attributes before re-raising.

Span attribute model — what fits, what doesn't¶

Each span slot in the native ring has a 128-byte packed buffer for ALL attributes. Each attribute is stored as [key_len][val_len][key_bytes][val_bytes], so a typical http.method=GET costs 16 bytes (11 key bytes + 3 value bytes + 2 length bytes, assuming one-byte length prefixes), which leaves room for about 8 attributes of that size. This is sized for the OpenTelemetry semantic conventions (short key/value pairs identifying the request, user, tenant, etc.), not for arbitrary blobs.

Attribute kind	Fits in 128B?	Where it belongs
`http.method`, `http.status_code`	yes	span.set_attr
`user.id`, `tenant`, `region`	yes	span.set_attr
`db.statement` (short table query)	usually	span.set_attr
`http.user_agent` (browser UA string)	usually	span.set_attr
Stack trace, full SQL query, JSON body	no	logger w/ trace correlation
Long error message + context	no	logger w/ trace correlation
Binary blob, file upload metadata	no	logger w/ trace correlation

Overflow behavior: when set_attr() would exceed the 128-byte budget, it silently drops the attribute. This is by design — span recording must never throw or block the request path. The dropped attributes are visible at drain time via the attrs_used field on the slot (currently exposed only to internal tooling).
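A toy model makes the budget and the drop-on-overflow rule concrete. It assumes one-byte length prefixes, which the layout above does not state explicitly, and the `AttrBuffer` class is hypothetical; the real buffer lives in the native ring.

```python
ATTR_BUDGET = 128  # bytes per span slot for all attributes (from the text above)


class AttrBuffer:
    """Toy packed attribute buffer; assumes 1-byte length prefixes."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.dropped = 0

    def set_attr(self, key: str, value: str) -> bool:
        k, v = key.encode("utf-8"), value.encode("utf-8")
        needed = 2 + len(k) + len(v)  # [key_len][val_len][key_bytes][val_bytes]
        if len(k) > 255 or len(v) > 255 or len(self.data) + needed > ATTR_BUDGET:
            self.dropped += 1  # drop silently: never raise on the request path
            return False
        self.data += bytes((len(k), len(v))) + k + v
        return True


buf = AttrBuffer()
stored = [
    buf.set_attr("http.method", "GET"),                   # 2 + 11 + 3 = 16 bytes
    buf.set_attr("http.status_code", "200"),              # 2 + 16 + 3 = 21 bytes
    buf.set_attr("db.statement", "SELECT " + "x" * 200),  # over budget, dropped
    buf.set_attr("tenant", "acme"),                       # 2 + 6 + 4 = 12 bytes
]
print(stored, len(buf.data), buf.dropped)
```

Note that a dropped attribute does not poison the buffer: later short attributes still fit.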

Fast-path methods: when you know the value type at call time, use the typed fast-paths instead of the generic set_attr. They skip the 4-branch isinstance ladder that set_attr uses to dispatch str / int / float / bool / bytes. On heavily-instrumented spans (10+ attrs per request) this saves ~40 isinstance calls per request.

span.set_attr_str("http.method", request.method)      # known str
span.set_attr_int("http.status_code", response.status) # known int
span.set_attr_float("db.query_ms", elapsed_ms)         # known float
span.set_attr_bool("cache.hit", True)                  # encodes as "true"

span.set_attr("feature.flag", some_user_value)         # polymorphic, use generic

_attach_http_attrs in TelemetryMiddleware uses the fast-paths for the HTTP semantic attributes, which is where most of the recorded-span overhead lives. User code is free to mix the generic and fast-path forms — they produce identical on-the-wire output.

Long debug payloads — use the logger¶

For anything that won't fit in 128 bytes, use hyperdjango.logging.logger with trace correlation. Logs flow through the full structured-logging stack (ConsoleSink, JsonSink, FileSink, AsyncSink, etc.) and the JSON sink auto-promotes trace_id / span_id / trace_flags to top-level fields so log aggregators can join logs to traces with zero extra mapping config.

Every log emission inside an active span is automatically correlated — configure_from_settings() installs a global logger.patch() that reads the active SpanContext and injects trace_id / span_id / trace_flags into the record's extra dict at emission time. You don't need to wrap the logger explicitly:

from hyperdjango.logging import logger
# TelemetryMiddleware has wrapped the request → a span is active.
logger.error(
    "payment gateway returned non-2xx",
    status=response.status,
    body=response.text[:5000],
    headers=dict(response.headers),
)
# JSON sink output now carries {"trace_id": "...", "span_id": "...", ...}

The auto-correlation patcher is enabled by default whenever TELEMETRY_ENABLED=True. To opt out (e.g., if you're composing your own correlation logic), set TELEMETRY_AUTO_LOG_CORRELATION=False in settings. The patcher:

is a no-op when no span is active (safe at module-load time)
uses first-write-wins for collisions — your own bind() / contextualize() values are never overwritten
chains with any existing core.patcher — the user patcher runs first, the auto-correlator runs last

The older bind_trace_context(logger) helper is still available and is still the right choice for code paths that run before configure_from_settings() (e.g., early boot) or for explicit one-off correlation outside a traced request:

from hyperdjango.telemetry import bind_trace_context
log = bind_trace_context(logger)
log.warning("something happened outside a request")

As noted above, the promotion of trace_id / span_id / trace_flags to top-level fields happens in the JSON sink (hyperdjango/logging/_sinks.py).

Span events¶

HyperDjango supports span events — timestamped sub-events packed into a 128-byte per-slot arena. Each event has a name and a nanosecond timestamp captured at call time:

with tracer.start_span("process_order") as span:
    span.add_event("payment_started")
    result = await charge_card(order)
    span.add_event("payment_completed")
    await send_confirmation_email(order)
    span.add_event("email_queued")

Events appear in the drained span dict under "events" as a list matching the OpenTelemetry JSON event schema:

{
  "events": [
    { "name": "payment_started", "time_unix_nano": 1700000000001000000 },
    { "name": "payment_completed", "time_unix_nano": 1700000000003000000 },
    { "name": "email_queued", "time_unix_nano": 1700000000004000000 }
  ]
}

Arena budget: each event uses 9 + len(name) bytes (8-byte timestamp + 1-byte name_len + name). The 128-byte arena holds 4 events with 22-char names (4 × 31 = 124 bytes), or up to 12 events with 1-char names (12 × 10 = 120 bytes). Overflow drops silently (same discipline as attributes: span recording must never throw on the request path).
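The budget is plain integer division, worked through below. This assumes the arena stores nothing besides the events themselves; `events_that_fit` is a hypothetical helper, not a library function.

```python
EVENT_ARENA = 128    # bytes per span slot for events
EVENT_OVERHEAD = 9   # 8-byte timestamp + 1-byte name_len


def events_that_fit(name_len: int) -> int:
    """How many events with names of `name_len` bytes fit in one arena."""
    return EVENT_ARENA // (EVENT_OVERHEAD + name_len)


fit_22 = events_that_fit(22)   # 128 // 31
fit_1 = events_that_fit(1)     # 128 // 10
print(fit_22, fit_1)
```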

When to use events vs child spans vs logs:

Scenario	Use
Short timestamp marker ("cache miss", "retry")	`span.add_event("cache_miss")`
Operation with its own duration + attributes	child span via `tracer.start_span("sub_work")`
Long debug payload (stack trace, request body)	`logger.error(...)` with auto-correlation

Events have zero Python-side overhead when the span is a NoopSpan (unsampled path) — add_event returns immediately.

Sampling policies¶

from hyperdjango.telemetry import AlwaysSample, NeverSample, RatioSample, ParentBased

AlwaysSample()                 # record every span (development)
NeverSample()                  # zero span records (but trace_id still propagates!)
RatioSample(0.01)              # deterministic 1% head sampling via trace_id hash
ParentBased(root=RatioSample(0.05))  # inherit parent decision; fall through to root

ParentBased is the recommended production default — it guarantees that all spans in a single trace are sampled consistently, so you never end up with orphan child spans in the UI.

RatioSample uses the low 32 bits of trace_id as the hash input, so the decision is deterministic for a given trace (same trace → same decision on every node in the system).
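One common way to turn those 32 bits into a decision is a threshold comparison, sketched below. The threshold scheme and the `ratio_sample` function are assumptions for illustration; the text above only states that the low 32 bits are the hash input, not how they are compared.

```python
def ratio_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic head-sampling decision from the low 32 bits of trace_id."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    low32 = trace_id & 0xFFFF_FFFF
    threshold = int(ratio * (1 << 32))  # ratio=1.0 gives 2**32, above any low32
    return low32 < threshold


trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736  # example 128-bit trace id
decisions = {ratio_sample(trace_id, 0.5) for _ in range(3)}
print(decisions, ratio_sample(trace_id, 0.0), ratio_sample(trace_id, 1.0))
```

Because the decision depends only on the trace id and the ratio, every service configured with the same ratio reaches the same verdict without coordination.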

Sinks — Prometheus / Stdout / InMemory / custom¶

Sinks implement the TelemetrySink Protocol:

@runtime_checkable
class TelemetrySink(Protocol):
    def export_metrics(self, prometheus_text: bytes) -> None: ...
    def export_spans(self, spans: list[dict]) -> None: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...

Three built-ins, plus a slot for your own adapter:

Sink	Purpose	Span export	Metric export
`PrometheusSink`	Pull-based `/metrics` HTTP handler	No-op	Caches last scrape
`StdoutSink`	JSON lines + fenced Prometheus blocks	Yes	Yes
`InMemorySink`	In-process ring for tests	Yes	Yes (history)
User adapter	OTLP-compatible / custom backend	depends	depends

PrometheusSink¶

prom = PrometheusSink()
app.use(TelemetryMiddleware(sinks=[prom], ...))
app.get("/metrics")(prom.handler)

On every drain interval the middleware calls prom.export_metrics(text) with the latest exposition bytes. The HTTP handler serves those bytes directly — no per-scrape serialization cost. If no drain has happened yet, the handler falls back to computing the text live.

StdoutSink¶

sink = StdoutSink()               # default: sys.stdout
sink = StdoutSink(span_prefix="SPAN ")   # prepend each JSON line
sink = StdoutSink(include_metrics=False)  # suppress metrics, spans only

Each span is emitted as one JSON line. Metric scrapes are framed with # HYPER_METRICS_BEGIN / # HYPER_METRICS_END markers so log aggregators can trivially extract them.
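On the consuming side, pulling the metric blocks back out of a mixed log stream could look like the sketch below. It assumes each marker sits on its own line; the `extract_metric_blocks` helper and the sample span lines are hypothetical.

```python
BEGIN = "# HYPER_METRICS_BEGIN"
END = "# HYPER_METRICS_END"


def extract_metric_blocks(log_text: str) -> list[str]:
    """Return the text of each block framed by the BEGIN / END marker lines."""
    blocks: list[str] = []
    current: list[str] | None = None
    for line in log_text.splitlines():
        if line.strip() == BEGIN:
            current = []  # start collecting a new block
        elif line.strip() == END and current is not None:
            blocks.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return blocks


sample = "\n".join([
    '{"name": "GET /books", "status": 1}',
    BEGIN,
    "hyperdjango_db_queries_total 42",
    END,
    '{"name": "GET /authors", "status": 1}',
])
blocks = extract_metric_blocks(sample)
print(blocks)
```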

InMemorySink¶

The canonical test sink. Thread-safe, FIFO-bounded, with read properties that return snapshot copies:

sink = InMemorySink(max_spans=10_000, max_metric_scrapes=64)
# ...run test...
mw.drain_now()

assert len(sink.spans) == 3
assert sink.spans[0]["name"] == "GET /books"
assert b"hyperdjango_http_requests_total" in sink.latest_metrics

Custom adapter¶

Implement four methods, pass it in:

class CustomSink:
    def export_metrics(self, prometheus_text: bytes) -> None:
        # push to your backend
        ...

    def export_spans(self, spans: list[dict]) -> None:
        # convert to your span format + POST
        ...

    def flush(self) -> None: ...
    def close(self) -> None: ...

app.use(TelemetryMiddleware(sinks=[CustomSink()]))

No inheritance required — the Protocol is runtime_checkable, so isinstance(sink, TelemetrySink) works for duck-typed sinks.
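The structural check can be tried without the framework installed. The snippet below re-declares the Protocol locally (copied from the definition above) and uses a hypothetical `CountingSink`; it is a sketch of the duck-typing behaviour, not of any real backend adapter.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TelemetrySink(Protocol):  # local copy of the Protocol shown above
    def export_metrics(self, prometheus_text: bytes) -> None: ...
    def export_spans(self, spans: list[dict]) -> None: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...


class CountingSink:
    """Hypothetical sink that only counts what it receives."""

    def __init__(self) -> None:
        self.span_count = 0
        self.metric_bytes = 0

    def export_metrics(self, prometheus_text: bytes) -> None:
        self.metric_bytes += len(prometheus_text)

    def export_spans(self, spans: list[dict]) -> None:
        self.span_count += len(spans)

    def flush(self) -> None:
        pass

    def close(self) -> None:
        pass


class NotASink:
    """Missing three of the four methods, so the isinstance check fails."""

    def export_spans(self, spans: list[dict]) -> None:
        pass


sink = CountingSink()
sink.export_spans([{"name": "GET /books"}, {"name": "db.query"}])
sink.export_metrics(b"hyperdjango_db_queries_total 42\n")
print(isinstance(sink, TelemetrySink), isinstance(NotASink(), TelemetrySink))
```

Keep in mind that a runtime_checkable Protocol only verifies that the methods exist, not that their signatures match.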

Middleware + background drain¶

TelemetryMiddleware does two things:

Wraps every request in a span — starts the span on entry, attaches HTTP attributes (http.method, http.route, http.status_code, net.peer.ip, http.user_agent), records exception status, propagates the active trace-context via outbound traceparent header on success.
Runs a daemon thread — every drain_interval_seconds (default 1.0) it calls _span_drain() on the native ring and fans the spans + Prometheus scrape text out to every sink. Broken sinks don't starve healthy ones (errors are logged to stderr and isolated).

Per-request emission:

hyperdjango_http_requests_total{method="GET", status="200"}  1
hyperdjango_http_requests_total{method="POST", status="500"} 1
hyperdjango_http_request_duration_seconds_bucket{method="GET", le="0.005"}  7
hyperdjango_http_request_duration_seconds_bucket{method="GET", le="0.01"}   8
hyperdjango_db_queries_total                                42
hyperdjango_db_query_duration_seconds_bucket{le="0.001"}    38

Zero-cost when disabled: if is_enabled() returns False, the middleware is a pure passthrough — one bool check and await call_next(request).

Shutdown¶

Always register the shutdown hook so the drain thread exits cleanly and the final span batch is flushed:

middleware = TelemetryMiddleware(...)
app.use(middleware)
app.on_shutdown(middleware.shutdown)   # ← important

shutdown() is idempotent. It stops the drain loop, runs one final drain, then calls flush() + close() on every sink.
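The shutdown sequence can be sketched with a plain thread and an Event. The `DrainWorker` class below is a hypothetical stand-in for the middleware's drain thread; it leaves out the sink flush() / close() calls and only shows the idempotent stop plus the single final drain.

```python
import threading
from typing import Callable


class DrainWorker:
    """Toy drain loop: periodic drain, idempotent shutdown, one final drain."""

    def __init__(self, drain: Callable[[], None], interval_seconds: float) -> None:
        self._drain = drain
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._shutdown_lock = threading.Lock()
        self._shut_down = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout (drain) and True once stop is set.
        while not self._stop.wait(self._interval):
            self._drain()

    def shutdown(self) -> None:
        with self._shutdown_lock:
            if self._shut_down:
                return  # second and later calls are no-ops
            self._shut_down = True
        self._stop.set()
        self._thread.join()
        self._drain()  # final drain so the last batch is not lost


drained: list[str] = []
worker = DrainWorker(lambda: drained.append("tick"), interval_seconds=60.0)
worker.shutdown()
worker.shutdown()
print(drained)
```

With a 60-second interval the loop never ticks on its own here, so the only recorded drain is the final one from the first shutdown() call.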

Periodic samplers — pushing external state into metrics¶

Some sources of truth live outside the Python state-update path — pg.zig pool counters owned by Zig under a mutex, async task queue depths, WebSocket connection counts, cache eviction tallies. These can't be bumped from a per-request hot path because the state is owned by a background thread or the native layer.

The register_sampler(fn) hook lets subsystems push a snapshot into Prometheus gauges once per drain tick:

from hyperdjango.telemetry import Gauge, register_sampler
from hyperdjango._hyperdjango_native import _db_pool_stats

_pool_waiters = Gauge(
    "myapp_pool_waiters",
    "Threads currently blocked waiting for a pool connection.",
)
_pool_in_use = Gauge(
    "myapp_pool_in_use_connections",
    "Currently-pinned pool connections.",
)


def _sample_pool_gauges() -> None:
    """Pull the latest pool stats into the Gauges.

    Called by the drain worker every TELEMETRY_DRAIN_INTERVAL
    seconds. Must be cheap — it runs on the drain thread, not on
    the request path.
    """
    stats = _db_pool_stats(pool_handle)
    _pool_waiters.set(int(stats.get("waiters", 0)))
    _pool_in_use.set(int(stats.get("in_use", 0)))


register_sampler(_sample_pool_gauges)

Contract:

Samplers are invoked once per drain tick (default every 1.0 s, tunable via TELEMETRY_DRAIN_INTERVAL).
Errors are isolated per sampler — one broken sampler never starves the others or crashes the drain thread. Exceptions are collected and printed to stderr with the offending function name, matching the sink-error channel.
Registration is idempotent by identity — the same function registered twice is deduped, so module-level singletons are the expected pattern.
Samplers must be cheap — they run on the drain thread before the Prometheus exposition is generated, so a slow sampler delays every scrape.

Reference implementation: _sample_pool_gauges in hyperdjango/database.py, which skips silently when no Database has been instantiated (so telemetry boot doesn't trigger DB pool creation).

W3C trace-context propagation¶

Inbound traceparent header is parsed + installed as the parent context before any local span starts. Outbound responses get a traceparent header pointing at the current span so downstream services can continue the trace.

from hyperdjango.telemetry import parse_traceparent, format_traceparent

# Parse an inbound header
ctx = parse_traceparent(request.headers.get("traceparent"))
if ctx is not None:
    print(f"Inbound trace: {ctx.trace_id_high:016x}{ctx.trace_id_low:016x}")

# Format for outbound
outbound = format_traceparent(current_span().context)
client.get(url, headers={"traceparent": outbound})

parse_traceparent() is strict: wrong version, invalid hex, all-zero reserved values, or malformed format all return None without raising. format_traceparent() always emits version 00 + lowercase hex per the W3C spec.

tracestate is supported via parse_tracestate() + format_tracestate() with the 32-entry + 256-byte-per-entry limits from the spec.

Tests — assertions API¶

TelemetryAssertions gives you fluent assertions over an InMemorySink buffer:

from hyperdjango.telemetry import InMemorySink, TelemetryAssertions, STATUS_ERROR

sink = InMemorySink()
# ...drive the app...
mw.drain_now()
asserts = TelemetryAssertions(sink)

asserts.assert_span_count(3)
asserts.assert_has_span("GET /api/books")
asserts.assert_span_attr("GET /api/books", "http.status_code", "200")
asserts.assert_span_attr_contains("db.query", "sql", "FROM books")
asserts.assert_span_status("POST /fail", STATUS_ERROR)
asserts.assert_no_error_spans()
asserts.assert_span_chain(["GET /books", "db.query", "db.format"])

asserts.assert_metric_present("hyperdjango_http_requests_total")
asserts.assert_metric_has_label(
    "hyperdjango_http_requests_total", "method", "GET",
)
asserts.assert_metric_label_value(
    "hyperdjango_http_requests_total",
    {"method": "GET", "status": "200"},
    expected=1.0,
)

Every assertion raises AssertionError with a readable diff.

Performance¶

Measured on M-class Apple Silicon, ReleaseFast.

Operation	Cost	Notes
`Counter.inc()` when disabled	~25 ns	branch check only
`Counter.inc()` enabled	~117 ns	atomic RMW via FFI
`CounterVec.inc_tuple()` enabled	~180 ns	label join + atomic RMW
`Histogram.observe()` enabled	~200 ns	bucket find + atomic RMW
`span_start()` + `end()` unsampled	~75 ns	flag check + FFI
`span_start()` + `end()` sampled	~107 ns	slot CAS + FFI
`span_start()` + 3 attrs + `end()`	~282 ns	slot + 3 attr copies
`span_drain(1000)` per call	~1 ms	background thread only
Middleware floor — unsampled	~4.25 μs	per-request middleware cost
Middleware floor — sampled_01	~5.25 μs	per-request middleware cost
Middleware floor — sampled_100	~6.95 μs	per-request middleware cost

At production request latencies the relative overhead shrinks proportionally:

Baseline (wall-clock)	Overhead %	Where applicable
100 μs (pure-CPU route)	5-7%	Benchmarks only
500 μs (fast cache hit)	1.0-1.4%	Static content, simple API
1 ms (typical endpoint)	0.5-0.7%	Most production routes
5 ms (complex endpoint)	0.10-0.14%	DB-heavy read endpoints
10 ms (slow endpoint)	0.05-0.07%	Multi-join writes
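The percentages follow from dividing the middleware floor by the baseline latency. The sketch below assumes the quoted ranges span the sampled_01 and sampled_100 floors from the first table; that pairing is an inference from the numbers, not something the benchmark script is known to do.

```python
# Middleware floor values from the first table, in microseconds.
FLOOR_SAMPLED_01_US = 5.25
FLOOR_SAMPLED_100_US = 6.95


def overhead_percent(middleware_us: float, baseline_us: float) -> float:
    """Middleware cost as a percentage of the baseline request latency."""
    return 100.0 * middleware_us / baseline_us


for baseline_us in (100, 500, 1_000, 5_000, 10_000):
    low = overhead_percent(FLOOR_SAMPLED_01_US, baseline_us)
    high = overhead_percent(FLOOR_SAMPLED_100_US, baseline_us)
    print(f"{baseline_us:>6} us baseline: {low:.2f}% - {high:.2f}%")
```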

See scripts/bench_telemetry_overhead.py for the reproducible bench. It checks both a percentage threshold (10% on the synthetic "realistic" request shape) and an absolute per-request ceiling (≤15 μs), so it catches regressions without false positives on jitter-dominated microbenchmark measurements.

Settings reference¶

Every telemetry knob is a first-class HyperDjango setting. The table below shows the canonical setting name (use this in settings.py, patch.dict(DEFAULTS, ...), and get_setting(...)) and the matching HYPER_* env-var alias the conf loader honors. Both pathways resolve to the same value via hyperdjango.conf.get_setting().

Setting (canonical)	Env-var alias	Default	Meaning
`TELEMETRY_ENABLED`	`HYPER_TELEMETRY_ENABLED`	`False`	Master switch. Zero cost when False.
`TELEMETRY_SERVICE_NAME`	`HYPER_TELEMETRY_SERVICE_NAME`	`hyperdjango`	Default tracer name
`TELEMETRY_SAMPLE_RATIO`	`HYPER_TELEMETRY_SAMPLE_RATIO`	`0.01`	Float 0.0 - 1.0, head sampling
`TELEMETRY_DRAIN_INTERVAL`	`HYPER_TELEMETRY_DRAIN_INTERVAL`	`1.0`	Seconds between background drain ticks
`TELEMETRY_EXTRACT_TRACEPARENT`	`HYPER_TELEMETRY_EXTRACT_TRACEPARENT`	`True`	Honor inbound W3C header
`TELEMETRY_SINKS`	`HYPER_TELEMETRY_SINKS`	`["prometheus"]`	List: any of `prometheus`, `stdout`, `memory`
`TELEMETRY_SPAN_RING_CAPACITY`	`HYPER_TELEMETRY_SPAN_RING_CAPACITY`	`16384`	Slots in native span ring (power of 2, 256 - 16M)
`TELEMETRY_AUTO_LOG_CORRELATION`	`HYPER_TELEMETRY_AUTO_LOG_CORRELATION`	`True`	Auto-inject `trace_id`/`span_id`/`trace_flags` into log records

Set them via whichever layer fits your deployment — Django settings.py, HYPER_* env vars, .env files, or patch.dict(DEFAULTS, ...) in tests. They're not "env-var-only" flags; the env var alias is one of four resolution sources.

Resolution order (highest to lowest priority):

Django settings.py (when using the Django integration)
HYPER_* environment variable
.env file in project root
Framework DEFAULTS

Tuning the span ring¶

The span ring is allocated once at startup with a fixed slot count. Each slot is 384 bytes, so:

Capacity	Memory	Use case
256	96 KB	edge / embedded / unit-test fixtures
1024	384 KB	low-traffic dev environments
4096	1.5 MB	small production apps
16384	6 MB	default — most production apps
65536	24 MB	high-throughput services (>100k spans/sec)
262144	96 MB	extreme burst headroom

The ring is filled by producers in round-robin order. When a slot that's still complete (not yet drained) is hit on the next wrap, the producer drops the span and increments dropped_count. So the right capacity is roughly:

capacity ≈ peak_spans_per_second × drain_interval × safety_factor (1.5-2.0)

For a service producing 10k spans/sec at the default 1-second drain interval, 16384 gives ~1.6 seconds of burst headroom — plenty.
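The sizing rule, including the round-up to a power of two, can be worked through as follows. The `recommend_capacity` helper and its default safety factor of 1.5 are hypothetical choices for this sketch; only the formula, the 256-slot minimum, and the 384-byte slot size come from this page.

```python
SLOT_BYTES = 384  # per-slot size stated above


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def recommend_capacity(
    peak_spans_per_second: float,
    drain_interval_seconds: float = 1.0,
    safety_factor: float = 1.5,
) -> int:
    """Apply the sizing formula, then round up to a power of two >= 256."""
    needed = int(peak_spans_per_second * drain_interval_seconds * safety_factor)
    return max(256, next_power_of_two(max(needed, 1)))


capacity = recommend_capacity(10_000)            # 15_000 slots needed
memory_mib = capacity * SLOT_BYTES / (1 << 20)   # bytes to MiB
print(capacity, memory_mib)
```

For the 10k spans/sec case this lands on the default capacity, which matches the headroom estimate above.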

Set the capacity:

Via env var: HYPER_TELEMETRY_SPAN_RING_CAPACITY=65536
Via Python: from hyperdjango._hyperdjango_native import _span_configure; _span_configure(65536) BEFORE the first span

Capacity must be a power of 2 (the slot index uses an AND mask instead of modulo). The configure call rejects non-powers-of-2 with ValueError. After the first span has been recorded the ring is locked — _span_configure raises RuntimeError because resizing a live ring would dangle in-flight handles.

Validation order guarantee: input is always validated FIRST, regardless of init state. So a typo'd setting (e.g. HYPER_TELEMETRY_SPAN_RING_CAPACITY=1023 — not a power of 2) always raises ValueError even if the ring is already operational, not the more confusing RuntimeError("already initialized").
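The validate-first ordering is easy to model. The `SpanRingConfig` class below is a hypothetical toy, not the native `_span_configure`; treating an out-of-range capacity as a ValueError is an assumption, while the power-of-two rule, the 256 to 16777216 range, and the RuntimeError on a locked ring come from this page.

```python
MIN_CAPACITY = 256
MAX_CAPACITY = 16_777_216


class SpanRingConfig:
    """Toy model of the configure rules: validate input first, then check state."""

    def __init__(self) -> None:
        self.capacity = 16_384
        self.locked = False  # becomes True once the first span is recorded

    def configure(self, capacity: int) -> None:
        is_power_of_two = capacity > 0 and (capacity & (capacity - 1)) == 0
        if not is_power_of_two or not MIN_CAPACITY <= capacity <= MAX_CAPACITY:
            raise ValueError(f"invalid capacity: {capacity}")
        if self.locked:
            raise RuntimeError("already initialized")
        self.capacity = capacity


ring = SpanRingConfig()
ring.configure(65_536)
ring.locked = True  # simulate the first recorded span

errors = []
for value in (1023, 32_768):  # bad input first, then valid input on a locked ring
    try:
        ring.configure(value)
    except (ValueError, RuntimeError) as exc:
        errors.append(type(exc).__name__)

# With a power-of-two capacity, an AND mask gives the same index as modulo.
mask = ring.capacity - 1
mask_matches_modulo = all(
    (i & mask) == i % ring.capacity for i in (0, 1, 65_535, 65_536, 100_000)
)
print(errors, mask_matches_modulo)
```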

Operational state:

_span_capacity() returns the live capacity (after successful init) OR the configured/intended capacity (before init OR after failed init).
_span_is_operational() returns True only if the ring is allocated and recording. False before first use AND after a failed init (e.g. OOM on a giant capacity). Producers fall back to dropping every span when False.
_span_dropped_count() returns the cumulative drop counter — alert on this in your monitoring if it grows unexpectedly.

Failed-init recovery: if the first allocation fails (OOM), a subsequent _span_configure(smaller_value) rolls back the init-attempted flag and lets the next span retry allocation. This is the only case where configure() is allowed after init has been attempted; the ring must be non-operational.

All settings also go through hyperdjango.conf.SETTING_DEFINITIONS — see test_settings.py for the validation rules.

Example: bookstore_api¶

A full working example lives in examples/bookstore_api/app.py. Key lines:

import os

from hyperdjango.telemetry import (
    PrometheusSink, TelemetryMiddleware, Tracer, enable as _enable_telemetry,
)
from hyperdjango.telemetry.sampling import ParentBased, RatioSample

if os.environ.get("HYPER_TELEMETRY_DISABLE") != "1":
    _enable_telemetry()
    tracer = Tracer(
        "bookstore_api",
        sampler=ParentBased(root=RatioSample(1.0 if app.debug else 0.05)),
    )
    prom = PrometheusSink()
    telemetry = TelemetryMiddleware(
        tracer=tracer,
        sinks=[prom],
        drain_interval_seconds=1.0,
    )
    app.use(telemetry)
    app.on_shutdown(telemetry.shutdown)
    app.get("/metrics")(prom.handler)

Run the example, hit a few endpoints, then curl /metrics to see the full Prometheus exposition including the auto-emitted hyperdjango_http_* + hyperdjango_db_* series.

FAQ¶

Q: Does this replace PerformanceMiddleware? No. PerformanceMiddleware is a dashboard-focused debug tool (slow query log, N+1 detection, /debug/performance). TelemetryMiddleware is a production metrics + tracing pipeline. They coexist fine — bookstore_api uses both.

Q: Can I use protobuf / OTLP? You can ship spans to any upstream OTLP collector by writing a custom sink that wraps opentelemetry-exporter-otlp. The framework itself never touches protobuf — span batches are always list[dict] in OpenTelemetry JSON shape.

Q: Do I need a separate trace collector? No, but you can. Point any OTLP-compatible SDK at your custom sink. For dev / small-scale, StdoutSink + a log forwarder is often enough.

Q: What about trace sampling at multi-tenant scale? RatioSample is deterministic via the trace_id hash, so the same trace is sampled consistently across every service. For tenant-aware sampling, wrap ParentBased in your own policy that inspects current() and varies the ratio per tenant.

Q: Is telemetry safe under free-threading (3.14t)? Yes. Every native operation uses atomic RMW instructions. ContextVar is per-thread / per-task. The span ring uses CAS for slot state transitions. We run an 8-thread × 1000-iter fuzz suite under free-threading in test_span_ring_fuzz.py.

Q: How much memory does the span ring use? 16384 slots × 384 bytes = 6 MB, allocated once at module load. Each slot is a fixed-size extern struct, so normal operation makes no per-span allocations. When the ring fills under sustained load, dropped_count increments and the oldest spans are overwritten; memory use stays fixed instead of growing.
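
The sizing arithmetic is easy to check for other capacities. A minimal sketch, assuming the 384-byte slot size stated in this document stays constant; `ring_bytes` is a helper name invented for the example.

```python
SLOT_BYTES = 384  # 64 B header + 64 B name + 128 B attrs + 128 B events


def ring_bytes(capacity: int) -> int:
    """Total bytes reserved for a span ring with `capacity` fixed-size slots."""
    return capacity * SLOT_BYTES


for capacity in (16384, 32768):
    print(f"{capacity} slots -> {ring_bytes(capacity) / 2**20:.0f} MiB")
# 16384 slots -> 6 MiB
# 32768 slots -> 12 MiB
```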

Production deployment — Prometheus end-to-end¶

1. Enable telemetry in your app¶

# settings.py (Django integration)
TELEMETRY_ENABLED = True
TELEMETRY_SERVICE_NAME = "myapp"
TELEMETRY_SAMPLE_RATIO = 0.05          # 5% head sampling
TELEMETRY_SINKS = ["prometheus"]
TELEMETRY_SPAN_RING_CAPACITY = 32768   # 12 MB (32768 × 384 B), for high-throughput services

Or via env vars (systemd, Docker, k8s):

HYPER_TELEMETRY_ENABLED=1
HYPER_TELEMETRY_SERVICE_NAME=myapp
HYPER_TELEMETRY_SAMPLE_RATIO=0.05
HYPER_TELEMETRY_SINKS=prometheus
HYPER_TELEMETRY_SPAN_RING_CAPACITY=32768
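
A deployment script may want to validate these variables before the service starts. The following is a rough preflight sketch, not the framework's own parser: the `TelemetryEnv` dataclass, the `read_telemetry_env` helper, the fallback defaults, and the comma-separated form of HYPER_TELEMETRY_SINKS are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TelemetryEnv:
    enabled: bool
    service_name: str
    sample_ratio: float
    sinks: tuple[str, ...]
    span_ring_capacity: int


def read_telemetry_env(env: Mapping[str, str] = os.environ) -> TelemetryEnv:
    """Parse and validate HYPER_TELEMETRY_* variables (defaults here are assumed)."""
    ratio = float(env.get("HYPER_TELEMETRY_SAMPLE_RATIO", "1.0"))
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"sample ratio must be in [0, 1], got {ratio}")
    capacity = int(env.get("HYPER_TELEMETRY_SPAN_RING_CAPACITY", "16384"))
    if capacity <= 0:
        raise ValueError(f"span ring capacity must be positive, got {capacity}")
    raw_sinks = env.get("HYPER_TELEMETRY_SINKS", "")
    sinks = tuple(s.strip() for s in raw_sinks.split(",") if s.strip())
    return TelemetryEnv(
        enabled=env.get("HYPER_TELEMETRY_ENABLED") == "1",
        service_name=env.get("HYPER_TELEMETRY_SERVICE_NAME", "unknown"),
        sample_ratio=ratio,
        sinks=sinks,
        span_ring_capacity=capacity,
    )
```

Failing fast on a malformed ratio or capacity at deploy time is cheaper than discovering it from missing traces later.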

2. Prometheus scrape config¶

Add to prometheus.yml:

scrape_configs:
  - job_name: "hyperdjango"
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - "myapp.internal:8000"
        labels:
          service: myapp
          environment: production
    # If behind a load balancer, use service discovery instead:
    # dns_sd_configs:
    #   - names: ["_http._tcp.myapp.service.consul"]

3. Key series to monitor¶

The telemetry stack emits these series automatically when TELEMETRY_ENABLED=True:

Series	Type	Labels	What it measures
`hyperdjango_http_requests_total`	CounterVec	method, status	Request rate by HTTP method + status code
`hyperdjango_http_request_duration_seconds`	HistogramVec	method	Request latency distribution (p50/p95/p99)
`hyperdjango_db_queries_total`	Counter	—	Total DB queries across all endpoints
`hyperdjango_db_query_duration_seconds`	Histogram	—	DB query latency distribution
`hyperdjango_rate_limit_hits_total`	CounterVec	backend	Rate-limit denials (RateLimitMiddleware)
`hyperdjango_csrf_violations_total`	CounterVec	reason	CSRF token failures (missing/mismatch)
`hyperdjango_session_auth_total`	CounterVec	result	Session auth outcomes (ok/no_cookie/invalid/...)
`hyperdjango_guard_denials_total`	CounterVec	reason	HyperGuard access denials by reason
`hyperdjango_template_renders_total`	Counter	—	Template render count
`hyperdjango_template_render_duration_seconds`	Histogram	—	Template render latency
`hyperdjango_dataloader_loads_total`	CounterVec	result	DataLoader hit/miss rate
`hyperdjango_dataloader_batch_size`	Histogram	—	DataLoader batch size distribution
`hyperdjango_admin_actions_total`	CounterVec	model, action	Admin write actions (add/change/delete)
`hyperdjango_pool_total_connections`	Gauge	—	Configured pool size
`hyperdjango_pool_in_use_connections`	Gauge	—	Currently-pinned connections
`hyperdjango_pool_waiters`	Gauge	—	Threads blocked waiting for connections
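
A quick way to confirm that a deployment exposes the series you plan to alert on is to scan the /metrics response body for their names. The sketch below works on the Prometheus text exposition format in general; `sample_names` and `missing_series` are hypothetical helpers, not part of hyperdjango.telemetry.

```python
import re

# A sample line starts with the metric name, optionally followed by {labels}.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")
# Histograms appear as <name>_bucket, <name>_sum and <name>_count samples.
_SUFFIXES = ("", "_bucket", "_sum", "_count")


def sample_names(exposition: str) -> set[str]:
    """Collect the metric names of all sample lines, skipping comments."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_series(exposition: str, required: list[str]) -> list[str]:
    """Return the required series that have no sample in the exposition."""
    names = sample_names(exposition)
    return [r for r in required if not any(r + s in names for s in _SUFFIXES)]
```

Feeding it the body of a scrape and the names from the table gives a list of anything absent, which is useful as a post-deploy smoke check.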

4. Dashboard panel queries (PromQL)¶

HTTP RPS by status (top panel):

rate(hyperdjango_http_requests_total{service="myapp"}[5m])

Legend: {{method}} {{status}}

Request latency p50/p95/p99:

histogram_quantile(0.50, rate(hyperdjango_http_request_duration_seconds_bucket{service="myapp"}[5m]))
histogram_quantile(0.95, rate(hyperdjango_http_request_duration_seconds_bucket{service="myapp"}[5m]))
histogram_quantile(0.99, rate(hyperdjango_http_request_duration_seconds_bucket{service="myapp"}[5m]))
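
To see what histogram_quantile does with the bucket series, here is a simplified Python sketch of the linear interpolation Prometheus applies to classic histograms. It is a teaching aid under stated assumptions: quantiles are restricted to (0, 1], and the bucket counts in the usage note are invented.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    The list must end with the +Inf bucket, whose count is the total.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    if idx == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0  # the first bucket is assumed to start at 0
    else:
        lower, below = buckets[idx - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)
```

With made-up buckets `[(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]`, the p95 estimate lands at roughly 0.78 s, between the 0.5 s and 1.0 s bounds. The estimate can never be more precise than the bucket boundaries, which is worth remembering when choosing alert thresholds.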

DB query rate:

rate(hyperdjango_db_queries_total{service="myapp"}[5m])

Pool saturation (early warning):

hyperdjango_pool_waiters{service="myapp"}
hyperdjango_pool_in_use_connections{service="myapp"} / hyperdjango_pool_total_connections{service="myapp"}

Error rate:

sum(rate(hyperdjango_http_requests_total{service="myapp", status=~"5.."}[5m]))
  /
sum(rate(hyperdjango_http_requests_total{service="myapp"}[5m]))

5. Alerting rules¶

# prometheus_rules.yml
groups:
  - name: hyperdjango
    rules:
      # 5xx error rate > 5% over 5 minutes
      - alert: HighErrorRate
        expr: |
          sum(rate(hyperdjango_http_requests_total{status=~"5.."}[5m]))
          / sum(rate(hyperdjango_http_requests_total[5m]))
          > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx error rate ({{ $value | humanizePercentage }})"

      # p99 latency > 2 seconds
      - alert: HighLatency
        expr: |
          histogram_quantile(0.99, rate(hyperdjango_http_request_duration_seconds_bucket[5m]))
          > 2.0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency above 2s ({{ $value | humanizeDuration }})"

      # Pool saturation — threads waiting for connections
      - alert: PoolSaturation
        expr: hyperdjango_pool_waiters > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} threads waiting for DB pool connections"

      # Rate limiting firing — potential abuse or misconfiguration
      - alert: RateLimitSpike
        expr: rate(hyperdjango_rate_limit_hits_total[5m]) > 10
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "Rate limiter triggering at {{ $value | humanize }}/s"

6. systemd unit example¶

# /etc/systemd/system/myapp.service
[Unit]
Description=MyApp (HyperDjango)
After=postgresql.service

[Service]
User=myapp
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/.venv/bin/uv run hyper start --app app:app
Environment=HYPER_TELEMETRY_ENABLED=1
Environment=HYPER_TELEMETRY_SERVICE_NAME=myapp
Environment=HYPER_TELEMETRY_SAMPLE_RATIO=0.05
Environment=DATABASE_URL=postgres://myapp:secret@localhost/myapp
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

7. Kubernetes ConfigMap + Deployment¶

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  HYPER_TELEMETRY_ENABLED: "1"
  HYPER_TELEMETRY_SERVICE_NAME: "myapp"
  HYPER_TELEMETRY_SAMPLE_RATIO: "0.05"
  HYPER_TELEMETRY_SPAN_RING_CAPACITY: "32768"
---
# deployment.yaml (relevant snippet)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: myapp
          envFrom:
            - configMapRef:
                name: myapp-config
          ports:
            - containerPort: 8000
              name: http

8. OTLP span export (optional)¶

For distributed tracing via any OTLP/HTTP backend, add the example OTLP sink from examples/otlp_sink/:

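from examples.otlp_sink import OTLPSpanSink
from hyperdjango.telemetry import configure_from_settings

sink = OTLPSpanSink(service_name="myapp")
# Or configure via env vars:
#   OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4318
#   OTEL_SERVICE_NAME=myapp

telemetry = configure_from_settings(app)
# configure_from_settings may return None (the quick start guards for this too).
if telemetry is not None:
    # _worker is a private attribute and may change between releases.
    telemetry.middleware._worker.sinks.append(sink)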

This sends span batches to an OTLP/HTTP collector in JSON format (no protobuf dependency). Use PrometheusSink for metrics and OTLPSpanSink for traces; both sinks coexist on the same middleware without conflict.
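
For orientation, this is roughly what an OTLP/HTTP JSON export request looks like when built with the standard library. It is not the code in examples/otlp_sink/: `build_otlp_request` and the scope name are hypothetical, and the sketch assumes each span dict is already in OTLP JSON span shape.

```python
import json
import urllib.request


def build_otlp_request(
    spans: list[dict], service_name: str, endpoint: str
) -> urllib.request.Request:
    """Wrap span dicts in the OTLP JSON envelope and target <endpoint>/v1/traces."""
    payload = {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}}
                    ]
                },
                "scopeSpans": [
                    # The scope name is illustrative only.
                    {"scope": {"name": "hyperdjango.telemetry"}, "spans": spans}
                ],
            }
        ]
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the export. A real sink would add timeouts, retries, and batching on top of this.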

See also: docs/TelemetryArchitecturePlan.md for the original design doc and scripts/test_e2e_telemetry.py for the end-to-end integration test that exercises every feature described above.