
Ophamin — coverage baseline + performance benchmarks

Status: Stage 1 (internal hardening) per docs/ELEVATION_ROADMAP_2026_05_16.md. Phase S2 (coverage) + S3 (benchmarks) artefacts live here; Phase S1 (mypy strict) lives in docs/MYPY_STRICT_BASELINE.md.

Baselines pinned 2026-05-16 at v0.5.0. Subsequent measurements compare against these to detect regression.


1. Coverage baseline (Phase S2)

Measured on Python 3.14, pytest 7+ with pytest-cov 7.1, branch coverage enabled. Configuration lives in .coveragerc; the canonical run command is:

.venv/bin/python -m pytest -q --cov=src/ophamin --cov-branch --cov-report=term-missing

Aggregate

| Metric | Value |
| --- | --- |
| Total lines instrumented | 13,671 |
| Total branches | 3,674 |
| Tests passing | 1,148 |
| Tests skipped | 1 |
| Combined coverage | 75.4 % on CI (Ubuntu Python 3.12+3.13), ~77.8 % locally |

The 2.4 pp gap between local and CI is real and honest: the author's macOS venv has additional optional deps installed from earlier sessions (NPEET / pacmap / etc.) that add reachable code paths. CI is the authoritative cross-platform measurement. Pre-push gate and CI gate are aligned at ≥ 75 % (slightly below the measured floor to absorb noise).
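
The mechanism behind the gap can be sketched as follows. This is a minimal illustration, not Ophamin's actual code: `has_optional` and `pick_backend` are hypothetical names, and `npeet` as the import name of the optional dependency is an assumption. The guarded branch only executes (and only counts as covered) in an environment where the optional package is installed.

```python
import importlib.util


def has_optional(module_name: str) -> bool:
    """Return True if a top-level module is importable in this environment."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(optional_module: str = "npeet") -> str:
    """Hypothetical helper: prefer an optional backend when it is installed."""
    if has_optional(optional_module):
        # This branch is reachable only where the optional dep exists,
        # e.g. on a dev box that installed it in an earlier session.
        return optional_module
    # A clean CI environment without the dep always takes this path.
    return "fallback"


print(pick_backend())
```

On a clean CI runner the first branch never runs, so any tests that exercise it are skipped or fall through, which lowers the measured percentage relative to a machine that has the extra packages.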

Coverage targets

| Slice | Current (CI) | Target (v0.9.0) | Stretch |
| --- | --- | --- | --- |
| Whole framework | 75 % CI / 77 % local | ≥ 80 % | ≥ 85 % |
| measuring/ (scenarios + pillars + proof + codec) | ~92 % | ≥ 95 % | 100 % |
| comparing/ (synthesis + regression-alert + drift) | ~90 % | ≥ 95 % | 100 % |
| auditing/ | ~85 % | ≥ 92 % | ≥ 95 % |
| reporting/ | ~94 % | ≥ 95 % | 100 % |
| protocols.py | 100 % | 100 % | 100 % |
| registry.py | 96 % | 100 % | 100 % |
| campaign.py (Move F) | high | ≥ 95 % | 100 % |
| seeing/discovery/ | ~90 % | ≥ 92 % | ≥ 95 % |
| seeing/wiring/ | 89 % | ≥ 92 % | ≥ 95 % |
| seeing/substrate/kimera_adapter.py | 71 % (post-0.8.3) | ≥ 70 % ✅ | ≥ 80 % |
| seeing/corpus/connectors.py | 54 % | ≥ 65 % | ≥ 75 % |
| verify.py | 86 % | ≥ 90 % | ≥ 95 % |

Per-wheel detail (current → 2026-05-16 baseline)

At or above target (no action needed):

| File | Coverage | Notes |
| --- | --- | --- |
| protocols.py | 100 % | Protocol declarations only |
| seeing/discovery/schema_diff.py | 100 % | Pure functional |
| seeing/__init__.py, corpus/__init__.py | 100 % | Tiny re-export files |
| reporting/__init__.py | 100 % | Re-export only |
| measuring/scenarios/substrate_completeness.py | 97.7 % | One unreachable line |
| latex_renderer.py | 97.4 % | One unrendered control-path |
| reporting/markdown_renderer.py | 96.4 % | Same shape |
| registry.py | 96.2 % | Two lines guard built-in-substrate import failure |
| quantum_basis_correlation.py | 92.6 % | Empirical-deep scenario; some defensive paths uncovered |
| seeing/telemetry/prometheus_probe.py | 91.8 % | Network-bound paths use the local-test fallback |
| seeing/substrate/field_catalog.py | 90.3 % | Drift-detection paths in field-shape inferrer |

Below target — action items for the v0.9.0 ratchet:

| File | Coverage | Gap | Plan |
| --- | --- | --- | --- |
| seeing/corpus/connectors.py | 54.2 % | Real-corpus access paths gated on downloads | Add mock-filesystem unit tests for parser branches; skip on missing data via pytest.skip |
| seeing/discovery/watcher.py | 50.4 % | Continuous-loop / mining path (lines 141-171) constructs a KimeraAdapter inline and calls a SchemaMiner; needs a real Kimera repo. Same integration-test territory as kimera_adapter. | Phase S2 added 7 tests for the static helpers, run_forever loop with monkeypatched sleep, and kimera_head_commit failure paths. Accept the gap; mining-path coverage comes from owner-side runs against the live Kimera tree. |
| measuring/timeseries_helpers.py | 50.7 % | Optional helper paths used only by 2 scenarios | Either dedicated property-test coverage or move to a helpers/optional/ subpackage with a skip-when-unused convention |
| measuring/scenarios/throughput_ceiling.py | 71.9 % | InstrumentedSubstrate wrapping paths | Mock-substrate test that walks the wrapping ladder |

Closed in 0.8.3 (Phase A4):

| File | Was | Now | Closure |
| --- | --- | --- | --- |
| seeing/substrate/kimera_adapter.py | 55.9 % | 71.1 % | Added tests/test_kimera_adapter_subprocess_mock.py: 18 subprocess-mocked tests covering _invoke (every parse / decode / timeout branch), _to_cycle_result (success / adapter_error / cycle_seconds propagation / non-dict raw), and run_batch (subprocess-mode delegation + batch-mode happy path). Past the v0.9.0 ≥ 70 % target without a real Kimera repo. The full-suite measurement that combines the new tests with existing happy-path coverage measures the file at ~80 % in-file. |
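
The subprocess-mocking approach can be sketched with the standard library alone. The `invoke` function below is a hypothetical stand-in, not the real `_invoke` in kimera_adapter.py, and the `adapter_error` payload shape is assumed for illustration. The point is that the parse, decode, and timeout branches can all be reached by patching `subprocess.run`, with no real Kimera repo present.

```python
import json
import subprocess
from unittest import mock


def invoke(cmd, timeout=5.0):
    """Hypothetical adapter call: run cmd and parse its stdout as JSON."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return {"adapter_error": "timeout"}
    try:
        return json.loads(proc.stdout.decode("utf-8"))
    except UnicodeDecodeError:
        return {"adapter_error": "decode"}
    except json.JSONDecodeError:
        return {"adapter_error": "parse"}


def fake_completed(stdout: bytes) -> subprocess.CompletedProcess:
    """Build the object subprocess.run would have returned."""
    return subprocess.CompletedProcess(args=["x"], returncode=0, stdout=stdout, stderr=b"")


def test_success_branch():
    with mock.patch("subprocess.run", return_value=fake_completed(b'{"ok": true}')):
        assert invoke(["kimera"]) == {"ok": True}


def test_parse_and_decode_branches():
    with mock.patch("subprocess.run", return_value=fake_completed(b"not json")):
        assert invoke(["kimera"]) == {"adapter_error": "parse"}
    with mock.patch("subprocess.run", return_value=fake_completed(b"\xff\xfe")):
        assert invoke(["kimera"]) == {"adapter_error": "decode"}


def test_timeout_branch():
    boom = subprocess.TimeoutExpired(cmd=["kimera"], timeout=5.0)
    with mock.patch("subprocess.run", side_effect=boom):
        assert invoke(["kimera"]) == {"adapter_error": "timeout"}
```

Each test swaps in a canned return value or exception, so the command list passed to `invoke` is never actually executed.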

CI gate

Both pre-push (.githooks/pre-push) and GitHub Actions CI gate at ≥ 75 % as of 0.8.1. The gate was lowered from 77 to 75, with full rationale in CHANGELOG.md § 0.8.1: clean Ubuntu CI measures the framework at 75.4 % (the honest cross-platform floor), while the author's macOS dev box reaches 77.7 % because NPEET, pacmap, and other optional deps installed in earlier sessions add reachable code paths. CI is the authoritative cross-platform measurement. The two gates share one threshold so that a clean local pre-push implies CI will pass.

.venv/bin/python -m pytest -q --cov=src/ophamin --cov-branch \
    --cov-fail-under=75

Ratchet plan: raise the gate to 80 (target) and then 85 (stretch) as the action-item table above closes.
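
To size each ratchet step, the rough arithmetic can be sketched as below. It assumes the gated percentage is coverage.py's combined figure, (covered lines + covered branches) / (total lines + total branches), and that "lines instrumented" in the aggregate table corresponds to that denominator; `extra_units_needed` is a hypothetical helper, not part of the repo.

```python
import math

TOTAL_LINES = 13_671      # from the aggregate table
TOTAL_BRANCHES = 3_674    # from the aggregate table


def extra_units_needed(current_pct: float, target_pct: float, total_units: int) -> int:
    """Additional covered lines+branches needed to move current_pct to target_pct."""
    covered_now = current_pct / 100.0 * total_units
    covered_target = target_pct / 100.0 * total_units
    # Round before ceil so float noise cannot push an exact value up by one.
    return max(0, math.ceil(round(covered_target - covered_now, 9)))


total = TOTAL_LINES + TOTAL_BRANCHES
for target in (80.0, 85.0):
    needed = extra_units_needed(75.4, target, total)
    print(f"gate {target:.0f} %: about {needed} more covered lines/branches")
```

Under these assumptions, moving the CI figure from 75.4 % to the 80 % gate takes on the order of 800 newly covered lines or branches, and the 85 % stretch roughly twice that.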


2. Performance benchmarks (Phase S3)

Status: initial baseline pinned (see "Baseline numbers" below); CLI cold-start numbers are still pending. The benchmark suite lives at tests/bench/ and runs via:

.venv/bin/python -m pytest tests/bench/ -q --benchmark-only \
    --benchmark-storage=./bench_storage --benchmark-save=baseline

Pinned baselines populate the table below as benches land. CI gate: any single bench regressing > 20 % vs the pinned baseline fails the run.

Scope of bench coverage (target)

| Layer | Bench | Why it matters |
| --- | --- | --- |
| Codec | dump → load → verify_signature round-trip on a 1KB proof | Per-record write/read cost; load-bearing for ophamin proof list on a 1000-proof corpus |
| Codec | EmpiricalProofRecord.sign(key) HMAC cost | Should be sub-millisecond per record |
| Codec | validate_schema() JSON-Schema check | Should be ≤ 5 ms per record |
| Pillar | SPC IndividualsChart.evaluate on N=10⁴ samples | Per-cycle telemetry path |
| Pillar | SPRT.update() per-observation cost | Sequential-test critical-path |
| Pillar | MixedLM.fit() on N=10³, groups=10 | Heaviest single-pillar fit |
| Pillar | River ADWIN per-observation update | Streaming-drift path |
| Synthesis | summarize_directory(<100 proofs>) | Used by ophamin summarize |
| Registry | Cold-start cost: from ophamin.measuring import pillars (fires all 11 adapters' registration) | First-CLI-invocation latency |
| CLI | time ophamin --version cold-start | The smallest meaningful CLI surface |
| CLI | time ophamin scenario list cold-start | The discovery surface |
| CLI | time ophamin proof verify <signed.json> | Single-record verify |
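
For orientation, the kind of work the codec round-trip bench exercises can be sketched with the standard library. This is not EmpiricalProofRecord's real API: the `sign` and `verify` functions, the canonical-JSON payload, the SHA-256 digest, and the record fields are all assumptions made for the example.

```python
import hashlib
import hmac
import json


def sign(record: dict, key: bytes) -> str:
    """HMAC over a canonical JSON encoding of the record (assumed scheme)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(sign(record, key), signature)


key = b"example-key"                                      # illustrative only
record = {"scenario": "throughput_ceiling", "passed": True}  # hypothetical fields

# dump -> load -> verify, mirroring the shape of the round-trip bench
dumped = json.dumps({"record": record, "signature": sign(record, key)})
loaded = json.loads(dumped)
print(verify(loaded["record"], loaded["signature"], key))
```

Verification recomputes the signature, which is consistent with a verify bench costing at least as much as a sign-only bench.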

Baseline numbers — baseline_v0_5_0 pinned 2026-05-16

Measured on macOS Apple-Silicon, Python 3.14, pytest-benchmark 5.2.3, warmup on, 100-iteration micro-benches (where applicable). The table reports the median, mean, and standard deviation of wall time across each benchmark's iterations.

| Bench | Median | Mean | StdDev |
| --- | --- | --- | --- |
| river_adwin_update_per_observation | 125 ns | 136 ns | 32 ns |
| sprt_update_per_observation (Gaussian) | 190 ns | 195 ns | 48 ns |
| cma_add_50 (50 sequential add()) | 2,028 µs | 2,064 µs | 150 µs |
| proof_sign_hmac_only | 58.4 µs | 59.6 µs | 7.6 µs |
| proof_verify_signature | 123 µs | 128 µs | 20 µs |
| proof_dump_load_verify_round_trip | 304 µs | 314 µs | 43 µs |
| proof_validate_schema (jsonschema) | 895 µs | 923 µs | 73 µs |
| compute_regression_alert_n50_pair | 7.11 ms | 7.19 ms | 0.38 ms |
| list_proofs_n100 | 12.78 ms | 12.85 ms | 0.40 ms |
| build_index_n100 | 13.37 ms | 13.45 ms | 0.48 ms |
| summarize_directory_n100 | 19.67 ms | 19.76 ms | 0.62 ms |
| spc_individuals_chart_n10000 (fit + evaluate) | 60.2 ms | 60.2 ms | 1.46 ms |

Baseline saved at bench_storage/baseline_v0_5_0/. Re-run + compare via:

.venv/bin/python -m pytest tests/bench/ -q --benchmark-only \
    --benchmark-storage=./bench_storage \
    --benchmark-compare=baseline_v0_5_0 \
    --benchmark-compare-fail=mean:20%
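
The 20 % rule that the compare-fail flag enforces amounts to a relative change on the mean. A minimal sketch of that check, written independently of pytest-benchmark's internals (`regression_pct` and `fails_gate` are hypothetical names, and the "current" value is invented for illustration):

```python
def regression_pct(baseline: float, current: float) -> float:
    """Relative change of current vs baseline, in percent (positive = slower)."""
    return 100.0 * (current - baseline) / baseline


def fails_gate(baseline: float, current: float, threshold_pct: float = 20.0) -> bool:
    """True when the slowdown exceeds the threshold."""
    return regression_pct(baseline, current) > threshold_pct


baseline_mean_us = 59.6   # proof_sign_hmac_only mean from the pinned table
current_mean_us = 72.0    # hypothetical new measurement, not real data

print(f"{regression_pct(baseline_mean_us, current_mean_us):.1f} % slower")
print("gate fails" if fails_gate(baseline_mean_us, current_mean_us) else "gate passes")
```

Speed-ups produce a negative percentage and never trip the gate; only slowdowns beyond the threshold do.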

CLI cold-start (separate from pytest-benchmark; measured via time)

To be populated as L1 (docs site) lands and provides a stable measurement environment. Until then, the pytest-benchmark numbers above are the canonical performance reference.

Observations

  • Per-observation update costs (River ADWIN, SPRT) are sub-microsecond — the streaming pillars can handle ≥ 5M observations/s.
  • HMAC signing is ~60µs — a 1000-proof corpus signs in ~60ms.
  • The 60 ms SPC bench is the slowest in the suite, about 3× the next-slowest (summarize_directory_n100 at 19.67 ms); expected for an N=10⁴ fit.
  • list_proofs and build_index are within 5% of each other, confirming build_index is essentially list_proofs + aggregation with negligible aggregation cost.
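
The throughput and corpus-time figures in these observations follow from simple unit conversions on the pinned medians. A short sketch of that arithmetic (the helper names are hypothetical; the inputs come from the baseline table):

```python
def throughput_per_second(latency_ns: float) -> float:
    """Single-threaded operations per second implied by a per-call latency."""
    return 1e9 / latency_ns


def batch_ms(per_item_us: float, n_items: int) -> float:
    """Total time in milliseconds for n_items sequential calls."""
    return per_item_us * n_items / 1000.0


def relative_gap_pct(a: float, b: float) -> float:
    """Gap between two timings as a percentage of the smaller one."""
    return 100.0 * abs(a - b) / min(a, b)


print(f"ADWIN: {throughput_per_second(125) / 1e6:.2f} M obs/s")
print(f"SPRT:  {throughput_per_second(190) / 1e6:.2f} M obs/s")
print(f"1000 signatures: {batch_ms(58.4, 1000):.1f} ms")
print(f"list_proofs vs build_index: {relative_gap_pct(12.78, 13.37):.1f} %")
```

These are single-call extrapolations; sustained throughput in a real pipeline would also depend on surrounding I/O and allocation costs that the micro-benches do not measure.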

3. Mypy strict baseline (Phase S1)

See docs/MYPY_STRICT_BASELINE.md for the current error count + the per-layer remediation plan. Summary:

  • Strict configuration in pyproject.toml [tool.mypy].
  • Baseline error count + per-file breakdown captured.
  • Remediation: lowest-hanging-fruit layer (protocols + registry + measuring/proof) first; CLI + complex adapters last.
  • CI gate: mypy strict must pass on every PR once v0.6.0 lands.

4. Tracking conventions

  • Update this doc whenever a coverage gap moves by more than 1 % or a bench baseline shifts by more than 5 %.
  • Cite the measurement command in this doc, alongside the numbers it produced, so the reader can re-run it.
  • Pin numbers to a git commit hash when possible (e.g. "baseline at <short-sha>, 2026-05-16").
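
The update thresholds above could be checked mechanically. The sketch below is one possible reading, not an existing script: it treats the 1 % coverage rule as percentage points and the 5 % bench rule as a relative shift, and `needs_doc_update` is a hypothetical name.

```python
def needs_doc_update(
    old_cov_pct: float,
    new_cov_pct: float,
    old_bench: float,
    new_bench: float,
    cov_threshold_pp: float = 1.0,
    bench_threshold_pct: float = 5.0,
) -> bool:
    """True when coverage or a bench baseline moved past its update threshold."""
    # Coverage movement measured in percentage points (assumed interpretation).
    coverage_moved = abs(new_cov_pct - old_cov_pct) > cov_threshold_pp
    # Bench movement measured relative to the old baseline, in either direction.
    bench_shift_pct = 100.0 * abs(new_bench - old_bench) / old_bench
    return coverage_moved or bench_shift_pct > bench_threshold_pct


# Coverage up 0.3 pp, bench 2 % slower: no doc update needed.
print(needs_doc_update(75.4, 75.7, 100.0, 102.0))
# Coverage up 1.6 pp: update the doc.
print(needs_doc_update(75.4, 77.0, 100.0, 100.0))
```

Both directions count: an improvement past the threshold is as much a reason to refresh the pinned numbers as a regression.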

Authored by Claude (Opus 4.7 1M context), 2026-05-16, pinning the Phase S2 + Phase S3 baselines for the v0.5.0 → v0.6.0 transition.