Public API reference¶

Auto-generated from the source docstrings via mkdocstrings. Every public symbol listed below is part of Ophamin's stable surface; breaking changes follow the semver promise.

Top-level¶

ophamin ¶

Ophamin — an empirical observatory around Kimera-SWM.

The name alludes to the angelic order Ophanim (wheels within wheels covered with eyes; Ezekiel 1:18). Architecturally, Ophamin is a Dyson sphere around Kimera: Kimera sits at the centre emitting; Ophamin envelops, senses, and returns measurement to the operator. It is not a tool next to Kimera but a structure around it.

The structure has six wheels, in two concentric triads:

Outer (empirical) triad:
  seeing       Wheel 1 — how the observatory senses Kimera and the world
               (substrate, corpus, discovery)
  measuring    Wheel 2 — pre-registered measurement engines + plug-in pillars
               (proof, scenarios, metrics, pillars.{observability, adaptive,
               effects, synthesis, robustness, diagnostics})
  comparing    Wheel 3 — cross-Kimera-commit retrospection
               (drift, provenance, orchestration)

Inner (engineering) triad:
  instrumenting  Wheel 4 — per-cycle CPU / RSS / page-fault sampling
                 (psutil, opentelemetry, py-spy, memray)
  auditing       Wheel 5 — orchestrated static-analysis tools
                 (ruff, bandit, mypy, pip-audit)
  reporting      Wheel 6 — render results to Markdown / HTML / LaTeX
                 (matplotlib, jinja2)

The six plug-in pillars (O · F · A · M · I · N) live inside the measuring ring:

O  observability       SPC + SRM + drift detectors      (scipy, river)
F  formal provenance   PROV-O graph + lineage store     (prov, MLflow, DVC)
A  adaptive testing    SPRT + mSPRT anytime-valid       (statsmodels)
M  mixed-effects       MixedLM + MEA                    (statsmodels)
I  iterative synthesis cumulative meta-analysis         (statsmodels)
N  n-fold robustness   cross-validation                 (scikit-learn)

The framework is independent of any particular substrate-under-test; MockSubstrate makes the whole observatory runnable with no external system, and KimeraAdapter plugs in Kimera-SWM via a subprocess boundary.

__version__ `module-attribute` ¶

__version__ = '0.55.0'

Protocols¶

ophamin.protocols ¶

The plug-in surfaces of the Ophamin observatory.

Ophamin is built to accept plug-in datasets, plug-in substrate probes, plug-in analytic pillars, and plug-in scenarios. This module declares the protocols each plug-in must satisfy. A new plug-in implements the protocol; nothing inside Ophamin's core has to change.

The protocols are intentionally narrow — each one names a single contract:

SubstrateProbe   the thing being observed (e.g. KimeraAdapter, MockSubstrate)
DatasetConnector a corpus the observatory feeds the substrate from
Pillar           a library-backed analytic that turns cycle results into
                 PillarEvidence (one statistical method per pillar)
ScenarioProtocol a corpus + target + pre-registered claim runner

Each protocol is declared as a typing.Protocol and marked runtime_checkable, so callers can verify plug-ins at registration time (isinstance(plugin, Pillar)) without requiring inheritance.
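The structural check described above can be illustrated with a small stand-alone sketch. `PillarLike`, `ToyPillar`, and `NotAPillar` are hypothetical names invented for the illustration; only the member names (`pillar_name`, `library`, `library_version`, `compute`) come from this reference. Note that a runtime `isinstance` check against a protocol only confirms that the members exist; it does not check signatures or types.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class PillarLike(Protocol):
    """Hypothetical stand-in for the Pillar protocol, not the real class."""

    pillar_name: str
    library: str
    library_version: str

    def compute(self, cycle_results: Any, records: Any) -> Any: ...


class ToyPillar:
    """Satisfies the protocol structurally; there is no inheritance."""

    pillar_name = "X.toy"
    library = "statistics"
    library_version = "stdlib"

    def compute(self, cycle_results: Any, records: Any) -> Any:
        return {"n": len(cycle_results)}


class NotAPillar:
    """Missing library, library_version, and compute."""

    pillar_name = "X.broken"


print(isinstance(ToyPillar(), PillarLike))   # True
print(isinstance(NotAPillar(), PillarLike))  # False
```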

Pillar ¶

Bases: Protocol

One analytic pillar — a statistical method that turns observations into PillarEvidence.

A Pillar declares its name + the library it delegates to (library + version go into every signed proof record). Different pillars implement different statistical methods: SPC, SPRT, mixed-effects, etc.

The current six pillars (O / F / A / M / I / N) live under ophamin.measuring.pillars/*; this protocol describes the contract a new pillar must satisfy to be registrable.

As of Move G (2026-05-16) eleven adapter classes in ophamin.measuring.pillars._adapters satisfy this Protocol and register themselves with ophamin.registry.PILLARS at module import time. Out-of-tree pillars register the same way: construct an instance of a PillarBase subclass and call ophamin.registry.register_pillar. Per-pillar compute signatures diverge (sequential testing vs control charts vs cross-validation are different shapes); the Protocol is metadata-backed-by-compute: adapters whose pillar doesn't fit the uniform compute(cycle_results, records) shape raise ophamin.measuring.pillars.base.NonUniformComputeError with a pointer to the canonical per-pillar API.

ScenarioProtocol ¶

Bases: Protocol

A scenario binds a corpus + target + pre-registered claim and produces a signed Empirical Proof Record.

The existing ophamin.measuring.scenarios.base.Scenario abstract class is the canonical implementation. As of 2026-05-16 the framework ships nineteen scenarios across five tiers (Scientific / Engineering / Philosophical / Empirical-deep / Measurement-machinery); see the README scenarios table for the full list.

The pre-registration discipline is preserved across plug-ins: every ScenarioProtocol implementation must produce a claim whose threshold is falsifiable (a value the substrate could fail to meet), before the run.

Note: Eleven of the nineteen scenarios are not currently registered in ophamin.measuring.scenarios.__init__.SCENARIOS, which means they are reachable from Python imports but not from the ophamin scenario <name> CLI surface. See docs/ARCHITECTURE_INTENT_VS_REALITY_2026_05_16.md for the gap.

DatasetConnector ¶

Bases: Protocol

A corpus the observatory can stream records from.

The existing ophamin.seeing.corpus.base.Corpus abstract class is the canonical example. Each corpus is content-addressable (its content hash appears in every signed proof record).

SubstrateProbe ¶

Bases: Protocol

A substrate-under-test the observatory can drive.

Any concrete substrate (Kimera, mock, future substrates) implements this contract. The existing ophamin.seeing.substrate.base.SubstrateUnderTest abstract class is the canonical example.

Registry¶

ophamin.registry ¶

Central plug-in registry.

Closes gap B from docs/ARCHITECTURE_INTENT_VS_REALITY_2026_05_16.md — until this module landed, the four Protocols declared in :mod:ophamin.protocols (SubstrateProbe / DatasetConnector / Pillar / ScenarioProtocol) had no registration surface. Plug-ins were hard-imported into individual scenarios.

The registry exposes one dict per plug-in kind and one register_* function per kind. Scenarios continue to register themselves via Scenario.__init_subclass__ (Move A); the SCENARIOS dict is re-exported here for one-stop discovery. Pillars are registered by a register_pillar call that runs when their package's __init__.py is imported. Corpora are looked up via ophamin.seeing.corpus.get_corpus (existing surface).

Every registration is loud-failure:

A duplicate pillar_name raises :class:DuplicatePluginError rather than silently overwriting.
A plug-in that fails the matching isinstance(p, Protocol) check raises :class:PluginProtocolViolationError — the Protocol declared the contract; an adapter that doesn't satisfy it is a real defect.

Outside callers query the registry via:

>>> from ophamin.registry import PILLARS, list_pillars, get_pillar
>>> p = get_pillar("O.spc")
>>> p.library, p.library_version
('numpy', '1.26.0')

Or via the ophamin pillar list / show CLI surface.

register_pillar ¶

register_pillar(pillar: PillarBase) -> PillarBase

Register one pillar adapter in the central registry.

Returns the pillar (so calls can be expressed as MY_PILLAR = register_pillar(MyPillar()) at module scope).

Raises:

Type	Description
`PluginProtocolViolationError`	if the object doesn't satisfy the :class:`Pillar` runtime protocol (missing `pillar_name` / `library` / `library_version` / `compute`).
`DuplicatePluginError`	if another pillar already registered the same `pillar_name`. Re-registration of the same object under the same name is idempotent (necessary for module reloads).
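The loud-failure and idempotency rules in the table above can be sketched in a few lines. This is a stand-alone illustration under stated assumptions, not Ophamin's implementation: the exception classes are redeclared locally (their real base classes are not given in this reference), the protocol check is reduced to a `hasattr` scan, and `PILLARS` here is a plain local dict.

```python
class DuplicatePluginError(ValueError):
    """Local stand-in; the real base class is not stated in this reference."""


class PluginProtocolViolationError(TypeError):
    """Local stand-in; same caveat as above."""


_REQUIRED = ("pillar_name", "library", "library_version", "compute")
PILLARS: dict[str, object] = {}  # local toy registry, not ophamin.registry.PILLARS


def register_pillar(pillar):
    # Reject objects that do not expose the documented members.
    missing = [name for name in _REQUIRED if not hasattr(pillar, name)]
    if missing:
        raise PluginProtocolViolationError(f"missing members: {missing}")
    # A different object under an existing name is an error, never an overwrite.
    existing = PILLARS.get(pillar.pillar_name)
    if existing is not None and existing is not pillar:
        raise DuplicatePluginError(f"{pillar.pillar_name!r} is already registered")
    # Registering the same object again is a no-op (survives module reloads).
    PILLARS[pillar.pillar_name] = pillar
    return pillar


class ToyPillar:
    pillar_name = "X.toy"
    library = "statistics"
    library_version = "stdlib"

    def compute(self, cycle_results, records):
        return None


MY_PILLAR = register_pillar(ToyPillar())
register_pillar(MY_PILLAR)  # idempotent: same object, same name
```

Returning the pillar is what makes the module-scope one-liner `MY_PILLAR = register_pillar(...)` possible.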

Signed-record codecs¶

Empirical Proof Record¶

ophamin.measuring.proof.record ¶

The Ophamin Empirical Proof Record — the official result artifact.

One record per verified claim. Two serialisations:

proof.json   canonical, machine-readable, JSON-Schema-validated
PROOF.md     rendered, human-readable

A proof is bulletproof when it is:

falsifiable — every claim carries a pre-registered Threshold
pre-registered — claim + config + analysis plan hashed BEFORE the run
traceable — content-addressed: claim -> config -> substrate -> data -> result
reproducible — exact command + environment lock + lineage chain
attributed — every statistic names the library + version that produced it
tamper-evident — HMAC-signed over the whole record body

The nine sections:

1 Identity          2 Claim            3 Pre-registration
4 Data              5 Evidence         6 Verdict
7 Reproduction      8 Provenance       9 Signature

A REFUTED record is a valid proof — disproving a claim is a result.

EmpiricalProofRecord `dataclass` ¶

The official Ophamin result artifact — nine sections, content-addressed, signed.

proof_id `property` ¶

proof_id: str

Content-addressed identifier — SHA-256 over sections 1-8.

sign ¶

sign(key: bytes) -> 'EmpiricalProofRecord'

HMAC-SHA256 sign the record body. Returns self for chaining.

verify_signature ¶

verify_signature(key: bytes) -> bool

True iff the signature matches the current body under key.
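For readers unfamiliar with the pattern, here is a minimal stand-alone sketch of content addressing plus HMAC-SHA256 signing over a record body, using only the standard library. The canonical form (sorted keys, compact separators, UTF-8) and the function names are assumptions made for the illustration; this reference does not specify how Ophamin canonicalises the body.

```python
import hashlib
import hmac
import json


def canonical_bytes(body: dict) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_id(body: dict) -> str:
    # Content address: the same body always yields the same identifier.
    return hashlib.sha256(canonical_bytes(body)).hexdigest()


def sign_body(body: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()


def verify_body(body: dict, signature: str, key: bytes) -> bool:
    if not signature:
        return False  # an empty signature never verifies
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sign_body(body, key), signature)


body = {"claim": {"statistic": "total_findings", "op": "<=", "value": 50}, "observed": 12}
signature = sign_body(body, b"demo-key")
print(verify_body(body, signature, b"demo-key"))  # True
body["observed"] = 99                             # tamper with the body
print(verify_body(body, signature, b"demo-key"))  # False
```

Because the signature covers the body, any mutation after signing requires signing again.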

validate ¶

validate() -> list[str]

Return a list of problems; an empty list means the record is well-formed.

Enforces the properties that make a proof bulletproof — every one of them, not a subset.

from_dict `classmethod` ¶

from_dict(data: dict[str, Any]) -> 'EmpiricalProofRecord'

Reconstruct an EmpiricalProofRecord from its to_dict payload.

Mirrors the on-disk JSON shape produced by to_json — the body sections (claim / preregistration / data / evidence / verdict / reproduction / provenance / signature) plus the identity sub-dict. Raises KeyError / ValueError loudly on a malformed payload — no silent fill-defaults; a broken record should fail loud, not deserialize as a partial.

from_json `classmethod` ¶

from_json(path: str) -> 'EmpiricalProofRecord'

Load a proof record from a JSON file written by to_json.

Claim `dataclass` ¶

Section 2 — the falsifiable claim, as a five-tuple.

Threshold `dataclass` ¶

A falsifiable pass/fail boundary — there is no claim without one.

decide ¶

decide(observed: float) -> bool

True iff the observed value satisfies the threshold.

Verdict `dataclass` ¶

Section 6 — VALIDATED / REFUTED / INCONCLUSIVE against the threshold.

decide `classmethod` ¶

decide(observed: float, threshold: Threshold, *, inconclusive: bool = False, reasoning: str = '') -> 'Verdict'

Decide the verdict by comparing observed against threshold.
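The relationship between a threshold and a verdict can be sketched as follows. The field names (`statistic`, `op`, `value`), the supported operators, and the rule that `inconclusive=True` overrides the comparison are all assumptions for the illustration; the toy classes are not the real `Threshold` and `Verdict` dataclasses.

```python
import operator
from dataclasses import dataclass

# Assumed operator set; the real Threshold may support a different one.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class ToyThreshold:
    statistic: str
    op: str
    value: float

    def decide(self, observed: float) -> bool:
        # True iff the observed value satisfies the boundary.
        return _OPS[self.op](observed, self.value)


def toy_verdict(observed: float, threshold: ToyThreshold, *, inconclusive: bool = False) -> str:
    if inconclusive:
        return "INCONCLUSIVE"
    return "VALIDATED" if threshold.decide(observed) else "REFUTED"


gate = ToyThreshold("total_findings", "<=", 50)
print(toy_verdict(12, gate))                     # VALIDATED
print(toy_verdict(51, gate))                     # REFUTED
print(toy_verdict(12, gate, inconclusive=True))  # INCONCLUSIVE
```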

PillarEvidence `dataclass` ¶

Section 5 — one pillar's measured evidence, attributed to its library.

The cross_check field is constrained to :data:_CROSS_CHECK_VALUES — passing prose into it fires a loud ValueError at construction time. Long-form context belongs in detail (free-form dict) instead.

PreRegistration `dataclass` ¶

Section 3 — claim + plan hashed BEFORE the run.

preregistered_at must precede the record's created_at; validate enforces it. Build this object before the experiment runs.
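A rough sketch of the two ideas in this section, hashing the claim and plan before the run and checking timestamp order afterwards, might look like this. The helper names and the hashed payload layout are hypothetical, and treating equal timestamps as a violation is an assumption about what "precede" means here.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def plan_hash(claim: dict, analysis_plan: str) -> str:
    # Freeze claim + plan into one digest before any data is observed.
    payload = json.dumps({"claim": claim, "analysis_plan": analysis_plan}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ordering_problems(preregistered_at: datetime, created_at: datetime) -> list[str]:
    # Assumption: "precede" is strict, so equal timestamps are a problem.
    if preregistered_at >= created_at:
        return ["preregistered_at must precede created_at"]
    return []


claim = {"statistic": "total_findings", "op": "<=", "value": 50}
frozen = plan_hash(claim, "one-sided gate on total findings")

registered = datetime(2026, 5, 16, 12, 0, tzinfo=timezone.utc)
print(ordering_problems(registered, registered + timedelta(minutes=5)))  # []
print(ordering_problems(registered, registered))  # one problem reported

# Moving the goalposts after the fact changes the digest.
moved = dict(claim, value=500)
print(plan_hash(moved, "one-sided gate on total findings") == frozen)  # False
```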

DatasetRef `dataclass` ¶

Section 4 — one real dataset, content-addressed.

Reproduction `dataclass` ¶

Section 7 — exact reproduction command, environment lock, lineage chain.

ophamin.measuring.proof.codec ¶

Format codec for :class:EmpiricalProofRecord — the single canonical load / validate / verify / ingest surface.

The proof-record component dataclasses already declare to_dict / from_dict round-trip pairs, and :class:EmpiricalProofRecord ships to_json / from_json file shortcuts. This module collects them into one loud-failure interface that bundles:

JSON-Schema validation against proof/schema.json;
the structural record.validate() checklist (falsifiable + pre-registered + traceable + reproducible + attributed);
optional HMAC-SHA256 signature verification under a caller-provided key;
one-call ingest that runs all three and raises loud on the first failure;
directory-walking iter_proofs / list_proofs so the proof corpus on disk has a first-class Python interface.

Per the framework's no-fallback rule: every failure mode raises a typed :class:ProofCodecError subclass with a descriptive message; the codec does not return None or empty dicts on error and does not swallow exceptions.

Read alongside :class:EmpiricalProofRecord itself (src/ophamin/measuring/proof/record.py) and the JSON Schema (src/ophamin/measuring/proof/schema.json).

SCHEMA_VERSION `module-attribute` ¶

SCHEMA_VERSION = '1.0'

dump ¶

dump(record: EmpiricalProofRecord, path: str | Path, *, indent: int = 2) -> Path

Write record to path as canonical JSON. Returns the path.

Creates the parent directory if it doesn't already exist (a convenience that callers of pathlib.Path.write_text typically have to add themselves).

Raises :class:OSError if the write fails — codec does NOT swallow file-system errors. Use the higher-level CLI / orchestration layer if structured error handling is wanted.

load ¶

load(path: str | Path) -> EmpiricalProofRecord

Load + reconstruct an :class:EmpiricalProofRecord from path.

Raises :class:ProofDecodeError on file-system errors, malformed JSON, or a structurally incomplete payload. The chained exception preserves the underlying error for forensic debugging.

validate ¶

validate(path: str | Path, *, key: bytes | None = None) -> ValidationReport

Run schema + record + (optional) signature validation in one call.

Returns a :class:ValidationReport capturing every layer's result. Does NOT raise on any validation failure — caller inspects the report's all_ok property and schema_errors / record_problems tuples to decide what to do.

Use :func:ingest for the raise-on-any-failure variant.

The JSON-Schema check runs first; if the schema is broken, the structural record.validate is skipped (a malformed-at-schema payload can't be safely reconstructed into a record). When schema-ok, the record is loaded and record.validate is invoked; if a key was provided, signature verification runs too. The signature_ok field is None when no key was provided (i.e. the check was skipped), True / False when a key was provided.
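The tri-state `signature_ok` field is easy to misread, so here is a small sketch of a report object with that shape. `ToyValidationReport` is a hypothetical stand-in; in particular, treating a skipped signature check (`None`) as compatible with `all_ok` is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyValidationReport:
    schema_errors: tuple[str, ...] = ()
    record_problems: tuple[str, ...] = ()
    signature_ok: Optional[bool] = None  # None: no key given, check skipped

    @property
    def all_ok(self) -> bool:
        # Assumption: a skipped signature check does not count as a failure.
        return (
            not self.schema_errors
            and not self.record_problems
            and self.signature_ok is not False
        )


print(ToyValidationReport().all_ok)                                    # True
print(ToyValidationReport(signature_ok=False).all_ok)                  # False
print(ToyValidationReport(schema_errors=("missing 'claim'",)).all_ok)  # False
```

A caller that needs the signature to have been checked should test `signature_ok is True` explicitly rather than rely on `all_ok`.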

verify_signature ¶

verify_signature(path: str | Path, key: bytes) -> bool

Load the record at path + verify its HMAC-SHA256 signature.

Returns True iff the signature matches the record body under key. False if signature is empty or doesn't match.

Raises :class:ProofDecodeError if the file itself can't be loaded. Does NOT raise on signature mismatch — caller decides whether to escalate (use :func:ingest with strict_signature=True for the loud-failure variant).

ingest ¶

ingest(path: str | Path, *, key: bytes | None = None, strict_signature: bool = False, require_schema_version: str | None = SCHEMA_VERSION) -> EmpiricalProofRecord

Single-call load + full-validate + optional signature-verify.

The boundary function for accepting third-party proof records. After a successful call, the returned :class:EmpiricalProofRecord is guaranteed:

structurally well-formed (JSON-Schema validated),
record-validate-clean (no internal contradictions),
schema-version matches require_schema_version (unless that's None, which opts out of the version gate),
signature-verified IFF strict_signature=True was passed AND a key was provided.

On any failure, raises the matching :class:ProofCodecError subclass with a descriptive message. The caller never has to inspect a partial / fallback record.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	file path to the proof JSON.	required
`key`	`bytes \| None`	optional HMAC-SHA256 key for signature verification.	`None`
`strict_signature`	`bool`	when True, require `key` to be provided AND the signature to verify. Default False — signature is checked when key is provided but verification failure is not fatal (matches the `validate` shape).	`False`
`require_schema_version`	`str \| None`	required schema version. Default is the current :data:`SCHEMA_VERSION`; pass `None` to accept any version (e.g. for migration tooling).	`SCHEMA_VERSION`

Raises:

Type	Description
`ProofSchemaError`	JSON-Schema validation failed.
`ProofValidationError`	structural `record.validate` failed.
`ProofSchemaVersionMismatchError`	`schema_version` didn't match `require_schema_version`.
`ProofSignatureError`	`strict_signature=True` and either no key was provided or signature verification failed.
`ProofDecodeError`	file couldn't be read or JSON couldn't be decoded (raised by underlying :func:`load`).

list_proofs ¶

list_proofs(directory: str | Path, *, key: bytes | None = None) -> tuple[ProofListEntry, ...]

Walk directory recursively and return one summary entry per JSON.

A file that fails to decode produces an entry with error set and the other content fields None — the walk does NOT stop on a bad file. Use :func:ingest against an individual path when you need loud-failure semantics for a single record.
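The "keep walking past bad files" behaviour can be sketched with `pathlib` and `json` alone. `ToyListEntry`, `toy_list`, and the `verdict.status` payload layout are hypothetical and chosen only for the illustration; the real `ProofListEntry` fields are not listed in this reference.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ToyListEntry:
    path: Path
    verdict: Optional[str] = None
    error: Optional[str] = None


def toy_list(directory) -> tuple:
    entries = []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            # Hypothetical payload layout: {"verdict": {"status": ...}}.
            entries.append(ToyListEntry(path, verdict=data["verdict"]["status"]))
        except (OSError, ValueError, KeyError, TypeError) as exc:
            # Record the failure on the entry and keep walking.
            entries.append(ToyListEntry(path, error=f"{type(exc).__name__}: {exc}"))
    return tuple(entries)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "good.json").write_text(json.dumps({"verdict": {"status": "VALIDATED"}}))
    (root / "nested" / "bad.json").write_text("{not json")
    for entry in toy_list(root):
        print(entry.path.name, entry.verdict, entry.error is not None)
```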

Audit Record¶

ophamin.auditing.audit_record ¶

AuditRecord — the signed, content-addressable artefact of one audit run.

Parallel to EmpiricalProofRecord but descriptive by default — audits don't require a falsifiable claim because the value is in the findings distribution itself, not in passing a threshold. A separate threshold-mode wrapper can pre-register pass/fail criteria for CI gating; that's a follow-on.

Nine logical sections, mirroring the proof record shape so the two can be processed by the same downstream tooling (reporting, drift, etc.):

Identity       ophamin version + commit, captured_at, schema version
Target         path being audited + its content hash (for forensics)
Pillars        which pillars ran, which were unavailable, versions
Findings       the union of every pillar's findings (already in PillarResult, but flattened here for cross-pillar hotspot detection)
Summary        aggregate counts + severity histogram + file hotspots
(no verdict)   audits are descriptive; if a claim is wanted, wrap this record in an Empirical Proof Record with a threshold on a chosen statistic
Reproduction   command + env-lock (later)
Provenance     (optional) PROV-O graph of the run
Signature      HMAC-SHA256 over the body

AuditRecord `dataclass` ¶

One audit run's full artefact — signed, content-addressable.

As of schema audit/1.1 (Move L, 2026-05-16), an AuditRecord MAY carry an optional :class:PreRegistration + chosen statistic metric + :class:Verdict, turning the descriptive record into a falsifiable artefact for CI gating. Records written under schema audit/1.0 (no pre-registration fields) load cleanly under the v1.1 codec — the optional fields default to None.

audit_id `property` ¶

audit_id: str

Content-addressed identifier — SHA-256 over the body.

from_dict `classmethod` ¶

from_dict(data: dict[str, Any]) -> 'AuditRecord'

Reconstruct an AuditRecord from its to_dict payload.

Accepts both schema audit/1.0 and audit/1.1 payloads. v1.0 records have no preregistration / verdict fields; v1.1 records may have one or both. Loud-fails on malformed shapes rather than silent partial deserialisation.

attach_pre_registration ¶

attach_pre_registration(*, claim: Any, observed_value: float, metric: str = 'total_findings', analysis_plan: str = 'audit-side pre-registration: gate on a chosen audit statistic') -> 'AuditRecord'

Stamp an in-place pre-registration + verdict onto this record.

Per Move L's full universalization of the pre-registration discipline: convert this descriptive audit into a falsifiable artefact by attaching a Claim's threshold + a decided Verdict. Returns self for chaining; bumps the record's schema_version to audit/1.1 if it wasn't already there.

sign() must be called again after attaching: the body has changed, so the old signature no longer verifies.

from_json `classmethod` ¶

from_json(path: str | Path) -> 'AuditRecord'

Load an AuditRecord from a JSON file written by :meth:to_json.

wrap_as_proof ¶

wrap_as_proof(*, claim: Any, observed_value: float, pillar_name: str = 'audit', statistic_name: str = 'total_findings', library: str = 'ophamin', library_version: str = '', analysis_plan: str = 'wrap-audit-as-proof: pre-register a threshold over an audit statistic for CI gating', sign_key: bytes | None = None) -> Any

Wrap this AuditRecord into a pre-registered EmpiricalProofRecord.

Per Move I: audits are descriptive by default, but a caller that wants CI gating (e.g. total_findings <= 50) can wrap the record in a proof record that carries the falsifiable claim + threshold. The wrapping is lossless — the audit's forensic detail (target path + content hash + per-pillar findings + summary) is kept in the proof's reproduction + evidence sections.

Parameters:

Name	Type	Description	Default
`claim`	`Any`	an :class:`ophamin.measuring.proof.Claim` whose threshold is the gate (e.g. `Threshold("total_findings", "<=", 50)`).	required
`observed_value`	`float`	the statistic to evaluate the claim against (typically `record.summary.total_findings` or a per-severity count).	required
`pillar_name`	`str`	PillarEvidence pillar identifier in the wrapped proof. Default `"audit"`.	`'audit'`
`statistic_name`	`str`	PillarEvidence statistic name. Default `"total_findings"`.	`'total_findings'`
`library`	`str`	PillarEvidence library attribution. Default `"ophamin"` since the audit aggregation IS Ophamin's code.	`'ophamin'`
`library_version`	`str`	PillarEvidence library version. Default empty — caller fills if known.	`''`
`analysis_plan`	`str`	PreRegistration analysis plan. Default explains the wrap-shape.	`'wrap-audit-as-proof: pre-register a threshold over an audit statistic for CI gating'`
`sign_key`	`bytes \| None`	HMAC-SHA256 sign key. If `None`, the proof is returned unsigned (caller's responsibility to sign before persisting).	`None`

Returns:

Type	Description
`Any`	A signed (or unsigned, if `sign_key=None`) :class:`EmpiricalProofRecord`.

to_markdown ¶

to_markdown(path: str | None = None) -> str

Render the audit as a human-readable Markdown report.

AuditSummary `dataclass` ¶

Cross-pillar aggregate — every pillar's findings rolled up.

from_dict `classmethod` ¶

from_dict(data: dict[str, Any]) -> 'AuditSummary'

Reconstruct an AuditSummary from its to_dict payload.

top_files round-trips as a list of [path, count] pairs (JSON doesn't carry tuples natively); we coerce back into the (str, int) tuple shape this dataclass declares.
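The tuple-to-list drift described above is a general property of JSON and can be reproduced with the standard library. The sample paths and counts below are made up for the illustration.

```python
import json

top_files = (("src/core.py", 7), ("src/io.py", 3))

# JSON has no tuple type, so tuples are written as arrays...
wire = json.dumps({"top_files": top_files})

# ...and come back as nested lists.
loaded = json.loads(wire)["top_files"]
print(loaded)  # [['src/core.py', 7], ['src/io.py', 3]]

# Coerce back into the declared (str, int) tuple shape.
restored = tuple((str(path), int(count)) for path, count in loaded)
print(restored == top_files)  # True
```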

ophamin.auditing.codec ¶

Format codec for :class:AuditRecord — Move H parallel to Move B's proof codec.

Mirrors :mod:ophamin.measuring.proof.codec exactly so audit records get the same load / validate / verify / ingest treatment proof records do. The differences:

No JSON-Schema validation today (audit records don't ship a schema.json alongside; structural validation is via the AuditRecord.from_dict parser).
No threshold/verdict — audits are descriptive by default. The validation layer focuses on signature + body-roundtrip integrity + schema_version compatibility.

Per the framework's no-fallback rule, every failure raises a typed :class:AuditCodecError subclass with a descriptive message.

SCHEMA_VERSION `module-attribute` ¶

SCHEMA_VERSION = 'audit/1.1'

dump ¶

dump(record: AuditRecord, path: str | Path, *, indent: int = 2) -> Path

Write record to path as canonical JSON. Returns the path.

Creates the parent directory if it doesn't already exist. Raises :class:OSError if the write fails — codec does NOT swallow file-system errors.

load ¶

load(path: str | Path) -> AuditRecord

Load + reconstruct an :class:AuditRecord from path.

Raises :class:AuditDecodeError on file-system errors, malformed JSON, or a structurally incomplete payload. The chained exception preserves the underlying error for forensic debugging.

validate ¶

validate(path: str | Path, *, key: bytes | None = None) -> AuditValidationReport

Run structural + (optional) signature validation in one call.

Returns a :class:AuditValidationReport. Does NOT raise on any validation failure — caller inspects the report's all_ok property + record_problems tuple to decide what to do. Use :func:ingest for the raise-on-any-failure variant.

The structural check (via :meth:AuditRecord.from_dict + the in-module _structural_problems helper) runs first; if the record can't be loaded, the report carries the decode error as a single record-problem string. When loaded, the in-module shape check augments with cross-section consistency (e.g. pillars in record vs summary).

verify_signature ¶

verify_signature(path: str | Path, key: bytes) -> bool

Load the record at path + verify its HMAC-SHA256 signature.

Returns True iff the signature matches the record body under key. False if signature is empty or doesn't match.

Raises :class:AuditDecodeError if the file itself can't be loaded.

list_audits ¶

list_audits(directory: str | Path, *, key: bytes | None = None) -> tuple[AuditListEntry, ...]

Walk directory recursively; emit one summary entry per JSON.

A file that fails to decode produces an entry with error set and the other content fields None. Mirrors :func:ophamin.measuring.proof.codec.list_proofs.

Campaign Record¶

ophamin.campaign ¶

CampaignRecord + the 6-phase composite-run orchestrator (Move F).

Closes Deficit 2 from docs/ARCHITECTURE_EXTENDED_AUDIT_2026_05_16.md — the "6 phases" the owner named are the six wheels of Ophamin's architecture operating as a single coordinated pass against a substrate:

seeing — discover the substrate's surface
measuring — run the requested scenarios; collect signed proof records
comparing — synthesize the measuring output into a campaign summary; detect verdict flips
instrumenting — collect per-cycle resource cost (when the substrate was wrapped in InstrumentedSubstrate)
auditing — static-analysis sweep over the substrate's source (when a source-code path is available)
reporting — collate every preceding phase's output into one rolled-up Markdown report

Each phase produces a :class:CampaignPhase aggregate; the :class:CampaignRecord collects them into a signed, content-addressed aggregate. A phase can be ok / skipped / failed; skipped is the sanctioned outcome when the substrate doesn't expose what a phase needs (e.g. auditing against a MockSubstrate is skipped because there's no source code to audit).

The orchestrator never silently swallows phase failures — a failed phase carries its error message into the record so the operator sees exactly what broke. This is the framework's "loud-failure" stance applied at the campaign level.
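One way to picture this stance is a loop that records every phase outcome instead of aborting on the first exception. The sketch below is illustrative only: `PhaseSkipped`, `ToyPhase`, and `run_phases` are hypothetical names, and how the real orchestrator signals a skipped phase is not described in this reference.

```python
from dataclasses import dataclass


class PhaseSkipped(Exception):
    """Hypothetical signal raised by a phase that does not apply."""


@dataclass
class ToyPhase:
    name: str
    status: str  # "ok" / "skipped" / "failed"
    error: str = ""


def run_phases(phases):
    results = []
    for name, fn in phases:
        try:
            fn()
            results.append(ToyPhase(name, "ok"))
        except PhaseSkipped as exc:
            results.append(ToyPhase(name, "skipped", str(exc)))
        except Exception as exc:
            # The error is carried into the result, and the loop continues.
            results.append(ToyPhase(name, "failed", f"{type(exc).__name__}: {exc}"))
    return results


def seeing():
    pass


def auditing():
    raise PhaseSkipped("no source tree to audit")


def reporting():
    raise RuntimeError("template missing")


results = run_phases([("seeing", seeing), ("auditing", auditing), ("reporting", reporting)])
any_failed = any(phase.status == "failed" for phase in results)
for phase in results:
    print(phase.name, phase.status, phase.error)
```

A CLI wrapper could then map `any_failed` to a non-zero exit code.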

CAMPAIGN_SCHEMA_VERSION `module-attribute` ¶

CAMPAIGN_SCHEMA_VERSION = '2.0'

CANONICAL_PHASE_ORDER `module-attribute` ¶

CANONICAL_PHASE_ORDER: tuple[str, ...] = ('seeing', 'measuring', 'comparing', 'instrumenting', 'auditing', 'reporting')

CampaignPhase `dataclass` ¶

One wheel's contribution to a composite run.

Three possible terminal status values:

"ok" — phase completed; artifact_paths + summary carry the output.
"skipped" — phase didn't apply (e.g. auditing against a Mock substrate); error carries the reason.
"failed" — phase raised; error carries the exception string. The campaign continues to the next phase (loud-failure at the campaign level, not at the per-phase level — operator sees every phase's outcome).

CampaignRecord `dataclass` ¶

Signed, content-addressed aggregate of one full-pass run.

Schema 2.0 (current) adds two strictly-additive fields:

corrected_verdicts — {claim_id → corrected_verdict} after multiplicity correction (FWER or FDR). Empty dict when no correction was applied or when no records carried a p_value.
multiplicity_correction_method — "holm" / "bh" / "none". The method the writer used when populating corrected_verdicts.

Schema 1.0 records remain readable: missing additive fields default to empty dict / "none" respectively. Signature verification is version-aware — :meth:_body includes the additive fields only when schema_version != "1.0", so a 1.0 signature still re-canonicalises bit-equal to the original wire form.
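The version-aware signing rule can be made concrete with a toy record. This is a sketch under assumptions: records are plain dicts, the canonical form is sorted compact JSON, and `signing_body` / `toy_sign` are hypothetical helpers rather than the real `_body` method.

```python
import hashlib
import hmac
import json

ADDITIVE_FIELDS = ("corrected_verdicts", "multiplicity_correction_method")


def signing_body(record: dict) -> dict:
    # Start from everything except the signature and the additive fields.
    body = {
        key: value
        for key, value in record.items()
        if key != "signature" and key not in ADDITIVE_FIELDS
    }
    # Only newer schemas include the additive fields in the signed body.
    if record["schema_version"] != "1.0":
        body["corrected_verdicts"] = record.get("corrected_verdicts", {})
        body["multiplicity_correction_method"] = record.get(
            "multiplicity_correction_method", "none"
        )
    return body


def toy_sign(record: dict, key: bytes) -> str:
    payload = json.dumps(signing_body(record), sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


# A record written under schema 1.0, signed at the time.
old = {"schema_version": "1.0", "phases": ["seeing", "measuring"]}
old_signature = toy_sign(old, b"demo-key")

# A newer reader fills in defaults for the missing additive fields on load.
loaded = dict(old, corrected_verdicts={}, multiplicity_correction_method="none")
print(toy_sign(loaded, b"demo-key") == old_signature)  # True
```

Without the version check, the defaults filled in on load would alter the signed bytes and every 1.0 signature would stop verifying.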

campaign_id `property` ¶

campaign_id: str

Content-addressed identifier — SHA-256 over the body.

run_campaign ¶

run_campaign(*, substrate: SubstrateUnderTest, target_name: str | None = None, target_git_commit: str | None = None, scenarios: list[type[Scenario]] | None = None, enable_phases: set[str] | None = None, out_dir: str | Path = 'campaigns/latest', sign_key: bytes = DEFAULT_SIGN_KEY, fwer_method: str = 'holm', fwer_alpha: float = 0.05) -> CampaignRecord

Run the six wheels in canonical order; emit a signed CampaignRecord.

Parameters:

Name	Type	Description	Default
`substrate`	`SubstrateUnderTest`	the substrate the measuring phase will run scenarios against. Required.	required
`target_name`	`str \| None`	human-facing name for the target (default: the substrate's `name` attribute).	`None`
`target_git_commit`	`str \| None`	the target's git commit hash (default: the substrate's `git_commit()` return value).	`None`
`scenarios`	`list[type[Scenario]] \| None`	list of Scenario classes to run in the measuring phase. Default: every default-instantiable scenario in :data:`SCENARIOS`.	`None`
`enable_phases`	`set[str] \| None`	set of phase names to run. Default: all six.	`None`
`out_dir`	`str \| Path`	directory under which per-phase artifacts are written.	`'campaigns/latest'`
`sign_key`	`bytes`	HMAC-SHA256 key for signing the final record.	`DEFAULT_SIGN_KEY`
`fwer_method`	`str`	multiplicity-correction method to apply during the comparing phase. One of :data:`ophamin.comparing.fwer.SUPPORTED_METHODS` (`"holm"` / `"bh"` / `"none"`). The default is `"holm"` — strict FWER control via Holm-Bonferroni. New in schema 2.0 (RFC 0002 Phase E2).	`'holm'`
`fwer_alpha`	`float`	family-wise / FDR threshold used when applying the correction. Default 0.05.	`0.05`

Returns:

Type	Description
`CampaignRecord`	A signed :class:`CampaignRecord` with one :class:`CampaignPhase` per executed phase, plus, when the `comparing` phase ran, the schema-2.0 `corrected_verdicts` mapping + `multiplicity_correction_method` populated from the FWER pass.

The orchestrator NEVER raises on a per-phase failure — it captures the error string into the phase's error field and continues to the next phase. The caller inspects record.any_failed to surface to a non-zero exit code if appropriate.
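For context on the `"holm"` option, the following is a textbook Holm-Bonferroni step-down procedure in plain Python. It shows the statistical technique only; `holm_reject` is a hypothetical helper, and how Ophamin turns rejections into `corrected_verdicts` is not specified in this reference.

```python
def holm_reject(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Holm step-down: compare the k-th smallest p-value with alpha / (m - k)."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    rejected = {claim_id: False for claim_id in p_values}
    for k, (claim_id, p) in enumerate(ordered):  # k = 0 for the smallest p-value
        if p > alpha / (m - k):
            break  # the first non-rejection stops the procedure
        rejected[claim_id] = True
    return rejected


p_values = {"claim_a": 0.01, "claim_b": 0.04, "claim_c": 0.03}
print(holm_reject(p_values, alpha=0.05))
# {'claim_a': True, 'claim_b': False, 'claim_c': False}
```

All three p-values fall below 0.05 on their own, but only `claim_a` survives: 0.01 is under 0.05/3, while 0.03 exceeds 0.05/2 and halts the procedure.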

dump_campaign ¶

dump_campaign(record: CampaignRecord, path: str | Path) -> Path

Write a CampaignRecord to disk as canonical JSON. Returns the path.

load_campaign ¶

load_campaign(path: str | Path) -> CampaignRecord

Load a CampaignRecord from disk.

Regression Alert Record¶

ophamin.comparing.regression_alert ¶

Regression-alert daemon (Move J) — closes gap F from the prior audit.

Detects verdict regressions across two snapshots of a proof corpus (typically: proofs/ at the prior Kimera commit vs proofs/ at the new commit). A "regression" is a scenario whose verdict moved from VALIDATED (or INCONCLUSIVE) to REFUTED: the substrate started failing a claim it previously satisfied. The detector also flags the inverse ("recovery": REFUTED → VALIDATED) and lateral cases (any other verdict change, which is neither a regression nor a recovery).

The pipeline:

Snapshot a proof corpus at commit A (e.g. via :func:scan_proof_directory).
Snapshot the proof corpus at commit B.
Run :func:compute_regression_alert on the pair.
Inspect the resulting :class:RegressionAlert — list of :class:VerdictTransition rows + headline counts.

The pairing key is the scenario's stable identifier: the proof's filename family or, when present, the underlying scenario name via the proof's claim-statement signature. For two proofs of the same family at two different substrate commits to be paired, both must resolve to the same family key under this heuristic; entries that cannot be paired are surfaced as unmatched_in_a / unmatched_in_b for operator inspection.

CLI:

ophamin watch-proofs --before --after [--out ]

Output is a signed RegressionAlertRecord (HMAC-SHA256 + content-addressed alert_id), mirroring the shape of every other Ophamin artifact. A REGRESSION-class alert exits with code 1; a quiet (no-change) alert exits with code 0; a recovery-class alert also exits with code 0 and carries a notable summary line.
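The transition classes and exit-code rule described in this module can be sketched over two plain dicts that map a pairing key to a verdict string. The function names are hypothetical, and reading `unmatched_in_a` as "present only in snapshot A" is an assumption; the real detector works on proof records, not bare strings.

```python
def classify(before: str, after: str) -> str:
    if before == after:
        return "unchanged"
    if after == "REFUTED" and before in ("VALIDATED", "INCONCLUSIVE"):
        return "regression"
    if before == "REFUTED" and after == "VALIDATED":
        return "recovery"
    return "lateral"  # any other change of verdict


def compare(snapshot_a: dict, snapshot_b: dict):
    shared = sorted(snapshot_a.keys() & snapshot_b.keys())
    transitions = {key: classify(snapshot_a[key], snapshot_b[key]) for key in shared}
    # Assumed meaning: keys that appear in only one of the two snapshots.
    unmatched_in_a = sorted(snapshot_a.keys() - snapshot_b.keys())
    unmatched_in_b = sorted(snapshot_b.keys() - snapshot_a.keys())
    exit_code = 1 if "regression" in transitions.values() else 0
    return transitions, unmatched_in_a, unmatched_in_b, exit_code


before = {"latency": "VALIDATED", "drift": "REFUTED", "recall": "INCONCLUSIVE", "old_only": "VALIDATED"}
after = {"latency": "REFUTED", "drift": "VALIDATED", "recall": "VALIDATED", "new_only": "VALIDATED"}

transitions, only_a, only_b, exit_code = compare(before, after)
print(transitions)
print(only_a, only_b, exit_code)
```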

REGRESSION_ALERT_SCHEMA_VERSION `module-attribute` ¶

REGRESSION_ALERT_SCHEMA_VERSION = 'regression-alert/1.0'

RegressionAlertRecord `dataclass` ¶

Signed, content-addressed artifact of one before/after comparison.
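
A rough sketch of what "signed, content-addressed" can mean in practice, using only the standard library. The canonical form chosen here (sorted keys, compact separators) and the helper names are assumptions; Ophamin's actual canonicalisation and record fields are not shown in this reference.

```python
import hashlib
import hmac
import json


def canonical_json(payload: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_id(payload: dict) -> str:
    """Content address: SHA-256 over the canonical bytes."""
    return hashlib.sha256(canonical_json(payload)).hexdigest()


def sign(payload: dict, key: bytes) -> str:
    """HMAC-SHA256 signature over the canonical bytes."""
    return hmac.new(key, canonical_json(payload), hashlib.sha256).hexdigest()


def verify(payload: dict, key: bytes, signature: str) -> bool:
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(sign(payload, key), signature)


key = b"example-key"  # illustrative key, not a real Ophamin secret
payload = {"regressions": 1, "recoveries": 0}
signature = sign(payload, key)
tampered = {"regressions": 0, "recoveries": 0}
```

Because the identifier is derived from canonical bytes, two payloads that differ only in key order share one id, while any change in content changes both the id and the signature.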

Substrate base¶

ophamin.seeing.substrate.base ¶

The substrate-under-test abstraction.

Ophamin is independent of any particular system. Whatever it tests is a SubstrateUnderTest (SUT): something that can be reset, run for one cycle on a stimulus, and asked for its git commit and state. MockSubstrate implements this with no dependencies (so the framework is fully runnable on its own); KimeraAdapter implements it over a subprocess boundary to Kimera-SWM.

The cycle boundary is deliberate. The leak-free probe shape was established empirically: a fresh interpreter per cycle removes process-level state carry-over. Accordingly, run_cycle is the unit of measurement and reset is honoured between runs.

SubstrateUnderTest ¶

Bases: ABC

Abstract system under test. Implement this to plug a system into Ophamin.

git_commit `abstractmethod` ¶

git_commit() -> str

Return the substrate's source revision.

This is the data_git_commit_id end of the provenance bridge: every recorded run is tethered to the exact substrate revision that produced it. Return "" only if the substrate genuinely has no version anchor.

reset `abstractmethod` ¶

reset() -> None

Return the substrate to a clean initial state between runs.

run_cycle `abstractmethod` ¶

run_cycle(stimulus: Any, params: dict[str, Any] | None = None) -> CycleResult

Exercise the substrate for exactly one cycle on stimulus.

params carries the swept configuration for this run. Implementations must not silently degrade: if the cycle cannot run, return a CycleResult with success=False and a populated error, or raise — never fabricate a plausible-looking result.

run_batch ¶

run_batch(stimuli: list[Any], params: dict[str, Any] | None = None) -> list[CycleResult]

Exercise the substrate over a batch of stimuli.

The default implementation simply loops run_cycle — correct, but one boundary crossing per cycle. Adapters that can run a whole batch inside a single process (the leak-tolerant density path) should override this; cycle_index is renumbered sequentially across the batch.

capture_state ¶

capture_state() -> dict[str, Any]

Return a serialisable snapshot of substrate state (for provenance).

Default is empty; substrates with inspectable state should override.

metadata ¶

metadata() -> dict[str, Any]

Static descriptive metadata about this substrate.
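
To illustrate the contract, here is a self-contained sketch built on hypothetical stand-ins (`CycleSketch`, `SubstrateSketch`) rather than the real `CycleResult` and `SubstrateUnderTest`. It shows the two behaviours described above: a cycle that cannot run reports success=False with a populated error instead of fabricating output, and the default batch path loops run_cycle and renumbers cycle_index.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CycleSketch:
    """Hypothetical stand-in for CycleResult."""
    raw: dict[str, Any]
    success: bool
    cycle_index: int = 0
    error: str = ""


class SubstrateSketch(ABC):
    """Hypothetical stand-in for the abstract substrate."""

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def run_cycle(self, stimulus: Any, params: Optional[dict] = None) -> CycleSketch: ...

    def run_batch(self, stimuli: list, params: Optional[dict] = None) -> list:
        # Default path: one run_cycle call per stimulus, indices renumbered.
        results = []
        for index, stimulus in enumerate(stimuli):
            result = self.run_cycle(stimulus, params)
            result.cycle_index = index
            results.append(result)
        return results


class UppercaseSubstrate(SubstrateSketch):
    """Toy substrate: upper-cases text and counts successful cycles."""

    def __init__(self) -> None:
        self.seen = 0

    def reset(self) -> None:
        self.seen = 0

    def run_cycle(self, stimulus: Any, params: Optional[dict] = None) -> CycleSketch:
        if not isinstance(stimulus, str):
            # Report the failure honestly; do not invent a result.
            kind = type(stimulus).__name__
            return CycleSketch(raw={}, success=False, error=f"unsupported stimulus type: {kind}")
        self.seen += 1
        return CycleSketch(raw={"text": stimulus.upper()}, success=True)


substrate = UppercaseSubstrate()
results = substrate.run_batch(["a", 3, "b"])
```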

CycleResult `dataclass` ¶

The outcome of one substrate cycle.

raw is whatever the substrate emitted, untouched. success and halt_mode are the two cross-substrate fields every adapter must fill. A pre-built metric_bundle may be attached by the adapter; otherwise to_metric_bundle does best-effort extraction from raw.

to_metric_bundle ¶

to_metric_bundle() -> MetricBundle

Return the attached bundle, or build one with best-effort extraction.

The default extraction recognises a small set of conventional field names. Adapters that know their substrate should attach an explicit metric_bundle rather than relying on this.

ophamin.seeing.substrate.mock ¶

MockSubstrate — a self-contained substrate under test.

This is what makes Ophamin runnable and testable with no external system. It is a deterministic, seedable stand-in that produces plausible cycle results: a phi-like cognitive signal that drifts as state accumulates, an energy gauge that depletes, latency timers, and a tunable collapse mode so the diagnostics have something to find.

It is not a model of any real substrate — it exists so the framework's pillars and orchestration can be exercised and verified end-to-end. Real systems plug in through their own SubstrateUnderTest adapter (see kimera_adapter).

MockSubstrate ¶

Bases: SubstrateUnderTest

A deterministic, seedable substrate stand-in.

Behaviour responds to swept parameters so the framework can be exercised:

injection_rate       raises ``phi``, lowers cross-modal overlap & energy
immune_threshold     low + high injection -> occasional overwhelm collapse
variant              "treatment"-like labels add a small positive effect
entropy_coefficient  in a ``collapse_cell``, low entropy triggers collapse
cell                 topological cell id (for the kernel-coupling probe)

State (cycle count, energy, accumulated phi) carries across cycles and is cleared by reset — the seed makes the whole sequence reproducible.
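
The seeded, stateful behaviour can be pictured with a toy generator. The class below is a hypothetical illustration and not MockSubstrate's implementation; the formulas are invented for the example, and it assumes that reset also re-seeds the random source, which the description above does not state.

```python
import random


class SeededGauge:
    """Toy stand-in: a drifting phi-like signal and a depleting energy gauge."""

    def __init__(self, seed: int = 0) -> None:
        self._seed = seed
        self.reset()

    def reset(self) -> None:
        # Assumption: reset clears state AND restarts the seeded sequence.
        self._rng = random.Random(self._seed)
        self.cycles = 0
        self.energy = 1.0

    def step(self, injection_rate: float = 0.1) -> dict[str, float]:
        self.cycles += 1
        self.energy = max(0.0, self.energy - 0.01 * injection_rate)
        # Signal drifts upward as state accumulates, with seeded noise.
        phi = injection_rate + 0.001 * self.cycles + self._rng.gauss(0.0, 0.01)
        return {"phi": phi, "energy": self.energy}


gauge = SeededGauge(seed=7)
first_run = [gauge.step() for _ in range(3)]
gauge.reset()
second_run = [gauge.step() for _ in range(3)]
```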

ophamin.seeing.substrate.kimera_adapter ¶

KimeraAdapter — plug Kimera-SWM into Ophamin as a multi-component substrate.

This is the central Kimera-coupling point in the framework: it adapts the substrate-under-test surface so that the rest of Ophamin (measuring/, comparing/, auditing/, reporting/) operates against the abstract SubstrateUnderTest protocol. A small number of seeing-wheel-internal helpers (seeing/discovery, seeing/wiring, seeing/telemetry) also reach into Kimera shapes; they belong to the same conceptual layer as KimeraAdapter itself. The adapter models Kimera-SWM as what it is: a multi-component entity, not a single cognitive cycle.

An experiment targets either the whole entity (target="entity" — the integrated Takwin cycle) or a named component ("walker", "gwf", "rosetta", "arachne", "ouroboros", "pentecost", "piovra", "astrolabe" …) — each invoked through its own verified entry point.

Two modes:

mode="subprocess" — a fresh interpreter per cycle. Leak-free, slow; the precision path.
mode="batch" — one interpreter, the component constructed once, the whole batch looped in-process. Fast; the density path. State accumulates across the batch, which for most components is the substrate working as designed (memory-as-deformation), not a leak.
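
As a sketch of the subprocess idea, the snippet below runs one cycle in a fresh interpreter and exchanges JSON over stdin/stdout. The inline `RUNNER` script and the `run_in_fresh_interpreter` helper are hypothetical; the bundled Kimera runner and its wire format are not shown in this reference.

```python
import json
import subprocess
import sys

# Hypothetical one-cycle runner: reads a stimulus, writes a JSON result.
RUNNER = (
    "import json, sys\n"
    "stimulus = json.load(sys.stdin)\n"
    "json.dump({'length': len(stimulus)}, sys.stdout)\n"
)


def run_in_fresh_interpreter(stimulus: str, timeout_s: float = 30.0) -> dict:
    """Run one cycle in a new Python process so no state carries over."""
    proc = subprocess.run(
        [sys.executable, "-c", RUNNER],
        input=json.dumps(stimulus),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        # Surface the failure instead of fabricating a result.
        return {"success": False, "error": proc.stderr.strip()}
    return {"success": True, "raw": json.loads(proc.stdout)}


outcome = run_in_fresh_interpreter("hello")
```

Each call pays interpreter start-up cost, which is the trade-off the two modes expose: isolation per cycle versus throughput in a single process.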

Performance is measured, never assumed: measure_throughput runs a bounded batch and reports real cycles/sec on this vessel. probe verifies which targets are actually reachable in the connected repo.

KimeraAdapter ¶

Bases: SubstrateUnderTest

Subprocess adapter for the Kimera-SWM substrate — entity or any component.

write_runner_template `staticmethod` ¶

write_runner_template(path: str | Path) -> Path

Dump the bundled runner so it can be edited and reused.

probe ¶

probe() -> dict[str, Any]

Verify which targets are reachable in the connected repo.

Returns a structured report. Run this before wiring scenarios: it is how the adapter checks reachability against the substrate itself rather than against the docs.

measure_throughput ¶

measure_throughput(stimuli: list[Any], params: dict[str, Any] | None = None) -> dict[str, Any]

Measure real cycles/sec for this target — performance is measured, not assumed.

Runs the stimuli as one in-process batch and times it. The result is the empirical basis for choosing subprocess-vs-batch and for a throughput proof record — there is no assumed performance figure anywhere.
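
A minimal timing sketch of the same idea, assuming nothing about the real report: the keys in the returned dict and the `measure_cycles_per_sec` helper are hypothetical.

```python
import time
from typing import Any, Callable


def measure_cycles_per_sec(run_cycle: Callable[[Any], Any], stimuli: list) -> dict:
    """Time one in-process batch and report the observed rate."""
    start = time.perf_counter()
    for stimulus in stimuli:
        run_cycle(stimulus)
    elapsed = time.perf_counter() - start
    rate = len(stimuli) / elapsed if elapsed > 0 else float("inf")
    return {"cycles": len(stimuli), "elapsed_s": elapsed, "cycles_per_sec": rate}


# Toy workload standing in for a real substrate cycle.
report = measure_cycles_per_sec(lambda s: sum(range(100)), list(range(1000)))
```

The figure is specific to the machine and workload it was measured on, which is why it belongs in a record rather than in documentation.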

Corpus base¶

ophamin.seeing.corpus.base ¶

Massive-dataset corpus layer — base abstraction.

A Corpus locates a downloaded open-source dataset on disk, content-addresses it, counts its records, and streams them as CorpusRecord objects — so a catastrophic-testing scenario can feed real data through the substrate in concentrated batches.

The four connectors (connectors.py):

EnronCorpus         ~500k real executive emails        — organisational dissonance
LinuxKernelCorpus   ~1.4M commit messages              — logic / topology siege
CyberPayloadCorpus  Metasploit modules + injection sets — concentrated immune siege
FloresCorpus        FLORES-200, 200 parallel languages — Rosetta scaling limit

Content hashes and record counts are computed once and cached to disk (.ophamin_<name>_content_hash / _count) so a 1.7 GB archive is not re-hashed on every run.
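
A sketch of the hash-once-then-cache idea. The sidecar file name follows the pattern quoted above; the choice of SHA-256, placing the sidecar next to the archive, and the absence of cache invalidation are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_content_hash(archive: Path, name: str) -> str:
    """Hash the archive in streaming blocks, reusing a sidecar cache if present."""
    sidecar = archive.parent / f".ophamin_{name}_content_hash"
    if sidecar.exists():
        return sidecar.read_text().strip()
    digest = hashlib.sha256()
    with archive.open("rb") as handle:
        # Read 1 MiB at a time so a large archive never sits in memory.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    sidecar.write_text(digest.hexdigest())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "corpus.bin"
    archive.write_bytes(b"example corpus bytes")
    first = cached_content_hash(archive, "demo")   # computes and writes the sidecar
    second = cached_content_hash(archive, "demo")  # served from the sidecar
    sidecar_written = (Path(tmp) / ".ophamin_demo_content_hash").exists()
```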

Corpus ¶

Bases: ABC

A downloaded open-source dataset, content-addressed and streamable.

is_available `abstractmethod` ¶

is_available() -> bool

True iff the raw data is present on disk.

records `abstractmethod` ¶

records() -> Iterator[CorpusRecord]

Stream every record. Must be a generator — corpora do not fit in memory.

content_hash ¶

content_hash() -> str

Content-addressed hash of the corpus, cached in-memory and on disk.

count ¶

count() -> int

Total record count, cached in-memory and on disk.

sample ¶

sample(n: int, seed: int = 0) -> list[CorpusRecord]

A deterministic reservoir sample of n records (single streaming pass).
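
For readers unfamiliar with the technique, this is a generic sketch of single-pass reservoir sampling (Algorithm R) with a fixed seed. It is not taken from Ophamin's source; the function name and the use of `random.Random` are assumptions.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], n: int, seed: int = 0) -> list[T]:
    """Pick n items uniformly from a stream of unknown length in one pass."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    for index, item in enumerate(stream):
        if index < n:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Keep the new item with probability n / (index + 1).
            slot = rng.randint(0, index)
            if slot < n:
                reservoir[slot] = item
    return reservoir


picked = reservoir_sample(iter(range(10_000)), n=5, seed=42)
picked_again = reservoir_sample(iter(range(10_000)), n=5, seed=42)
```

Memory use is bounded by n regardless of corpus size, and the seed makes the selection repeatable as long as the stream order is stable.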

chunks ¶

chunks(size: int, limit: int | None = None) -> Iterator[list[CorpusRecord]]

Yield records in batches of size — concentrated-batch density feeding.

limit caps the total number of records emitted across all batches.
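
The batching semantics can be sketched with `itertools.islice`. The `chunked` helper is a generic illustration that works on any iterable, not the method's actual body.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def chunked(stream: Iterable[T], size: int, limit: Optional[int] = None) -> Iterator[list[T]]:
    """Yield lists of up to `size` items; `limit` caps the total emitted."""
    if size <= 0:
        raise ValueError("size must be positive")
    source = iter(stream) if limit is None else islice(stream, limit)
    while True:
        batch = list(islice(source, size))
        if not batch:
            return
        yield batch


all_batches = list(chunked(range(10), size=4))
capped_batches = list(chunked(range(10), size=4, limit=6))
```

The final batch may be shorter than size, both at the end of the stream and when limit is not a multiple of size.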

dataset_ref ¶

dataset_ref() -> 'DatasetRef'

Produce the proof-record DatasetRef for this corpus.

CorpusRecord `dataclass` ¶

One item from a corpus — an email, a commit message, a payload, a sentence.

Scenario base¶

ophamin.measuring.scenarios.base ¶

The catastrophic-scenario layer.

A Scenario binds a real corpus + a substrate target + a pre-registered falsifiable claim. It streams the corpus through the substrate, scores the run, and emits a signed EmpiricalProofRecord.

The harness is substrate-agnostic — it runs identically against MockSubstrate (tests) or KimeraAdapter (real catastrophic runs). Pre-registration is captured before the run; the proof record is content-addressed and signed.

Scenario registration¶

Every concrete subclass of :class:Scenario that sets a name attribute distinct from the base sentinel "scenario" is automatically registered in the module-level :data:SCENARIOS mapping via the :meth:Scenario.__init_subclass__ hook. No manual editing of an __init__.py dict is required; the registry is built as a side effect of class definition.

Registration is loud-failure:

A duplicate name across two subclasses raises :class:DuplicateScenarioNameError at class-definition time.
A subclass that sets name = "scenario" (the unchanged base default) raises :class:ScenarioNameNotOverriddenError.
A subclass that opts out via register=False (e.g. an abstract intermediate parent in a class hierarchy) is skipped silently. This is the only sanctioned skip path.

Third-party / out-of-tree scenarios reach the same registry by simply inheriting from :class:Scenario in their own package; importing their module fires the registration hook.

DEFAULT_SIGN_KEY `module-attribute` ¶

DEFAULT_SIGN_KEY = b'ophamin-scenario-proof-key'

Scenario ¶

Bases: ABC

Binds corpus + target + pre-registered claim -> a signed proof record.

Every concrete subclass declares a metadata block (name, tier, family, goal, explanation, and optionally method + falsification_consequence) that classifies the experiment and explains its intent without requiring the reader to chase docstrings. The metadata is validated at class-definition time by :meth:__init_subclass__ (loud-failure on omission) and surfaces into every signed EmpiricalProofRecord produced by the scenario.

__init_subclass__ ¶

__init_subclass__(register: bool = True, **kwargs: object) -> None

Auto-register concrete subclasses in :data:SCENARIOS.

Skips registration when register=False (abstract intermediate parents, test-internal scenarios). Otherwise:

raises :class:ScenarioNameNotOverriddenError if the subclass kept the base sentinel name;
raises :class:ScenarioMetadataMissingError if any of tier / family / goal / explanation is unset or empty;
raises :class:DuplicateScenarioNameError if another subclass already registered the same name.

Re-registration of the same class object under the same name is idempotent — this is necessary so module reloads (e.g. test fixtures, importlib.reload) don't trip the duplicate guard.
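
The registration mechanics can be sketched with a plain `__init_subclass__` hook. Every name below (`ScenarioSketch`, `REGISTRY`, the exception classes, the example scenarios) is a hypothetical stand-in, and the sketch deliberately omits the reload-idempotency rule, whose exact matching logic is not shown here.

```python
class NameNotOverriddenError(Exception):
    """Stand-in for ScenarioNameNotOverriddenError."""


class MetadataMissingError(Exception):
    """Stand-in for ScenarioMetadataMissingError."""


class DuplicateNameError(Exception):
    """Stand-in for DuplicateScenarioNameError."""


REGISTRY: dict[str, type] = {}
REQUIRED = ("tier", "family", "goal", "explanation")


class ScenarioSketch:
    name = "scenario"  # base sentinel
    tier = family = goal = explanation = ""

    def __init_subclass__(cls, register: bool = True, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        if not register:
            return  # the only silent skip path
        if cls.name == "scenario":
            raise NameNotOverriddenError(cls.__qualname__)
        missing = [f for f in REQUIRED if not getattr(cls, f, "")]
        if missing:
            raise MetadataMissingError(f"{cls.__qualname__}: {missing}")
        if cls.name in REGISTRY:
            raise DuplicateNameError(cls.name)
        REGISTRY[cls.name] = cls


class CorpusScenario(ScenarioSketch, register=False):
    """Abstract intermediate parent: opts out of registration."""
    tier = "scientific"
    family = "siege"


class ImmuneSiege(CorpusScenario):
    name = "immune_siege"
    goal = "illustrative goal"
    explanation = "illustrative explanation"


try:
    class Clash(CorpusScenario):
        name = "immune_siege"  # same name as ImmuneSiege
        goal = "x"
        explanation = "y"
except DuplicateNameError:
    duplicate_rejected = True
else:
    duplicate_rejected = False
```

Note that `register=False` applies only to the class that passes it; `ImmuneSiege` inherits from the opted-out parent and is still registered.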

build_claim `abstractmethod` ¶

build_claim() -> Claim

The pre-registered falsifiable claim this scenario tests.

score `abstractmethod` ¶

score(cycle_results: list[CycleResult], records: list[CorpusRecord]) -> ScenarioScore

Read the completed run into an observed value + pillar evidence.

field_contract ¶

field_contract() -> ScenarioFieldContract | None

The OrchestratorResult fields this scenario depends on.

Default None means no contract — scenarios that don't override this run exactly as before (back-compat). Scenarios that DO override get loud-failure on the first cycle if a required field is missing or has the wrong type. This catches Kimera-side renames at experiment setup time instead of silently breaking downstream.

Returning a contract is purely additive — the scenario still reads cycle.raw["..."] ad-hoc in :meth:score. The contract is the gate, not the projection.
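
One way to picture the gate is shown below. The real `ScenarioFieldContract` shape is not documented here, so the contract is modelled as a plain mapping from field name to expected type, and `ContractViolation`, `check_contract`, and the field names are hypothetical.

```python
class ContractViolation(Exception):
    """Hypothetical error raised when a cycle breaks the field contract."""


def check_contract(raw: dict, required: dict[str, type]) -> None:
    """Fail loudly if a required field is missing or has the wrong type."""
    for field_name, expected in required.items():
        if field_name not in raw:
            raise ContractViolation(f"required field missing: {field_name}")
        if not isinstance(raw[field_name], expected):
            actual = type(raw[field_name]).__name__
            raise ContractViolation(
                f"{field_name}: expected {expected.__name__}, got {actual}"
            )


contract = {"phi": float, "halt_mode": str}

# A first cycle that satisfies the contract passes silently.
check_contract({"phi": 0.42, "halt_mode": "ok"}, contract)

# A renamed upstream field is caught before any scoring happens.
try:
    check_contract({"phi_value": 0.42, "halt_mode": "ok"}, contract)
except ContractViolation as exc:
    failure = str(exc)
```

The check only gates the run; scoring code still reads the raw payload directly, as the paragraph above notes.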

select_records ¶

select_records(corpus: Corpus) -> Iterator[CorpusRecord]

Which corpus records to use — default is the corpus stream; override to filter.

run ¶

run(substrate: SubstrateUnderTest, *, data_root: str | 'Path' | None = None, sign_key: bytes = DEFAULT_SIGN_KEY) -> EmpiricalProofRecord

Run the scenario end-to-end and return a signed Empirical Proof Record.

ScenarioScore `dataclass` ¶

A scenario's read of a completed run — the observed value + the evidence.

Tier ¶

Bases: str, Enum

The experimentation tier a scenario lives in.

Tiers carry epistemic shape, not just bookkeeping:

SCIENTIFIC — claims about substrate behaviour (does the substrate do X under condition Y?).
ENGINEERING — claims about substrate cost (does X stay under threshold T?).
PHILOSOPHICAL — claims about substrate self-model (does the substrate respond differently to self-referential vs neutral input?).
EMPIRICAL_DEEP — substrate-physics characterisation scenarios that target Kimera's prime apparatus / Φ / cross-channel behaviour and mirror Family A-V claims in Kimera's EMPIRICAL_VALIDATION.md.
MEASUREMENT_MACHINERY — validation of the upstream libraries Ophamin itself depends on (e.g. CRDT laws against pycrdt + y-py as cross-check oracle).

Inheriting from str makes a Tier serialise as its value string in JSON; the JSON proof-record schema sees a plain string, not a Python-specific enum encoding.
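
The serialisation behaviour is a property of Python's `json` module and can be checked with a stand-in enum. The member values below are assumptions; the real Tier values are not listed in this reference.

```python
import json
from enum import Enum


class TierSketch(str, Enum):
    """Hypothetical stand-in for Tier; the value strings are assumed."""
    SCIENTIFIC = "scientific"
    ENGINEERING = "engineering"


# The str mixin lets json.dumps treat the member as a plain string.
encoded = json.dumps({"tier": TierSketch.SCIENTIFIC})

# Reading back: look the member up by its value.
decoded = TierSketch(json.loads(encoded)["tier"])
```

Without the str mixin, `json.dumps` would raise a TypeError for the enum member, so a custom encoder would be needed.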

Audit pillar base¶

ophamin.auditing.base ¶

The audit-pillar contract — Finding, FindingSeverity, PillarResult, AuditPillar.

A pillar wraps one external static-analysis tool. The contract is small:

name and tool_name identify the pillar (e.g. "ruff", "bandit")
is_available() reports whether the underlying binary is installed
run(target_path) returns a PillarResult carrying the findings + raw output

Findings are normalised across tools — every tool's output is parsed into the same Finding dataclass — so downstream code (aggregation, reporting, threshold-mode claims) doesn't need to know which pillar produced what.

AuditPillar ¶

Bases: ABC

Wraps one external static-analysis tool as an audit pillar.

A subclass implements tool_binary (the CLI name to look up on PATH), tool_version (a way to ask the tool its version), and run (the actual invocation + parse). Tool absence is reported as status="unavailable" — never silently skipped.

resolved_binary `classmethod` ¶

resolved_binary() -> str | None

Resolve the tool binary — venv-local first, then PATH.

is_available `classmethod` ¶

is_available() -> bool

Is the wrapped tool resolvable (venv-local OR on PATH)?

tool_version ¶

tool_version(timeout_s: float = 10.0) -> str

Best-effort <tool> --version capture; empty string on failure.
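
A sketch of the two lookups described above, written as free functions. The venv layout check (`bin` or `Scripts` under `sys.prefix`) and the helper names are assumptions about how such a resolver could work, not a copy of the pillar's code.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional


def resolve_binary(tool: str) -> Optional[str]:
    """Look in the active environment's script directory first, then PATH."""
    scripts = Path(sys.prefix) / ("Scripts" if sys.platform == "win32" else "bin")
    local = shutil.which(tool, path=str(scripts))
    return local or shutil.which(tool)


def tool_version(binary: str, timeout_s: float = 10.0) -> str:
    """Best-effort `--version` capture; empty string on any failure."""
    try:
        proc = subprocess.run(
            [binary, "--version"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    if proc.returncode != 0:
        return ""
    # Some tools print their version to stderr instead of stdout.
    return (proc.stdout or proc.stderr).strip()


missing = resolve_binary("definitely-not-a-real-tool-xyz")
interpreter_version = tool_version(sys.executable)
```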

unavailable_result ¶

unavailable_result(target_path: str) -> PillarResult

Standard unavailable result for when the tool isn't installed.

run `abstractmethod` ¶

run(target_path: str | Path, **kwargs: Any) -> PillarResult

Run the tool against target_path and return a PillarResult.

Pillars MUST handle missing-tool cleanly via unavailable_result and runtime failures via error_result. Never silently swallow a failure.

Finding `dataclass` ¶

One static-analysis finding, normalised across tools.

Every field except path and message may be empty if the producing tool doesn't carry it — but the dataclass shape is stable so downstream code can rely on it.

FindingSeverity ¶

Bases: str, Enum

Normalised severity across heterogeneous tools.

Each pillar maps its tool's native severity scale onto these five buckets; the mapping is documented per-pillar.

PillarResult `dataclass` ¶

One pillar's full output.

severity_histogram ¶

severity_histogram() -> dict[str, int]

{severity_value: count} — bucket counts across all findings.

per_file_count ¶

per_file_count(top_n: int = 10) -> list[tuple[str, int]]

Top-N files by finding count.
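
Both aggregations reduce to counting, as this sketch with `collections.Counter` suggests. `FindingSketch` is a hypothetical stand-in carrying only three fields, and the severity strings are illustrative rather than the real FindingSeverity values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FindingSketch:
    """Hypothetical stand-in for Finding."""
    path: str
    message: str
    severity: str = "info"


def severity_histogram(findings: list[FindingSketch]) -> dict[str, int]:
    """Map each severity value to its number of findings."""
    return dict(Counter(f.severity for f in findings))


def per_file_count(findings: list[FindingSketch], top_n: int = 10) -> list[tuple[str, int]]:
    """Top-N files by finding count, most findings first."""
    return Counter(f.path for f in findings).most_common(top_n)


findings = [
    FindingSketch("a.py", "unused import", "low"),
    FindingSketch("a.py", "unused variable", "low"),
    FindingSketch("a.py", "shell injection", "high"),
    FindingSketch("b.py", "line too long", "low"),
]
histogram = severity_histogram(findings)
top_files = per_file_count(findings, top_n=1)
```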
