Public API reference

Auto-generated from the source docstrings via mkdocstrings. Every public symbol listed below is part of Ophamin's stable surface; breaking changes follow the semver promise.

Top-level

ophamin

Ophamin — an empirical observatory around Kimera-SWM.

The name is the angelic order Ophanim (wheels-within-wheels covered with eyes; Ezekiel 1:18). Architecturally, Ophamin is a Dyson sphere around Kimera: Kimera sits at the centre emitting; Ophamin envelops, senses, and returns measurement to the operator. It is not a tool next to Kimera but a structure around it.

The structure has six wheels, in two concentric triads:

Outer (empirical) triad:
  seeing       Wheel 1 — how the observatory senses Kimera and the world
               (substrate, corpus, discovery)
  measuring    Wheel 2 — pre-registered measurement engines + plug-in pillars
               (proof, scenarios, metrics, pillars.{observability, adaptive,
               effects, synthesis, robustness, diagnostics})
  comparing    Wheel 3 — cross-Kimera-commit retrospection
               (drift, provenance, orchestration)

Inner (engineering) triad:
  instrumenting  Wheel 4 — per-cycle CPU / RSS / page-fault sampling
                 (psutil, opentelemetry, py-spy, memray)
  auditing       Wheel 5 — orchestrated static-analysis tools
                 (ruff, bandit, mypy, pip-audit)
  reporting      Wheel 6 — render results to Markdown / HTML / LaTeX
                 (matplotlib, jinja2)

The six plug-in pillars (O · F · A · M · I · N) live inside the measuring ring:

O  observability       SPC + SRM + drift detectors      (scipy, river)
F  formal provenance   PROV-O graph + lineage store     (prov, MLflow, DVC)
A  adaptive testing    SPRT + mSPRT anytime-valid       (statsmodels)
M  mixed-effects       MixedLM + MEA                    (statsmodels)
I  iterative synthesis cumulative meta-analysis         (statsmodels)
N  n-fold robustness   cross-validation                 (scikit-learn)

The framework is independent of any particular substrate-under-test; MockSubstrate makes the whole observatory runnable with no external system, and KimeraAdapter plugs in Kimera-SWM via a subprocess boundary.

__version__ module-attribute

__version__ = '0.55.0'

Protocols

ophamin.protocols

The plug-in surfaces of the Ophamin observatory.

Ophamin is built to accept plug-in datasets, plug-in substrate probes, plug-in analytic pillars, and plug-in scenarios. This module declares the protocols each plug-in must satisfy. A new plug-in implements the protocol; nothing inside Ophamin's core has to change.

The protocols are intentionally narrow — each one names a single contract:

SubstrateProbe   the thing being observed (e.g. KimeraAdapter, MockSubstrate)
DatasetConnector a corpus the observatory feeds the substrate from
Pillar           a library-backed analytic that turns cycle results into
                 PillarEvidence (one statistical method per pillar)
ScenarioProtocol a corpus + target + pre-registered claim runner

Each protocol is a typing.Protocol marked runtime_checkable, so callers can verify plug-ins at registration time (isinstance(plugin, Pillar)) without requiring inheritance.
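The structural check described above can be sketched with the standard library alone. PillarLike, MeanPillar and NotAPillar are hypothetical stand-ins written for this illustration, not Ophamin's real classes; only the four member names (pillar_name, library, library_version, compute) come from this reference.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class PillarLike(Protocol):
    """Hypothetical stand-in for the Pillar contract (not Ophamin's class)."""

    pillar_name: str
    library: str
    library_version: str

    def compute(self, cycle_results: list[Any], records: list[Any]) -> Any: ...


class MeanPillar:
    """Satisfies the contract structurally; there is no base class."""

    pillar_name = "X.mean"
    library = "builtins"
    library_version = "n/a"

    def compute(self, cycle_results: list[Any], records: list[Any]) -> float:
        # Toy analytic: arithmetic mean of numeric cycle results.
        return sum(cycle_results) / len(cycle_results)


class NotAPillar:
    """Missing library, library_version and compute."""

    pillar_name = "broken"


print(isinstance(MeanPillar(), PillarLike))  # True
print(isinstance(NotAPillar(), PillarLike))  # False
```

A runtime isinstance check against a runtime_checkable protocol only confirms that the members exist; it does not verify their types or call signatures, so it is a registration-time sanity check rather than a full type check.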

Pillar

Bases: Protocol

One analytic pillar — a statistical method that turns observations into PillarEvidence.

A Pillar declares its name + the library it delegates to (library + version go into every signed proof record). Different pillars implement different statistical methods: SPC, SPRT, mixed-effects, etc.

The current six pillars (O / F / A / M / I / N) live under ophamin.measuring.pillars/*; this protocol describes the contract a new pillar must satisfy to be registrable.

As of Move G (2026-05-16) eleven adapter classes in ophamin.measuring.pillars._adapters satisfy this Protocol and register themselves with :data:ophamin.registry.PILLARS at module import time. Out-of-tree pillars register the same way: construct an instance of a :class:PillarBase subclass and call :func:ophamin.registry.register_pillar. Per-pillar compute signatures diverge (sequential testing vs control charts vs cross-validation are different shapes); the Protocol is metadata-backed-by-compute: adapters whose pillar doesn't fit the uniform compute(cycle_results, records) shape raise :class:ophamin.measuring.pillars.base.NonUniformComputeError with a pointer to the canonical per-pillar API.

ScenarioProtocol

Bases: Protocol

A scenario binds a corpus + target + pre-registered claim and produces a signed Empirical Proof Record.

The existing ophamin.measuring.scenarios.base.Scenario abstract class is the canonical implementation. As of 2026-05-16 the framework ships nineteen scenarios across five tiers (Scientific / Engineering / Philosophical / Empirical-deep / Measurement-machinery); see the README scenarios table for the full list.

The pre-registration discipline is preserved across plug-ins: every ScenarioProtocol implementation must produce a claim whose threshold is falsifiable (a value the substrate could fail to meet), before the run.

Note: eleven of the nineteen scenarios are not currently registered in ophamin.measuring.scenarios.__init__.SCENARIOS, which means they are reachable from Python imports but not from the ophamin scenario <name> CLI surface. See docs/ARCHITECTURE_INTENT_VS_REALITY_2026_05_16.md for the gap.

DatasetConnector

Bases: Protocol

A corpus the observatory can stream records from.

The existing ophamin.seeing.corpus.base.Corpus abstract class is the canonical example. Each corpus is content-addressable (its content hash appears in every signed proof record).

SubstrateProbe

Bases: Protocol

A substrate-under-test the observatory can drive.

Any concrete substrate (Kimera, mock, future substrates) implements this contract. The existing ophamin.seeing.substrate.base.SubstrateUnderTest abstract class is the canonical example.

Registry

ophamin.registry

Central plug-in registry.

Closes gap B from docs/ARCHITECTURE_INTENT_VS_REALITY_2026_05_16.md — until this module landed, the four Protocols declared in :mod:ophamin.protocols (SubstrateProbe / DatasetConnector / Pillar / ScenarioProtocol) had no registration surface. Plug-ins were hard-imported into individual scenarios.

The registry exposes one dict per plug-in kind and one register_* function per kind. Scenarios continue to register themselves via :meth:Scenario.__init_subclass__ (Move A); the SCENARIOS dict is re-exported here for one-stop discovery. Pillars are registered when their module's __init__.py calls :func:register_pillar at import time. Corpora are looked up via :func:ophamin.seeing.corpus.get_corpus (existing surface).

Every registration is loud-failure:

  • A duplicate pillar_name raises :class:DuplicatePluginError rather than silently overwriting.
  • A plug-in that fails the matching isinstance(p, Protocol) check raises :class:PluginProtocolViolationError — the Protocol declared the contract; an adapter that doesn't satisfy it is a real defect.

Outside callers query the registry via:

>>> from ophamin.registry import PILLARS, list_pillars, get_pillar
>>> p = get_pillar("O.spc")
>>> p.library, p.library_version
('numpy', '1.26.0')

Or via the ophamin pillar list / show CLI surface.

register_pillar

register_pillar(pillar: PillarBase) -> PillarBase

Register one pillar adapter in the central registry.

Returns the pillar (so calls can be expressed as MY_PILLAR = register_pillar(MyPillar()) at module scope).

Raises:

  • PluginProtocolViolationError: if the object doesn't satisfy the :class:Pillar runtime protocol (missing pillar_name / library / library_version / compute).
  • DuplicatePluginError: if another pillar already registered the same pillar_name. Re-registration of the same object under the same name is idempotent (necessary for module reloads).
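The registration rules above (loud failure on a contract violation, loud failure on a name collision, idempotent re-registration of the same object) can be sketched as follows. This is a minimal stand-in, not the code in ophamin.registry: the exception classes are redefined locally, and the attribute check uses hasattr where the real function is documented as using the runtime protocol.

```python
class DuplicatePluginError(Exception):
    """Local stand-in: a different object claims an already-registered name."""


class PluginProtocolViolationError(Exception):
    """Local stand-in: the plug-in lacks a required member."""


REQUIRED_MEMBERS = ("pillar_name", "library", "library_version", "compute")
PILLARS: dict[str, object] = {}


def register_pillar(pillar):
    """Register a pillar, or raise; never overwrite silently."""
    missing = [name for name in REQUIRED_MEMBERS if not hasattr(pillar, name)]
    if missing:
        raise PluginProtocolViolationError(f"missing: {', '.join(missing)}")
    existing = PILLARS.get(pillar.pillar_name)
    if existing is not None and existing is not pillar:
        # A different object under the same name is a real conflict.
        raise DuplicatePluginError(pillar.pillar_name)
    PILLARS[pillar.pillar_name] = pillar  # same object again: a no-op
    return pillar


class ToyPillar:
    pillar_name = "X.toy"
    library = "builtins"
    library_version = "n/a"

    def compute(self, cycle_results, records):
        return len(cycle_results)


TOY = register_pillar(ToyPillar())
register_pillar(TOY)  # idempotent, e.g. after a module reload
```

Returning the pillar is what makes the module-scope assignment form (MY_PILLAR = register_pillar(MyPillar())) possible.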

Signed-record codecs

Empirical Proof Record

ophamin.measuring.proof.record

The Ophamin Empirical Proof Record — the official result artifact.

One record per verified claim. Two serialisations:

proof.json   canonical, machine-readable, JSON-Schema-validated
PROOF.md     rendered, human-readable

A proof is bulletproof when it is:

  • falsifiable — every claim carries a pre-registered Threshold
  • pre-registered — claim + config + analysis plan hashed BEFORE the run
  • traceable — content-addressed: claim -> config -> substrate -> data -> result
  • reproducible — exact command + environment lock + lineage chain
  • attributed — every statistic names the library + version that produced it
  • tamper-evident — HMAC-signed over the whole record body

The nine sections:

1 Identity          2 Claim            3 Pre-registration
4 Data              5 Evidence         6 Verdict
7 Reproduction      8 Provenance       9 Signature

A REFUTED record is a valid proof — disproving a claim is a result.

EmpiricalProofRecord dataclass

The official Ophamin result artifact — nine sections, content-addressed, signed.

proof_id property

proof_id: str

Content-addressed identifier — SHA-256 over sections 1-8.

sign

sign(key: bytes) -> 'EmpiricalProofRecord'

HMAC-SHA256 sign the record body. Returns self for chaining.

verify_signature

verify_signature(key: bytes) -> bool

True iff the signature matches the current body under key.
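How a content-addressed identifier and an HMAC-SHA256 signature relate to the record body can be sketched with hashlib and hmac. The canonical form used here (JSON with sorted keys and compact separators) is an assumption made for the illustration; this reference does not state how Ophamin canonicalises the body, and the helper names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(body: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_id(body: dict) -> str:
    """SHA-256 over the canonical body: same content, same identifier."""
    return hashlib.sha256(canonical_bytes(body)).hexdigest()


def sign_body(body: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical body."""
    return hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()


def verify_body(body: dict, signature: str, key: bytes) -> bool:
    if not signature:
        return False  # an empty signature never verifies
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sign_body(body, key), signature)


key = b"demo-key"
body = {"claim": "latency stays under budget", "verdict": "VALIDATED"}
signature = sign_body(body, key)
print(verify_body(body, signature, key))  # True

body["verdict"] = "REFUTED"  # any change to the body...
print(verify_body(body, signature, key))  # False: ...invalidates the signature
```

This is also why a record has to be signed again after any in-place change to its body.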

validate

validate() -> list[str]

Return a list of problems; an empty list means the record is well-formed.

Enforces the properties that make a proof bulletproof — every one of them, not a subset.

from_dict classmethod

from_dict(data: dict[str, Any]) -> 'EmpiricalProofRecord'

Reconstruct an EmpiricalProofRecord from its to_dict payload.

Mirrors the on-disk JSON shape produced by to_json — the body sections (claim / preregistration / data / evidence / verdict / reproduction / provenance / signature) plus the identity sub-dict. Raises KeyError / ValueError loudly on a malformed payload — no silent fill-defaults; a broken record should fail loud, not deserialize as a partial.

from_json classmethod

from_json(path: str) -> 'EmpiricalProofRecord'

Load a proof record from a JSON file written by to_json.

Claim dataclass

Section 2 — the falsifiable claim, as a five-tuple.

Threshold dataclass

A falsifiable pass/fail boundary — there is no claim without one.

decide

decide(observed: float) -> bool

True iff the observed value satisfies the threshold.

Verdict dataclass

Section 6 — VALIDATED / REFUTED / INCONCLUSIVE against the threshold.

decide classmethod

decide(observed: float, threshold: Threshold, *, inconclusive: bool = False, reasoning: str = '') -> 'Verdict'

Decide the verdict by comparing observed against threshold.
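The two decide steps can be sketched as below. The field names (metric, op, value) are assumptions inferred from the Threshold("total_findings", "<=", 50) example that appears later in this reference; the supported operator set and the plain-string verdict are likewise illustrative, not Ophamin's actual dataclasses.

```python
import operator
from dataclasses import dataclass

# Assumed operator vocabulary for the sketch.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Threshold:
    metric: str
    op: str
    value: float

    def decide(self, observed: float) -> bool:
        """True iff the observed value satisfies the threshold."""
        return _OPS[self.op](observed, self.value)


def decide_verdict(observed: float, threshold: Threshold, *, inconclusive: bool = False) -> str:
    # The inconclusive flag wins over the comparison, e.g. when the run
    # did not collect enough data for the comparison to be meaningful.
    if inconclusive:
        return "INCONCLUSIVE"
    return "VALIDATED" if threshold.decide(observed) else "REFUTED"


gate = Threshold("total_findings", "<=", 50)
print(decide_verdict(42, gate))  # VALIDATED
print(decide_verdict(51, gate))  # REFUTED
```

Because the threshold is a value the observation can fail to meet, a REFUTED outcome is reachable, which is what makes the claim falsifiable.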

PillarEvidence dataclass

Section 5 — one pillar's measured evidence, attributed to its library.

The cross_check field is constrained to :data:_CROSS_CHECK_VALUES — passing prose into it fires a loud ValueError at construction time. Long-form context belongs in detail (free-form dict) instead.

PreRegistration dataclass

Section 3 — claim + plan hashed BEFORE the run.

preregistered_at must precede the record's created_at; validate enforces it. Build this object before the experiment runs.
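A minimal sketch of the pre-registration discipline: hash the claim, config and analysis plan before the run, then check the timestamp ordering afterwards. The function names and the JSON-based hashing are hypothetical choices for this illustration; only the preregistered_at / created_at ordering rule comes from the text above.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def preregistration_hash(claim: dict, config: dict, analysis_plan: str) -> str:
    """Commit to claim + config + plan with one SHA-256 digest."""
    payload = json.dumps(
        {"claim": claim, "config": config, "analysis_plan": analysis_plan},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ordering_problems(preregistered_at: datetime, created_at: datetime) -> list[str]:
    """Empty list when the pre-registration strictly precedes the record."""
    if preregistered_at >= created_at:
        return ["preregistered_at must precede created_at"]
    return []


# Step 1: commit before the experiment runs.
preregistered_at = datetime(2026, 5, 16, 9, 0, tzinfo=timezone.utc)
digest = preregistration_hash(
    {"metric": "total_findings", "op": "<=", "value": 50},
    {"cycles": 100},
    "compare the observed count against the threshold",
)

# Step 2: the record is created only after the run has finished.
created_at = preregistered_at + timedelta(minutes=30)
print(ordering_problems(preregistered_at, created_at))  # []
```

Changing the threshold after seeing the data would change the digest, so the hash makes after-the-fact edits detectable.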

DatasetRef dataclass

Section 4 — one real dataset, content-addressed.

Reproduction dataclass

Section 7 — exact reproduction command, environment lock, lineage chain.

ophamin.measuring.proof.codec

Format codec for :class:EmpiricalProofRecord — the single canonical load / validate / verify / ingest surface.

The proof-record component dataclasses already declare to_dict / from_dict round-trip pairs, and :class:EmpiricalProofRecord ships to_json / from_json file shortcuts. This module collects them into one loud-failure interface that bundles:

  • JSON-Schema validation against proof/schema.json;
  • the structural record.validate() checklist (falsifiable + pre-registered + traceable + reproducible + attributed);
  • optional HMAC-SHA256 signature verification under a caller-provided key;
  • one-call ingest that runs all three and raises loud on the first failure;
  • directory-walking iter_proofs / list_proofs so the proof corpus on disk has a first-class Python interface.

Per the framework's no-fallback rule: every failure mode raises a typed :class:ProofCodecError subclass with a descriptive message; the codec does not return None or empty dicts on error and does not swallow exceptions.

Read alongside :class:EmpiricalProofRecord itself (src/ophamin/measuring/proof/record.py) and the JSON Schema (src/ophamin/measuring/proof/schema.json).

SCHEMA_VERSION module-attribute

SCHEMA_VERSION = '1.0'

dump

dump(record: EmpiricalProofRecord, path: str | Path, *, indent: int = 2) -> Path

Write record to path as canonical JSON. Returns the path.

Creates the parent directory if it doesn't already exist (the convenience that callers of pathlib.Path.write_text typically add themselves).

Raises :class:OSError if the write fails — codec does NOT swallow file-system errors. Use the higher-level CLI / orchestration layer if structured error handling is wanted.

load

load(path: str | Path) -> EmpiricalProofRecord

Load + reconstruct an :class:EmpiricalProofRecord from path.

Raises :class:ProofDecodeError on file-system errors, malformed JSON, or a structurally incomplete payload. The chained exception preserves the underlying error for forensic debugging.

validate

validate(path: str | Path, *, key: bytes | None = None) -> ValidationReport

Run schema + record + (optional) signature validation in one call.

Returns a :class:ValidationReport capturing every layer's result. Does NOT raise on any validation failure — caller inspects the report's all_ok property and schema_errors / record_problems tuples to decide what to do.

Use :func:ingest for the raise-on-any-failure variant.

The JSON-Schema check runs first; if the schema is broken, the structural record.validate is skipped (a malformed-at-schema payload can't be safely reconstructed into a record). When schema-ok, the record is loaded and record.validate is invoked; if a key was provided, signature verification runs too. The signature_ok field is None when no key was provided (i.e. the check was skipped), True / False when a key was provided.
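The layered, non-raising report can be pictured with a small dataclass. The field names (schema_errors, record_problems, signature_ok, all_ok) come from the description above; the all_ok rule, in particular treating a skipped signature check (None) as not failing, is an assumption of this sketch and not a statement about the real ValidationReport.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidationReport:
    """Sketch of a per-layer validation result."""

    schema_errors: tuple[str, ...] = ()
    record_problems: tuple[str, ...] = ()
    signature_ok: Optional[bool] = None  # None means the check was skipped

    @property
    def all_ok(self) -> bool:
        # Assumption: only an explicit False signature result counts as failure.
        return (
            not self.schema_errors
            and not self.record_problems
            and self.signature_ok is not False
        )


clean_no_key = ValidationReport()
bad_signature = ValidationReport(signature_ok=False)
bad_schema = ValidationReport(schema_errors=("missing section: claim",))

for report in (clean_no_key, bad_signature, bad_schema):
    print(report.all_ok, report.schema_errors, report.signature_ok)
```

The three-valued signature_ok lets a caller tell "not checked" apart from "checked and failed", which a plain bool could not express.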

verify_signature

verify_signature(path: str | Path, key: bytes) -> bool

Load the record at path + verify its HMAC-SHA256 signature.

Returns True iff the signature matches the record body under key. False if signature is empty or doesn't match.

Raises :class:ProofDecodeError if the file itself can't be loaded. Does NOT raise on signature mismatch — caller decides whether to escalate (use :func:ingest with strict_signature=True for the loud-failure variant).

ingest

ingest(path: str | Path, *, key: bytes | None = None, strict_signature: bool = False, require_schema_version: str | None = SCHEMA_VERSION) -> EmpiricalProofRecord

Single-call load + full-validate + optional signature-verify.

The boundary function for accepting third-party proof records. After a successful call, the returned :class:EmpiricalProofRecord is guaranteed:

  • structurally well-formed (JSON-Schema validated),
  • record-validate-clean (no internal contradictions),
  • schema-version matches require_schema_version (unless that's None, which opts out of the version gate),
  • signature-verified IFF strict_signature=True was passed AND a key was provided.

On any failure, raises the matching :class:ProofCodecError subclass with a descriptive message. The caller never has to inspect a partial / fallback record.

Parameters:

  • path (str | Path, required): file path to the proof JSON.
  • key (bytes | None, default None): optional HMAC-SHA256 key for signature verification.
  • strict_signature (bool, default False): when True, require key to be provided AND the signature to verify. When False, the signature is checked when key is provided but verification failure is not fatal (matches the validate shape).
  • require_schema_version (str | None, default SCHEMA_VERSION): required schema version. Default is the current :data:SCHEMA_VERSION; pass None to accept any version (e.g. for migration tooling).

Raises:

  • ProofSchemaError: JSON-Schema validation failed.
  • ProofValidationError: structural record.validate failed.
  • ProofSchemaVersionMismatchError: schema_version didn't match require_schema_version.
  • ProofSignatureError: strict_signature=True and either no key was provided or signature verification failed.
  • ProofDecodeError: file couldn't be read or JSON couldn't be decoded (raised by underlying :func:load).
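The interaction between key and strict_signature is easy to misread, so here is a minimal sketch of the documented policy as a standalone function. enforce_signature_policy is a hypothetical helper and ProofSignatureError is redefined locally; the real ingest also runs the schema, record and version gates, which are omitted here.

```python
from typing import Optional


class ProofSignatureError(Exception):
    """Local stand-in for the codec's signature error type."""


def enforce_signature_policy(
    signature_ok: Optional[bool],
    *,
    key: Optional[bytes],
    strict_signature: bool,
) -> Optional[bool]:
    """Raise only in strict mode; otherwise pass the check result through."""
    if strict_signature:
        if key is None:
            raise ProofSignatureError("strict_signature=True requires a key")
        if signature_ok is not True:
            raise ProofSignatureError("signature verification failed")
    # Non-strict: a failed (False) or skipped (None) check is not fatal.
    return signature_ok


print(enforce_signature_policy(False, key=b"k", strict_signature=False))  # False
print(enforce_signature_policy(True, key=b"k", strict_signature=True))    # True
```

In other words, passing a key alone does not make a bad signature fatal; a caller accepting third-party records would want strict_signature=True as well.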

list_proofs

list_proofs(directory: str | Path, *, key: bytes | None = None) -> tuple[ProofListEntry, ...]

Walk directory recursively and return one summary entry per JSON.

A file that fails to decode produces an entry with error set and the other content fields None — the walk does NOT stop on a bad file. Use :func:ingest against an individual path when you need loud-failure semantics for a single record.
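The "keep walking, record the error" behaviour can be sketched with pathlib and json. ListEntry and list_json_records are hypothetical names, and the single verdict field stands in for whatever summary fields the real ProofListEntry carries.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ListEntry:
    path: Path
    verdict: Optional[str] = None
    error: Optional[str] = None


def list_json_records(directory) -> tuple[ListEntry, ...]:
    """One entry per *.json file; a bad file yields an error entry."""
    entries = []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            entries.append(ListEntry(path=path, verdict=data["verdict"]))
        except (OSError, ValueError, KeyError, TypeError) as exc:
            # json.JSONDecodeError is a ValueError subclass. The walk goes on.
            entries.append(ListEntry(path=path, error=f"{type(exc).__name__}: {exc}"))
    return tuple(entries)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "good.json").write_text('{"verdict": "VALIDATED"}', encoding="utf-8")
    (root / "nested" / "bad.json").write_text("{not json", encoding="utf-8")
    for entry in list_json_records(root):
        print(entry.path.name, entry.verdict, entry.error is not None)
```

A listing function is the one place where tolerating bad files is useful: the operator sees every file and which ones are broken, while the single-record entry point stays strict.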

Audit Record

ophamin.auditing.audit_record

AuditRecord — the signed, content-addressable artefact of one audit run.

Parallel to EmpiricalProofRecord but descriptive by default — audits don't require a falsifiable claim because the value is in the findings distribution itself, not in passing a threshold. A separate threshold-mode wrapper can pre-register pass/fail criteria for CI gating; that's a follow-on.

Nine logical sections, mirroring the proof record shape so the two can be processed by the same downstream tooling (reporting, drift, etc.):

  1. Identity: ophamin version + commit, captured_at, schema version
  2. Target: path being audited + its content hash (for forensics)
  3. Pillars: which pillars ran, which were unavailable, versions
  4. Findings: the union of every pillar's findings (already in PillarResult, but flattened here for cross-pillar hotspot detection)
  5. Summary: aggregate counts + severity histogram + file hotspots
  6. (no verdict): audits are descriptive; if a claim is wanted, wrap this record in an Empirical Proof Record with a threshold on a chosen statistic
  7. Reproduction: command + env-lock (later)
  8. Provenance: (optional) PROV-O graph of the run
  9. Signature: HMAC-SHA256 over the body

AuditRecord dataclass

One audit run's full artefact — signed, content-addressable.

As of schema audit/1.1 (Move L, 2026-05-16), an AuditRecord MAY carry an optional :class:PreRegistration + chosen statistic metric + :class:Verdict, turning the descriptive record into a falsifiable artefact for CI gating. Records written under schema audit/1.0 (no pre-registration fields) load cleanly under the v1.1 codec — the optional fields default to None.

audit_id property

audit_id: str

Content-addressed identifier — SHA-256 over the body.

from_dict classmethod

from_dict(data: dict[str, Any]) -> 'AuditRecord'

Reconstruct an AuditRecord from its to_dict payload.

Accepts both schema audit/1.0 and audit/1.1 payloads. v1.0 records have no preregistration / verdict fields; v1.1 records may have one or both. Loud-fails on malformed shapes rather than silent partial deserialisation.

attach_pre_registration

attach_pre_registration(*, claim: Any, observed_value: float, metric: str = 'total_findings', analysis_plan: str = 'audit-side pre-registration: gate on a chosen audit statistic') -> 'AuditRecord'

Stamp an in-place pre-registration + verdict onto this record.

Per Move L's full universalization of the pre-registration discipline: convert this descriptive audit into a falsifiable artefact by attaching a Claim's threshold + a decided Verdict. Returns self for chaining; bumps the record's schema_version to audit/1.1 if it wasn't already there.

Call sign() again after attaching: the body has changed, so the old signature is no longer valid.

from_json classmethod

from_json(path: str | Path) -> 'AuditRecord'

Load an AuditRecord from a JSON file written by :meth:to_json.

wrap_as_proof

wrap_as_proof(*, claim: Any, observed_value: float, pillar_name: str = 'audit', statistic_name: str = 'total_findings', library: str = 'ophamin', library_version: str = '', analysis_plan: str = 'wrap-audit-as-proof: pre-register a threshold over an audit statistic for CI gating', sign_key: bytes | None = None) -> Any

Wrap this AuditRecord into a pre-registered EmpiricalProofRecord.

Per Move I: audits are descriptive by default, but a caller that wants CI gating (e.g. total_findings <= 50) can wrap the record in a proof record that carries the falsifiable claim + threshold. The wrapping is lossless — the audit's forensic detail (target path + content hash + per-pillar findings + summary) is kept in the proof's reproduction + evidence sections.

Parameters:

  • claim (Any, required): an :class:ophamin.measuring.proof.Claim whose threshold is the gate (e.g. Threshold("total_findings", "<=", 50)).
  • observed_value (float, required): the statistic to evaluate the claim against (typically record.summary.total_findings or a per-severity count).
  • pillar_name (str, default 'audit'): PillarEvidence pillar identifier in the wrapped proof.
  • statistic_name (str, default 'total_findings'): PillarEvidence statistic name.
  • library (str, default 'ophamin'): PillarEvidence library attribution. Defaults to "ophamin" since the audit aggregation IS Ophamin's code.
  • library_version (str, default ''): PillarEvidence library version. Empty by default; caller fills if known.
  • analysis_plan (str, default 'wrap-audit-as-proof: pre-register a threshold over an audit statistic for CI gating'): PreRegistration analysis plan. The default explains the wrap-shape.
  • sign_key (bytes | None, default None): HMAC-SHA256 sign key. If None, the proof is returned unsigned (caller's responsibility to sign before persisting).

Returns:

  • Any: a signed (or unsigned, if sign_key=None) :class:EmpiricalProofRecord.

to_markdown

to_markdown(path: str | None = None) -> str

Render the audit as a human-readable Markdown report.

AuditSummary dataclass

Cross-pillar aggregate — every pillar's findings rolled up.

from_dict classmethod

from_dict(data: dict[str, Any]) -> 'AuditSummary'

Reconstruct an AuditSummary from its to_dict payload.

top_files round-trips as a list of [path, count] pairs (JSON doesn't carry tuples natively); we coerce back into the (str, int) tuple shape this dataclass declares.
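The tuple-to-list round-trip is a property of JSON itself and can be shown in a few lines. top_files_from_json is a hypothetical helper name; the coercion mirrors what the paragraph above describes.

```python
import json


def top_files_from_json(pairs) -> tuple[tuple[str, int], ...]:
    """Coerce [[path, count], ...] back into ((path, count), ...)."""
    return tuple((str(path), int(count)) for path, count in pairs)


original = (("src/a.py", 7), ("src/b.py", 3))
wire = json.loads(json.dumps(original))  # JSON has arrays only: tuples become lists
restored = top_files_from_json(wire)

print(wire)                  # [['src/a.py', 7], ['src/b.py', 3]]
print(restored == original)  # True
```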

ophamin.auditing.codec

Format codec for :class:AuditRecord — Move H parallel to Move B's proof codec.

Mirrors :mod:ophamin.measuring.proof.codec exactly so audit records get the same load / validate / verify / ingest treatment proof records do. The differences:

  • No JSON-Schema validation today (audit records don't ship a schema.json alongside; structural validation is via the AuditRecord.from_dict parser).
  • No threshold/verdict — audits are descriptive by default. The validation layer focuses on signature + body-roundtrip integrity + schema_version compatibility.

Per the framework's no-fallback rule, every failure raises a typed :class:AuditCodecError subclass with a descriptive message.

SCHEMA_VERSION module-attribute

SCHEMA_VERSION = 'audit/1.1'

dump

dump(record: AuditRecord, path: str | Path, *, indent: int = 2) -> Path

Write record to path as canonical JSON. Returns the path.

Creates the parent directory if it doesn't already exist. Raises :class:OSError if the write fails — codec does NOT swallow file-system errors.

load

load(path: str | Path) -> AuditRecord

Load + reconstruct an :class:AuditRecord from path.

Raises :class:AuditDecodeError on file-system errors, malformed JSON, or a structurally incomplete payload. The chained exception preserves the underlying error for forensic debugging.

validate

validate(path: str | Path, *, key: bytes | None = None) -> AuditValidationReport

Run structural + (optional) signature validation in one call.

Returns a :class:AuditValidationReport. Does NOT raise on any validation failure — caller inspects the report's all_ok property + record_problems tuple to decide what to do. Use :func:ingest for the raise-on-any-failure variant.

The structural check (via :meth:AuditRecord.from_dict + the in-module _structural_problems helper) runs first; if the record can't be loaded, the report carries the decode error as a single record-problem string. When loaded, the in-module shape check augments with cross-section consistency (e.g. pillars in record vs summary).

verify_signature

verify_signature(path: str | Path, key: bytes) -> bool

Load the record at path + verify its HMAC-SHA256 signature.

Returns True iff the signature matches the record body under key. False if signature is empty or doesn't match.

Raises :class:AuditDecodeError if the file itself can't be loaded.

list_audits

list_audits(directory: str | Path, *, key: bytes | None = None) -> tuple[AuditListEntry, ...]

Walk directory recursively; emit one summary entry per JSON.

A file that fails to decode produces an entry with error set and the other content fields None. Mirrors :func:ophamin.measuring.proof.codec.list_proofs.

Campaign Record

ophamin.campaign

CampaignRecord + the 6-phase composite-run orchestrator (Move F).

Closes Deficit 2 from docs/ARCHITECTURE_EXTENDED_AUDIT_2026_05_16.md — the "6 phases" the owner named are the six wheels of Ophamin's architecture operating as a single coordinated pass against a substrate:

  1. seeing — discover the substrate's surface
  2. measuring — run the requested scenarios; collect signed proof records
  3. comparing — synthesize the measuring output into a campaign summary; detect verdict flips
  4. instrumenting — collect per-cycle resource cost (when the substrate was wrapped in InstrumentedSubstrate)
  5. auditing — static-analysis sweep over the substrate's source (when a source-code path is available)
  6. reporting — collate every preceding phase's output into one rolled-up Markdown report

Each phase produces a :class:CampaignPhase aggregate; the :class:CampaignRecord collects them into a signed, content-addressed aggregate. A phase can be ok / skipped / failed; skipped is the sanctioned outcome when the substrate doesn't expose what a phase needs (e.g. auditing against a MockSubstrate is skipped because there's no source code to audit).

The orchestrator never silently swallows phase failures — a failed phase carries its error message into the record so the operator sees exactly what broke. This is the framework's "loud-failure" stance applied at the campaign level.

CAMPAIGN_SCHEMA_VERSION module-attribute

CAMPAIGN_SCHEMA_VERSION = '2.0'

CANONICAL_PHASE_ORDER module-attribute

CANONICAL_PHASE_ORDER: tuple[str, ...] = ('seeing', 'measuring', 'comparing', 'instrumenting', 'auditing', 'reporting')

CampaignPhase dataclass

One wheel's contribution to a composite run.

Three possible terminal status values:

  • "ok" — phase completed; artifact_paths + summary carry the output.
  • "skipped" — phase didn't apply (e.g. auditing against a Mock substrate); error carries the reason.
  • "failed" — phase raised; error carries the exception string. The campaign continues to the next phase (loud-failure at the campaign level, not at the per-phase level — operator sees every phase's outcome).
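The three terminal statuses suggest an orchestration loop that records every outcome and keeps going. The sketch below is one way to write such a loop; PhaseOutcome, PhaseSkipped and run_phases are hypothetical names, and using an exception to signal "skipped" is a choice made for this illustration, not a description of how Ophamin's orchestrator decides to skip.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class PhaseSkipped(Exception):
    """Hypothetical signal: the phase does not apply to this substrate."""


@dataclass
class PhaseOutcome:
    name: str
    status: str  # "ok" / "skipped" / "failed"
    summary: dict = field(default_factory=dict)
    error: Optional[str] = None


def run_phases(phases: dict[str, Callable[[], dict]]) -> list[PhaseOutcome]:
    outcomes = []
    for name, runner in phases.items():
        try:
            outcomes.append(PhaseOutcome(name, "ok", summary=runner()))
        except PhaseSkipped as exc:
            outcomes.append(PhaseOutcome(name, "skipped", error=str(exc)))
        except Exception as exc:
            # Capture the failure in the record and continue with the next phase.
            outcomes.append(PhaseOutcome(name, "failed", error=f"{type(exc).__name__}: {exc}"))
    return outcomes


def auditing() -> dict:
    raise PhaseSkipped("no source code to audit")


def instrumenting() -> dict:
    raise RuntimeError("sampler crashed")


outcomes = run_phases({
    "seeing": lambda: {"surface": 3},
    "instrumenting": instrumenting,
    "auditing": auditing,
    "reporting": lambda: {"pages": 1},
})
any_failed = any(o.status == "failed" for o in outcomes)
print([(o.name, o.status) for o in outcomes], any_failed)
```

Nothing is swallowed: the failure is visible in the outcome list, and the caller can turn any_failed into a non-zero exit code.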

CampaignRecord dataclass

Signed, content-addressed aggregate of one full-pass run.

Schema 2.0 (current) adds two strictly-additive fields:

  • corrected_verdicts: {claim_id → corrected_verdict} after multiplicity correction (FWER or FDR). Empty dict when no correction was applied or when no records carried a p_value.
  • multiplicity_correction_method: "holm" / "bh" / "none". The method the writer used when populating corrected_verdicts.

Schema 1.0 records remain readable: missing additive fields default to empty dict / "none" respectively. Signature verification is version-aware — :meth:_body includes the additive fields only when schema_version != "1.0", so a 1.0 signature still re-canonicalises bit-equal to the original wire form.
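The version-aware signing rule can be illustrated with a toy record. The dict layout, the phases field and the JSON canonical form are assumptions of this sketch; only the rule itself (the additive fields enter the signed body when schema_version is not "1.0") comes from the paragraph above.

```python
import hashlib
import hmac
import json


def body_for_signing(record: dict) -> dict:
    body = {"schema_version": record["schema_version"], "phases": record["phases"]}
    if record["schema_version"] != "1.0":
        # Additive 2.0 fields are signed only for non-1.0 records.
        body["corrected_verdicts"] = record.get("corrected_verdicts", {})
        body["multiplicity_correction_method"] = record.get(
            "multiplicity_correction_method", "none"
        )
    return body


def sign(record: dict, key: bytes) -> str:
    payload = json.dumps(body_for_signing(record), sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


key = b"demo-key"
on_disk_v1 = {"schema_version": "1.0", "phases": ["seeing", "measuring"]}
original_signature = sign(on_disk_v1, key)

# A newer reader fills in defaults for the fields a 1.0 writer never knew about.
loaded = {**on_disk_v1, "corrected_verdicts": {}, "multiplicity_correction_method": "none"}
print(sign(loaded, key) == original_signature)  # True: defaults stay out of the body
```

Without the version gate, the in-memory defaults would leak into the canonical body and every existing 1.0 signature would stop verifying.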

campaign_id property

campaign_id: str

Content-addressed identifier — SHA-256 over the body.

run_campaign

run_campaign(*, substrate: SubstrateUnderTest, target_name: str | None = None, target_git_commit: str | None = None, scenarios: list[type[Scenario]] | None = None, enable_phases: set[str] | None = None, out_dir: str | Path = 'campaigns/latest', sign_key: bytes = DEFAULT_SIGN_KEY, fwer_method: str = 'holm', fwer_alpha: float = 0.05) -> CampaignRecord

Run the six wheels in canonical order; emit a signed CampaignRecord.

Parameters:

  • substrate (SubstrateUnderTest, required): the substrate the measuring phase will run scenarios against.
  • target_name (str | None, default None): human-facing name for the target (default: the substrate's name attribute).
  • target_git_commit (str | None, default None): the target's git commit hash (default: the substrate's git_commit() return value).
  • scenarios (list[type[Scenario]] | None, default None): list of Scenario classes to run in the measuring phase. Default: every default-instantiable scenario in :data:SCENARIOS.
  • enable_phases (set[str] | None, default None): set of phase names to run. Default: all six.
  • out_dir (str | Path, default 'campaigns/latest'): directory under which per-phase artifacts are written.
  • sign_key (bytes, default DEFAULT_SIGN_KEY): HMAC-SHA256 key for signing the final record.
  • fwer_method (str, default 'holm'): multiplicity-correction method to apply during the comparing phase. One of :data:ophamin.comparing.fwer.SUPPORTED_METHODS ("holm" / "bh" / "none"). The default "holm" gives strict FWER control via Holm-Bonferroni. New in schema 2.0 (RFC 0002 Phase E2).
  • fwer_alpha (float, default 0.05): family-wise / FDR threshold used when applying the correction.

Returns:

  • CampaignRecord: a signed :class:CampaignRecord with one :class:CampaignPhase per executed phase, plus, when the comparing phase ran, the schema-2.0 corrected_verdicts mapping + multiplicity_correction_method populated from the FWER pass.

The orchestrator NEVER raises on a per-phase failure — it captures the error string into the phase's error field and continues to the next phase. The caller inspects record.any_failed to surface to a non-zero exit code if appropriate.
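For readers unfamiliar with the default fwer_method, the Holm-Bonferroni step-down procedure is short enough to sketch in full. This is the textbook procedure, not the code in ophamin.comparing.fwer; holm_reject is a hypothetical name, and how a rejection is mapped onto a corrected verdict is not specified in this reference.

```python
def holm_reject(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Holm-Bonferroni: True where the null hypothesis is rejected."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {claim_id: False for claim_id in p_values}
    for rank, (claim_id, p) in enumerate(ordered):
        # The k-th smallest p-value (rank = k - 1) is tested at alpha / (m - k + 1).
        if p <= alpha / (m - rank):
            rejected[claim_id] = True
        else:
            break  # step-down: the first failure stops all later rejections
    return rejected


p_values = {"claim-a": 0.01, "claim-b": 0.04, "claim-c": 0.03}
print(holm_reject(p_values))
# {'claim-a': True, 'claim-b': False, 'claim-c': False}
```

All three p-values are below 0.05 on their own, yet only the smallest survives the correction (0.01 ≤ 0.05/3, then 0.03 > 0.05/2 stops the procedure). That is the effect a corrected verdict is meant to capture when many claims are tested in one campaign.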

dump_campaign

dump_campaign(record: CampaignRecord, path: str | Path) -> Path

Write a CampaignRecord to disk as canonical JSON. Returns the path.

load_campaign

load_campaign(path: str | Path) -> CampaignRecord

Load a CampaignRecord from disk.

Regression Alert Record

ophamin.comparing.regression_alert

Regression-alert daemon (Move J) — closes gap F from the prior audit.

Detects verdict regressions across two snapshots of a proof corpus (typically: proofs/ at the prior Kimera commit vs proofs/ at the new commit). A "regression" is a scenario whose verdict moved from VALIDATED (or INCONCLUSIVE) to REFUTED: the substrate started failing a claim it previously satisfied. The detector also flags the inverse ("recovery": REFUTED → VALIDATED) and the lateral cases (any other verdict change, i.e. one that is not in the load-bearing regression direction).

The pipeline:

  1. Snapshot a proof corpus at commit A (e.g. via :func:scan_proof_directory).
  2. Snapshot the proof corpus at commit B.
  3. Run :func:compute_regression_alert on the pair.
  4. Inspect the resulting :class:RegressionAlert — list of :class:VerdictTransition rows + headline counts.

The pairing key is the scenario's stable identifier (the proof's filename family or — when present — the underlying scenario name via the proof's claim-statement signature). For two proofs of the same family at two different substrate commits to be paired, both must carry the same family heuristic; mis-paired entries are surfaced as unmatched_in_a / unmatched_in_b for operator inspection.

CLI:

ophamin watch-proofs --before --after [--out ]

Output is a signed :class:RegressionAlertRecord (HMAC-SHA256 + content-addressed alert_id), mirroring the shape of every other Ophamin artifact. A REGRESSION-class alert is exit-code 1; a quiet (no-change) alert is exit-code 0; a recovery-class alert is exit-code 0 with a notable summary line.
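The transition classes and exit codes described above can be sketched over two plain dicts that map a pairing key to a verdict string. classify and compare_snapshots are hypothetical names; the real detector works on proof records and a filename-family heuristic, and reading unmatched_in_a as "present only in the before snapshot" is an assumption.

```python
def classify(before: str, after: str) -> str:
    if before == after:
        return "unchanged"
    if after == "REFUTED":
        return "regression"  # before was VALIDATED or INCONCLUSIVE
    if before == "REFUTED" and after == "VALIDATED":
        return "recovery"
    return "lateral"  # any other verdict change


def compare_snapshots(before: dict[str, str], after: dict[str, str]):
    paired = sorted(before.keys() & after.keys())
    transitions = {key: classify(before[key], after[key]) for key in paired}
    unmatched_in_a = sorted(before.keys() - after.keys())
    unmatched_in_b = sorted(after.keys() - before.keys())
    # Only a regression is loud; recoveries and lateral moves exit 0.
    exit_code = 1 if "regression" in transitions.values() else 0
    return transitions, unmatched_in_a, unmatched_in_b, exit_code


snapshot_a = {"latency": "VALIDATED", "memory": "REFUTED", "drift": "VALIDATED", "old": "VALIDATED"}
snapshot_b = {"latency": "REFUTED", "memory": "VALIDATED", "drift": "INCONCLUSIVE", "new": "VALIDATED"}

transitions, only_a, only_b, exit_code = compare_snapshots(snapshot_a, snapshot_b)
print(transitions)
print(only_a, only_b, exit_code)
```

Surfacing the unmatched keys instead of dropping them keeps a renamed or missing proof from hiding a regression.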

REGRESSION_ALERT_SCHEMA_VERSION module-attribute

REGRESSION_ALERT_SCHEMA_VERSION = 'regression-alert/1.0'

RegressionAlertRecord dataclass

Signed, content-addressed artifact of one before/after comparison.

Substrate base

ophamin.seeing.substrate.base

The substrate-under-test abstraction.

Ophamin is independent of any particular system. Whatever it tests is a SubstrateUnderTest (SUT): something that can be reset, run for one cycle on a stimulus, and asked for its git commit and state. MockSubstrate implements this with no dependencies (so the framework is fully runnable on its own); KimeraAdapter implements it over a subprocess boundary to Kimera-SWM.

The cycle boundary is deliberate. The leak-free probe shape was established empirically (a fresh interpreter per cycle removes process-level state carry-over), so run_cycle is the unit of measurement and reset is honoured between runs.

SubstrateUnderTest

Bases: ABC

Abstract system under test. Implement this to plug a system into Ophamin.

git_commit abstractmethod

git_commit() -> str

Return the substrate's source revision.

This is the data_git_commit_id end of the provenance bridge: every recorded run is tethered to the exact substrate revision that produced it. Return "" only if the substrate genuinely has no version anchor.

reset abstractmethod

reset() -> None

Return the substrate to a clean initial state between runs.

run_cycle abstractmethod

run_cycle(stimulus: Any, params: dict[str, Any] | None = None) -> CycleResult

Exercise the substrate for exactly one cycle on stimulus.

params carries the swept configuration for this run. Implementations must not silently degrade: if the cycle cannot run, return a CycleResult with success=False and a populated error, or raise — never fabricate a plausible-looking result.
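A sketch of the "never fabricate" rule, using a stand-in result type. CycleResultSketch is hypothetical and carries only the fields named in this reference (raw, success, halt_mode, error); the halt_mode values and the toy string-length "substrate" are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class CycleResultSketch:
    """Hypothetical stand-in for CycleResult."""
    success: bool
    raw: dict[str, Any] = field(default_factory=dict)
    halt_mode: str = ""
    error: str = ""


def run_cycle(stimulus: Any, params: dict[str, Any] | None = None) -> CycleResultSketch:
    """Run one cycle; report failure explicitly instead of inventing output."""
    params = params or {}
    if not isinstance(stimulus, str):
        # The cycle cannot run: say so, with a populated error.
        return CycleResultSketch(
            success=False,
            halt_mode="error",
            error=f"unsupported stimulus type: {type(stimulus).__name__}",
        )
    return CycleResultSketch(
        success=True,
        raw={"length": len(stimulus), "params": params},
        halt_mode="completed",
    )


good = run_cycle("hello", {"injection_rate": 0.2})
bad = run_cycle(42)
```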

run_batch

run_batch(stimuli: list[Any], params: dict[str, Any] | None = None) -> list[CycleResult]

Exercise the substrate over a batch of stimuli.

The default implementation simply loops run_cycle, which is correct but costs one boundary crossing per cycle. Adapters that can run a whole batch inside a single process (the leak-tolerant density path) should override this; cycle_index is renumbered sequentially across the batch.
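The looping default might look roughly like this. The sketch assumes results are immutable dataclasses carrying a cycle_index field and that renumbering means assigning 0, 1, 2, ... in batch order; both are guesses about details the reference does not spell out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CycleResultSketch:
    """Hypothetical stand-in for CycleResult."""
    success: bool
    cycle_index: int = 0


def run_batch(run_cycle, stimuli, params=None):
    """Loop run_cycle over the batch and renumber cycle_index sequentially."""
    results = []
    for index, stimulus in enumerate(stimuli):
        result = run_cycle(stimulus, params)
        results.append(replace(result, cycle_index=index))
    return results


def fake_cycle(stimulus, params=None):
    # Every single-cycle call reports index 0, as an isolated run would.
    return CycleResultSketch(success=bool(stimulus))


batch = run_batch(fake_cycle, ["a", "", "c"])
```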

capture_state

capture_state() -> dict[str, Any]

Return a serialisable snapshot of substrate state (for provenance).

Default is empty; substrates with inspectable state should override.

metadata

metadata() -> dict[str, Any]

Static descriptive metadata about this substrate.

CycleResult dataclass

The outcome of one substrate cycle.

raw is whatever the substrate emitted, untouched. success and halt_mode are the two cross-substrate fields every adapter must fill. A pre-built metric_bundle may be attached by the adapter; otherwise to_metric_bundle does best-effort extraction from raw.

to_metric_bundle

to_metric_bundle() -> MetricBundle

Return the attached bundle, or build one with best-effort extraction.

The default extraction recognises a small set of conventional field names. Adapters that know their substrate should attach an explicit metric_bundle rather than relying on this.

ophamin.seeing.substrate.mock

MockSubstrate — a self-contained substrate under test.

This is what makes Ophamin runnable and testable with no external system. It is a deterministic, seedable stand-in that produces plausible cycle results: a phi-like cognitive signal that drifts as state accumulates, an energy gauge that depletes, latency timers, and a tunable collapse mode so the diagnostics have something to find.

It is not a model of any real substrate — it exists so the framework's pillars and orchestration can be exercised and verified end-to-end. Real systems plug in through their own SubstrateUnderTest adapter (see kimera_adapter).

MockSubstrate

Bases: SubstrateUnderTest

A deterministic, seedable substrate stand-in.

Behaviour responds to swept parameters so the framework can be exercised:

injection_rate       raises ``phi``, lowers cross-modal overlap & energy
immune_threshold     low + high injection -> occasional overwhelm collapse
variant              "treatment"-like labels add a small positive effect
entropy_coefficient  in a ``collapse_cell``, low entropy triggers collapse
cell                 topological cell id (for the kernel-coupling probe)

State (cycle count, energy, accumulated phi) carries across cycles and is cleared by reset — the seed makes the whole sequence reproducible.
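The reproducibility property can be illustrated with a much smaller toy. TinyMock is not MockSubstrate: the formulas are invented, and the choice to re-seed the generator on reset is an assumption made so that a reset run replays the same sequence.

```python
import random


class TinyMock:
    """Toy seeded substrate: state accumulates and reset() clears it."""

    def __init__(self, seed: int = 0) -> None:
        self._seed = seed
        self.reset()

    def reset(self) -> None:
        # Assumption: reset also restarts the random stream from the seed.
        self._rng = random.Random(self._seed)
        self.cycles = 0
        self.energy = 1.0

    def run_cycle(self, injection_rate: float = 0.1) -> dict:
        self.cycles += 1
        self.energy = max(0.0, self.energy - 0.01 * injection_rate)
        # Signal drifts upward with accumulated cycles, plus seeded noise.
        phi = injection_rate + 0.001 * self.cycles + self._rng.gauss(0.0, 0.01)
        return {"cycle": self.cycles, "phi": phi, "energy": self.energy}


mock = TinyMock(seed=7)
first_run = [mock.run_cycle() for _ in range(3)]
mock.reset()
second_run = [mock.run_cycle() for _ in range(3)]
```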

ophamin.seeing.substrate.kimera_adapter

KimeraAdapter — plug Kimera-SWM into Ophamin as a multi-component substrate.

This is the central Kimera-coupling point in the framework: it adapts the substrate-under-test surface so that the rest of Ophamin (measuring/, comparing/, auditing/, reporting/) operates against the abstract SubstrateUnderTest protocol. A small number of seeing-wheel-internal helpers (seeing/discovery, seeing/wiring, seeing/telemetry) also reach into Kimera shapes; they belong to the same conceptual layer as KimeraAdapter itself. The adapter models Kimera-SWM as what it is: a multi-component entity, not a single cognitive cycle.

An experiment targets either the whole entity (target="entity" — the integrated Takwin cycle) or a named component ("walker", "gwf", "rosetta", "arachne", "ouroboros", "pentecost", "piovra", "astrolabe" …) — each invoked through its own verified entry point.

Two modes:

  • mode="subprocess" — a fresh interpreter per cycle. Leak-free, slow; the precision path.
  • mode="batch" — one interpreter, the component constructed once, the whole batch looped in-process. Fast; the density path. State accumulates across the batch, which for most components is the substrate working as designed (memory-as-deformation), not a leak.
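The subprocess mode amounts to paying one interpreter start-up per cycle in exchange for isolation. The sketch below shows that general pattern only; the inline RUNNER script, the JSON-over-stdin protocol and the function name are assumptions for illustration and say nothing about the bundled runner's actual interface.

```python
import json
import subprocess
import sys

# A stand-in "runner": reads one JSON stimulus from stdin, writes one JSON result.
RUNNER = (
    "import json, sys; "
    "s = json.load(sys.stdin); "
    "print(json.dumps({'echo': s, 'n': len(s)}))"
)


def run_in_fresh_interpreter(stimulus: str, timeout_s: float = 30.0) -> dict:
    """Run one cycle in a new Python process so no state survives the call."""
    proc = subprocess.run(
        [sys.executable, "-c", RUNNER],
        input=json.dumps(stimulus),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        # Report the failure rather than returning a made-up result.
        return {"success": False, "error": proc.stderr.strip()}
    return {"success": True, "raw": json.loads(proc.stdout)}


result = run_in_fresh_interpreter("abc")
```

A batch mode would instead construct the component once and loop inside a single process, which is faster but lets state accumulate between cycles.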

Performance is measured, never assumed: measure_throughput runs a bounded batch and reports real cycles/sec on this vessel. probe verifies which targets are actually reachable in the connected repo.

KimeraAdapter

Bases: SubstrateUnderTest

Subprocess adapter for the Kimera-SWM substrate — entity or any component.

write_runner_template staticmethod

write_runner_template(path: str | Path) -> Path

Dump the bundled runner so it can be edited and reused.

probe

probe() -> dict[str, Any]

Verify which targets are reachable in the connected repo.

Returns a structured report. Run this before wiring scenarios: it checks what the connected substrate actually exposes rather than trusting the docs.

measure_throughput

measure_throughput(stimuli: list[Any], params: dict[str, Any] | None = None) -> dict[str, Any]

Measure real cycles/sec for this target — performance is measured, not assumed.

Runs the stimuli as one in-process batch and times it. The result is the empirical basis for choosing subprocess-vs-batch and for a throughput proof record — there is no assumed performance figure anywhere.
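Timing a bounded batch takes only a few lines. This is a generic sketch, with assumed report keys (cycles, elapsed_s, cycles_per_sec) and a trivial stand-in workload; the adapter's real report may contain different fields.

```python
import time


def measure_throughput(run_cycle, stimuli) -> dict:
    """Time one in-process pass over the stimuli and report cycles/sec."""
    start = time.perf_counter()
    for stimulus in stimuli:
        run_cycle(stimulus)
    elapsed = time.perf_counter() - start
    rate = len(stimuli) / elapsed if elapsed > 0 else float("inf")
    return {"cycles": len(stimuli), "elapsed_s": elapsed, "cycles_per_sec": rate}


# Stand-in workload; a real measurement would call the substrate.
report = measure_throughput(lambda s: sum(range(1000)), list(range(100)))
```

The figure is only meaningful for the machine and workload it was measured on, which is why the text insists on measuring rather than assuming.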

Corpus base

ophamin.seeing.corpus.base

Massive-dataset corpus layer — base abstraction.

A Corpus locates a downloaded open-source dataset on disk, content-addresses it, counts its records, and streams them as CorpusRecord objects — so a catastrophic-testing scenario can feed real data through the substrate in concentrated batches.

The four connectors (connectors.py):

EnronCorpus         ~500k real executive emails        — organisational dissonance
LinuxKernelCorpus   ~1.4M commit messages              — logic / topology siege
CyberPayloadCorpus  Metasploit modules + injection sets — concentrated immune siege
FloresCorpus        FLORES-200, 200 parallel languages — Rosetta scaling limit

Content hashes and record counts are computed once and cached to disk (.ophamin_<name>_content_hash / _count) so a 1.7 GB archive is not re-hashed on every run.
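The caching idea can be sketched as below. Only the cache file name pattern comes from the text; the use of SHA-256, the placement of the cache file next to the archive, and the absence of any invalidation logic are assumptions of this example.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_content_hash(archive: Path, name: str) -> str:
    """Hash the archive once, then serve the value from a sidecar file."""
    cache = archive.parent / f".ophamin_{name}_content_hash"
    if cache.exists():
        return cache.read_text(encoding="utf-8").strip()
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        # Read in 1 MiB blocks so a multi-gigabyte file never sits in memory.
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    value = digest.hexdigest()
    cache.write_text(value, encoding="utf-8")
    return value


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "corpus.bin"
    archive.write_bytes(b"hello")
    first = cached_content_hash(archive, "demo")
    cache_written = (Path(tmp) / ".ophamin_demo_content_hash").exists()
    second = cached_content_hash(archive, "demo")
```

A cache like this goes stale if the archive changes underneath it, so deleting the sidecar file is the simplest way to force a re-hash in this sketch.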

Corpus

Bases: ABC

A downloaded open-source dataset, content-addressed and streamable.

is_available abstractmethod

is_available() -> bool

True iff the raw data is present on disk.

records abstractmethod

records() -> Iterator[CorpusRecord]

Stream every record. Must be a generator — corpora do not fit in memory.

content_hash

content_hash() -> str

Content-addressed hash of the corpus, cached in-memory and on disk.

count

count() -> int

Total record count, cached in-memory and on disk.

sample

sample(n: int, seed: int = 0) -> list[CorpusRecord]

A deterministic reservoir sample of n records (single streaming pass).
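For readers unfamiliar with the technique, a single-pass reservoir sample (Algorithm R) with a seeded generator looks like this. It is a textbook sketch, not necessarily the exact variant Corpus.sample uses.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Uniformly sample n items from a stream of unknown length in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep the new item with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 5, seed=42)
```

Memory use is bounded by n regardless of corpus size, and the same seed over the same stream returns the same records.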

chunks

chunks(size: int, limit: int | None = None) -> Iterator[list[CorpusRecord]]

Yield records in batches of size — concentrated-batch density feeding.

limit caps the total number of records emitted across all batches.
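The batching semantics, including the way limit caps the total rather than the number of batches, can be sketched with itertools.islice. This is a standalone function written for illustration, assuming the final batch may be shorter than size.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def chunks(records: Iterable[T], size: int, limit: Optional[int] = None) -> Iterator[List[T]]:
    """Yield lists of up to `size` records, stopping after `limit` records in total."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Cap the stream first, then cut it into batches.
    stream = iter(records) if limit is None else islice(records, limit)
    while True:
        batch = list(islice(stream, size))
        if not batch:
            return
        yield batch


unlimited = list(chunks(range(10), 4))
capped = list(chunks(range(10), 4, limit=6))
```

With limit=6 and size=4 the second batch holds only two records, since the cap applies across all batches.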

dataset_ref

dataset_ref() -> 'DatasetRef'

Produce the proof-record DatasetRef for this corpus.

CorpusRecord dataclass

One item from a corpus — an email, a commit message, a payload, a sentence.

Scenario base

ophamin.measuring.scenarios.base

The catastrophic-scenario layer.

A Scenario binds a real corpus + a substrate target + a pre-registered falsifiable claim. It streams the corpus through the substrate, scores the run, and emits a signed EmpiricalProofRecord.

The harness is substrate-agnostic — it runs identically against MockSubstrate (tests) or KimeraAdapter (real catastrophic runs). Pre-registration is captured before the run; the proof record is content-addressed and signed.

Scenario registration

Every concrete subclass of :class:Scenario that sets a name attribute distinct from the base sentinel "scenario" is automatically registered in the module-level :data:SCENARIOS mapping via the :meth:Scenario.__init_subclass__ hook. There is no manual editing of an __init__.py dict required; the registry is built by class-definition side effect.

Registration is loud-failure:

  • A duplicate name across two subclasses raises :class:DuplicateScenarioNameError at class-definition time.
  • A subclass that sets name = "scenario" (the unchanged base default) raises :class:ScenarioNameNotOverriddenError.
  • A subclass that opts out via register=False (e.g. an abstract intermediate parent in a class hierarchy) is skipped silently. This is the only sanctioned skip path.

Third-party / out-of-tree scenarios reach the same registry by simply inheriting from :class:Scenario in their own package; importing their module fires the registration hook.

DEFAULT_SIGN_KEY module-attribute

DEFAULT_SIGN_KEY = b'ophamin-scenario-proof-key'

Scenario

Bases: ABC

Binds corpus + target + pre-registered claim -> a signed proof record.

Every concrete subclass declares a metadata block (name, tier, family, goal, explanation, and optionally method + falsification_consequence) that classifies the experiment and explains its intent without requiring the reader to chase docstrings. The metadata is validated at class-definition time by :meth:__init_subclass__ (loud-failure on omission) and surfaces into every signed EmpiricalProofRecord produced by the scenario.

__init_subclass__

__init_subclass__(register: bool = True, **kwargs: object) -> None

Auto-register concrete subclasses in :data:SCENARIOS.

Skips registration when register=False (abstract intermediate parents, test-internal scenarios). Otherwise:

  • raises :class:ScenarioNameNotOverriddenError if the subclass kept the base sentinel name;
  • raises :class:ScenarioMetadataMissingError if any of tier / family / goal / explanation is unset or empty;
  • raises :class:DuplicateScenarioNameError if another subclass already registered the same name.

Re-registration of the same class object under the same name is idempotent — this is necessary so module reloads (e.g. test fixtures, importlib.reload) don't trip the duplicate guard.
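The registration-by-side-effect mechanism can be reduced to a few lines. The sketch below reuses the names from the text but is a simplified reimplementation: it omits the tier / family / goal / explanation validation, and the example scenario classes are hypothetical.

```python
SCENARIOS: dict = {}


class DuplicateScenarioNameError(ValueError):
    pass


class ScenarioNameNotOverriddenError(ValueError):
    pass


class Scenario:
    name = "scenario"  # base sentinel

    def __init_subclass__(cls, register: bool = True, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        if not register:
            return  # sanctioned skip path, e.g. abstract intermediate parents
        if cls.name == "scenario":
            raise ScenarioNameNotOverriddenError(cls.__qualname__)
        existing = SCENARIOS.get(cls.name)
        if existing is not None and existing is not cls:
            raise DuplicateScenarioNameError(cls.name)
        # Registering the same class object twice is a no-op.
        SCENARIOS[cls.name] = cls


class SiegeBase(Scenario, register=False):
    """Hypothetical abstract parent: opts out of registration."""


class KernelSiege(SiegeBase):
    """Hypothetical concrete scenario: registered on definition."""
    name = "kernel_siege"
```

Defining a second class with name = "kernel_siege" would raise at class-definition time, before any instance exists.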

build_claim abstractmethod

build_claim() -> Claim

The pre-registered falsifiable claim this scenario tests.

score abstractmethod

score(cycle_results: list[CycleResult], records: list[CorpusRecord]) -> ScenarioScore

Read the completed run into an observed value + pillar evidence.

field_contract

field_contract() -> ScenarioFieldContract | None

The OrchestratorResult fields this scenario depends on.

Default None means no contract — scenarios that don't override this run exactly as before (back-compat). Scenarios that DO override get loud-failure on the first cycle if a required field is missing or has the wrong type. This catches Kimera-side renames at experiment setup time instead of silently breaking downstream.

Returning a contract is purely additive — the scenario still reads cycle.raw["..."] ad-hoc in :meth:score. The contract is the gate, not the projection.
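As a rough picture of "the gate, not the projection", a contract check could validate the first cycle's raw output and then step aside. The shape of ScenarioFieldContract is not documented here, so this sketch models a contract as a plain mapping from field name to expected type; check_contract and the field names are hypothetical.

```python
from typing import Any, Dict


def check_contract(raw: Dict[str, Any], required: Dict[str, type]) -> None:
    """Fail loudly if a required field is absent or has the wrong type."""
    for name, expected in required.items():
        if name not in raw:
            raise KeyError(f"required field missing from cycle output: {name}")
        if not isinstance(raw[name], expected):
            raise TypeError(
                f"field {name!r} should be {expected.__name__}, "
                f"got {type(raw[name]).__name__}"
            )


contract = {"phi": float, "halted": bool}

# Passes silently; scoring code still reads raw["phi"] directly afterwards.
check_contract({"phi": 0.4, "halted": False, "extra": 1}, contract)
```

A renamed upstream field (say phi becoming phi_score) would then fail on the first cycle instead of producing a quietly wrong score later.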

select_records

select_records(corpus: Corpus) -> Iterator[CorpusRecord]

Which corpus records to use — default is the corpus stream; override to filter.

run

run(substrate: SubstrateUnderTest, *, data_root: str | 'Path' | None = None, sign_key: bytes = DEFAULT_SIGN_KEY) -> EmpiricalProofRecord

Run the scenario end-to-end and return a signed Empirical Proof Record.

ScenarioScore dataclass

A scenario's read of a completed run — the observed value + the evidence.

Tier

Bases: str, Enum

The experimentation tier a scenario lives in.

Tiers carry epistemic shape, not just bookkeeping:

  • SCIENTIFIC — claims about substrate behaviour (does the substrate do X under condition Y?).
  • ENGINEERING — claims about substrate cost (does X stay under threshold T?).
  • PHILOSOPHICAL — claims about substrate self-model (does the substrate respond differently to self-referential vs neutral input?).
  • EMPIRICAL_DEEP — substrate-physics characterisation scenarios that target Kimera's prime apparatus / Φ / cross-channel behaviour and mirror Family A-V claims in Kimera's EMPIRICAL_VALIDATION.md.
  • MEASUREMENT_MACHINERY — validation of the upstream libraries Ophamin itself depends on (e.g. CRDT laws against pycrdt + y-py as cross-check oracle).

Inheriting from str makes a Tier serialise as its value string in JSON; the JSON proof-record schema sees a plain string, not a Python-specific enum encoding.
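The serialisation effect is easy to see in isolation. The enum below is a cut-down stand-in with two members, and the lowercase value strings are an assumption; only the str + Enum inheritance is taken from the reference.

```python
import json
from enum import Enum


class TierSketch(str, Enum):
    """Stand-in for Tier; member values are assumed for the example."""
    SCIENTIFIC = "scientific"
    ENGINEERING = "engineering"


# The str mixin lets the standard encoder treat a member as a plain string.
payload = json.dumps({"tier": TierSketch.SCIENTIFIC})

# Reading back yields a str, which the enum can re-parse by value.
restored = TierSketch(json.loads(payload)["tier"])
```

A plain Enum without the str base would make json.dumps raise TypeError, which is the Python-specific encoding problem the design avoids.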

Audit pillar base

ophamin.auditing.base

The audit-pillar contract — Finding, FindingSeverity, PillarResult, AuditPillar.

A pillar wraps one external static-analysis tool. The contract is small:

  • name and tool_name identify the pillar (e.g. "ruff", "bandit")
  • is_available() reports whether the underlying binary is installed
  • run(target_path) returns a PillarResult carrying the findings + raw output

Findings are normalised across tools — every tool's output is parsed into the same Finding dataclass — so downstream code (aggregation, reporting, threshold-mode claims) doesn't need to know which pillar produced what.

AuditPillar

Bases: ABC

Wraps one external static-analysis tool as an audit pillar.

A subclass implements tool_binary (the CLI name to look up on PATH), tool_version (a way to ask the tool its version), and run (the actual invocation + parse). Tool absence is reported as status="unavailable" — never silently skipped.

resolved_binary classmethod

resolved_binary() -> str | None

Resolve the tool binary — venv-local first, then PATH.

is_available classmethod

is_available() -> bool

Is the wrapped tool resolvable (venv-local OR on PATH)?
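A venv-first lookup can be approximated as follows. The sketch assumes "venv-local" means the directory that holds the running interpreter, and it ignores Windows executable suffixes; resolve_binary is a hypothetical free function, not the classmethod itself.

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path


def resolve_binary(tool: str) -> str | None:
    """Prefer a tool installed beside the current interpreter, then fall back to PATH."""
    local = Path(sys.executable).parent / tool
    if local.is_file():
        return str(local)
    return shutil.which(tool)


def is_available(tool: str) -> bool:
    return resolve_binary(tool) is not None


interpreter_name = Path(sys.executable).name
```

Checking beside the interpreter first means a tool installed into the active virtual environment wins over a different version found elsewhere on PATH.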

tool_version

tool_version(timeout_s: float = 10.0) -> str

Best-effort <tool> --version capture; empty string on failure.
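A best-effort version probe that degrades to an empty string might look like this. It is a generic sketch using the running Python interpreter as the example tool; treating a non-zero exit status as failure is an assumption.

```python
import subprocess
import sys


def tool_version(binary: str, timeout_s: float = 10.0) -> str:
    """Return the tool's --version output, or "" if it cannot be obtained."""
    try:
        proc = subprocess.run(
            [binary, "--version"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or a hung tool.
        return ""
    if proc.returncode != 0:
        return ""
    # Some tools print their version on stderr rather than stdout.
    return (proc.stdout or proc.stderr).strip()


python_version = tool_version(sys.executable)
```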

unavailable_result

unavailable_result(target_path: str) -> PillarResult

Standard unavailable result for when the tool isn't installed.

run abstractmethod

run(target_path: str | Path, **kwargs: Any) -> PillarResult

Run the tool against target_path and return a PillarResult.

Pillars MUST handle missing-tool cleanly via unavailable_result and runtime failures via error_result. Never silently swallow a failure.

Finding dataclass

One static-analysis finding, normalised across tools.

Every field except path and message may be empty if the producing tool doesn't carry it — but the dataclass shape is stable so downstream code can rely on it.

FindingSeverity

Bases: str, Enum

Normalised severity across heterogeneous tools.

Each pillar maps its tool's native severity scale onto these five buckets; the mapping is documented per-pillar.

PillarResult dataclass

One pillar's full output.

severity_histogram

severity_histogram() -> dict[str, int]

{severity_value: count} — bucket counts across all findings.

per_file_count

per_file_count(top_n: int = 10) -> list[tuple[str, int]]

Top-N files by finding count.
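Both aggregations are straightforward counting. The sketch below uses plain dictionaries in place of Finding objects, with made-up paths and severity labels, and free functions in place of the PillarResult methods.

```python
from collections import Counter

# Hypothetical findings; real ones are Finding dataclass instances.
findings = [
    {"path": "core/engine.py", "severity": "high"},
    {"path": "core/engine.py", "severity": "low"},
    {"path": "core/engine.py", "severity": "low"},
    {"path": "cli/main.py", "severity": "low"},
    {"path": "cli/main.py", "severity": "medium"},
    {"path": "utils/io.py", "severity": "high"},
]


def severity_histogram(items) -> dict:
    """{severity_value: count} across all findings."""
    return dict(Counter(item["severity"] for item in items))


def per_file_count(items, top_n: int = 10) -> list:
    """Top-N (path, count) pairs, most findings first."""
    return Counter(item["path"] for item in items).most_common(top_n)


histogram = severity_histogram(findings)
top_two = per_file_count(findings, top_n=2)
```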