Public API reference¶
Auto-generated from the source docstrings via
mkdocstrings. Every public symbol
listed below is part of Ophamin's stable surface; breaking changes
follow the semver promise.
Top-level¶
ophamin ¶
Ophamin — an empirical observatory around Kimera-SWM.
The name derives from the angelic order Ophanim (wheels within wheels, covered with eyes; Ezekiel 1:18). Architecturally, Ophamin is a Dyson sphere around Kimera: Kimera sits at the centre emitting; Ophamin envelops, senses, and returns measurement to the operator. It is not a tool next to Kimera but a structure around it.
The structure has six wheels, in two concentric triads:
Outer (empirical) triad:
- seeing (Wheel 1): how the observatory senses Kimera and the world (substrate, corpus, discovery)
- measuring (Wheel 2): pre-registered measurement engines + plug-in pillars (proof, scenarios, metrics, pillars.{observability, adaptive, effects, synthesis, robustness, diagnostics})
- comparing (Wheel 3): cross-Kimera-commit retrospection (drift, provenance, orchestration)
Inner (engineering) triad:
- instrumenting (Wheel 4): per-cycle CPU / RSS / page-fault sampling (psutil, opentelemetry, py-spy, memray)
- auditing (Wheel 5): orchestrated static-analysis tools (ruff, bandit, mypy, pip-audit)
- reporting (Wheel 6): render results to Markdown / HTML / LaTeX (matplotlib, jinja2)
The six plug-in pillars (O · F · A · M · I · N) live inside the
measuring ring:
- O (observability): SPC + SRM + drift detectors (scipy, river)
- F (formal provenance): PROV-O graph + lineage store (prov, MLflow, DVC)
- A (adaptive testing): SPRT + mSPRT anytime-valid (statsmodels)
- M (mixed-effects): MixedLM + MEA (statsmodels)
- I (iterative synthesis): cumulative meta-analysis (statsmodels)
- N (n-fold robustness): cross-validation (scikit-learn)
The framework is independent of any particular substrate-under-test;
MockSubstrate makes the whole observatory runnable with no external
system, and KimeraAdapter plugs in Kimera-SWM via a subprocess boundary.
Protocols¶
ophamin.protocols ¶
The plug-in surfaces of the Ophamin observatory.
Ophamin is built to accept plug-in datasets, plug-in substrate probes, plug-in analytic pillars, and plug-in scenarios. This module declares the protocols each plug-in must satisfy. A new plug-in implements the protocol; nothing inside Ophamin's core has to change.
The protocols are intentionally narrow — each one names a single contract:
- SubstrateProbe: the thing being observed (e.g. KimeraAdapter, MockSubstrate)
- DatasetConnector: a corpus the observatory feeds the substrate from
- Pillar: a library-backed analytic that turns cycle results into PillarEvidence (one statistical method per pillar)
- ScenarioProtocol: a corpus + target + pre-registered claim runner
A protocol is Protocol-typed and runtime_checkable so callers can verify
plug-ins at registration time (isinstance(plugin, Pillar)) without
inheritance.
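The registration-time check described above relies on structural typing. The following is a minimal sketch of that mechanism using only the standard library; `PillarLike`, `MeanPillar`, and `NotAPillar` are hypothetical names invented for illustration and are not Ophamin's actual protocol or adapters.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class PillarLike(Protocol):
    """Hypothetical stand-in for a plug-in contract (not Ophamin's Pillar)."""

    name: str
    library: str

    def compute(self, cycle_results: list[Any]) -> dict[str, Any]: ...


class MeanPillar:
    """Satisfies PillarLike structurally; note there is no base class."""

    name = "toy.mean"
    library = "statistics"

    def compute(self, cycle_results: list[Any]) -> dict[str, Any]:
        return {"mean": sum(cycle_results) / len(cycle_results)}


class NotAPillar:
    name = "broken"  # missing `library` and `compute`


# isinstance only checks that the declared members exist, not their signatures.
ok = isinstance(MeanPillar(), PillarLike)
rejected = isinstance(NotAPillar(), PillarLike)
```

Because a runtime-checkable protocol verifies member presence only, a plug-in with a wrongly shaped `compute` would still pass this check; signature mismatches surface later, at call time.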
Pillar ¶
Bases: Protocol
One analytic pillar — a statistical method that turns observations into PillarEvidence.
A Pillar declares its name + the library it delegates to (library + version go into every signed proof record). Different pillars implement different statistical methods: SPC, SPRT, mixed-effects, etc.
The current six pillars (O / F / A / M / I / N) live under
ophamin.measuring.pillars/*; this protocol describes the contract a
new pillar must satisfy to be registrable.
As of Move G (2026-05-16) eleven adapter classes in
ophamin.measuring.pillars._adapters satisfy this Protocol and
register themselves with :data:ophamin.registry.PILLARS at module
import time. Out-of-tree pillars register the same way:
construct an instance of a :class:PillarBase subclass and call
:func:ophamin.registry.register_pillar. Per-pillar compute
signatures diverge (sequential testing vs control charts vs
cross-validation are different shapes); the Protocol is
metadata-backed-by-compute: adapters whose pillar doesn't fit the uniform
compute(cycle_results, records) shape raise
:class:ophamin.measuring.pillars.base.NonUniformComputeError
with a pointer to the canonical per-pillar API.
ScenarioProtocol ¶
Bases: Protocol
A scenario binds a corpus + target + pre-registered claim and produces a signed Empirical Proof Record.
The existing ophamin.measuring.scenarios.base.Scenario abstract class
is the canonical implementation. As of 2026-05-16 the framework ships
nineteen scenarios across five tiers (Scientific / Engineering /
Philosophical / Empirical-deep / Measurement-machinery); see the
README scenarios table for the full list.
The pre-registration discipline is preserved across plug-ins: every ScenarioProtocol implementation must produce a claim whose threshold is falsifiable (a value the substrate could fail to meet), before the run.
.. note::
Eleven of the nineteen scenarios are not currently registered in
ophamin.measuring.scenarios.__init__.SCENARIOS, which means
they are reachable from Python imports but not from the
ophamin scenario <name> CLI surface. See
docs/ARCHITECTURE_INTENT_VS_REALITY_2026_05_16.md for the gap.
DatasetConnector ¶
Bases: Protocol
A corpus the observatory can stream records from.
The existing ophamin.seeing.corpus.base.Corpus abstract class is the
canonical example. Each corpus is content-addressable (its content hash
appears in every signed proof record).
SubstrateProbe ¶
Bases: Protocol
A substrate-under-test the observatory can drive.
Any concrete substrate (Kimera, mock, future substrates) implements this
contract. The existing ophamin.seeing.substrate.base.SubstrateUnderTest
abstract class is the canonical example.
Registry¶
ophamin.registry ¶
Central plug-in registry.
Closes gap B from
docs/ARCHITECTURE_INTENT_VS_REALITY_2026_05_16.md — until this
module landed, the four Protocols declared in :mod:ophamin.protocols
(SubstrateProbe / DatasetConnector / Pillar /
ScenarioProtocol) had no registration surface. Plug-ins were
hard-imported into individual scenarios.
The registry exposes one dict per plug-in kind and one
register_* function per kind. Scenarios continue to register
themselves via :meth:Scenario.__init_subclass__ (Move A); the
SCENARIOS dict is re-exported here for one-stop discovery. Pillars
are registered by their module's __init__.py-time call to
:func:register_pillar. Corpora are looked up via
:func:ophamin.seeing.corpus.get_corpus (existing surface).
Every registration is loud-failure:
- A duplicate pillar_name raises :class:DuplicatePluginError rather than silently overwriting.
- A plug-in that fails the matching isinstance(p, Protocol) check raises :class:PluginProtocolViolationError: the Protocol declared the contract; an adapter that doesn't satisfy it is a real defect.
Outside callers query the registry via:
>>> from ophamin.registry import PILLARS, list_pillars, get_pillar
>>> p = get_pillar("O.spc")
>>> p.library, p.library_version
('numpy', '1.26.0')
Or via the ophamin pillar list / show CLI surface.
register_pillar ¶
Register one pillar adapter in the central registry.
Returns the pillar (so calls can be expressed as
MY_PILLAR = register_pillar(MyPillar()) at module scope).
Raises:
| Type | Description |
|---|---|
| PluginProtocolViolationError | if the object doesn't satisfy the :class: |
| DuplicatePluginError | if another pillar already registered the same |
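The loud-failure registration behaviour can be sketched in a few lines. This is an illustrative toy, not the `ophamin.registry` implementation: `Named`, `DuplicateError`, `ProtocolViolation`, `REGISTRY`, and `register` are all hypothetical names, and the order of the two checks is an assumption.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Named(Protocol):
    """Hypothetical minimal contract: a plug-in must expose a pillar_name."""

    pillar_name: str


class DuplicateError(ValueError):
    """Hypothetical analogue of a duplicate-registration error."""


class ProtocolViolation(TypeError):
    """Hypothetical analogue of a protocol-violation error."""


REGISTRY: dict[str, Named] = {}


def register(plugin):
    # Check the contract first, so a malformed object never reaches the dict.
    if not isinstance(plugin, Named):
        raise ProtocolViolation(f"{plugin!r} does not satisfy Named")
    if plugin.pillar_name in REGISTRY:
        raise DuplicateError(f"{plugin.pillar_name!r} is already registered")
    REGISTRY[plugin.pillar_name] = plugin
    return plugin  # returning the plug-in allows X = register(X()) at module scope


class Spc:
    pillar_name = "O.spc"


SPC = register(Spc())
```

Returning the registered object is what makes the one-line module-scope idiom work; raising instead of overwriting means an import-order accident shows up as an exception rather than as a silently replaced pillar.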
Signed-record codecs¶
Empirical Proof Record¶
ophamin.measuring.proof.record ¶
The Ophamin Empirical Proof Record — the official result artifact.
One record per verified claim. Two serialisations:
- proof.json: canonical, machine-readable, JSON-Schema-validated
- PROOF.md: rendered, human-readable
A proof is bulletproof when it is:
- falsifiable — every claim carries a pre-registered Threshold
- pre-registered — claim + config + analysis plan hashed BEFORE the run
- traceable — content-addressed: claim -> config -> substrate -> data -> result
- reproducible — exact command + environment lock + lineage chain
- attributed — every statistic names the library + version that produced it
- tamper-evident — HMAC-signed over the whole record body
The nine sections:
1 Identity 2 Claim 3 Pre-registration
4 Data 5 Evidence 6 Verdict
7 Reproduction 8 Provenance 9 Signature
A REFUTED record is a valid proof — disproving a claim is a result.
EmpiricalProofRecord
dataclass
¶
The official Ophamin result artifact — nine sections, content-addressed, signed.
sign ¶
HMAC-SHA256 sign the record body. Returns self for chaining.
verify_signature ¶
True iff the signature matches the current body under key.
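As a rough illustration of the sign / verify pair, the sketch below signs a canonical JSON form of a record body with HMAC-SHA256. The canonicalisation used here (sorted keys, compact separators) is an assumption made for the example; the source does not state how Ophamin canonicalises the body, and `canonical_bytes`, `sign_body`, and `verify_body` are hypothetical helpers.

```python
import hashlib
import hmac
import json


def canonical_bytes(body: dict) -> bytes:
    # Assumed canonical form: sorted keys + fixed separators give stable bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_body(body: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()


def verify_body(body: dict, signature: str, key: bytes) -> bool:
    if not signature:
        return False  # an empty signature never verifies
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sign_body(body, key), signature)


body = {"claim": "latency_p95 <= 50ms", "verdict": "REFUTED"}
sig = sign_body(body, b"demo-key")
tampered = {**body, "verdict": "VALIDATED"}
```

Any change to the body after signing, such as flipping the verdict in `tampered`, makes `verify_body` return False under the same key, which is the tamper-evidence property listed above.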
validate ¶
Return a list of problems; an empty list means the record is well-formed.
Enforces the properties that make a proof bulletproof — every one of them, not a subset.
from_dict
classmethod
¶
Reconstruct an EmpiricalProofRecord from its to_dict payload.
Mirrors the on-disk JSON shape produced by to_json — the body
sections (claim / preregistration / data / evidence / verdict /
reproduction / provenance / signature) plus the identity sub-dict.
Raises KeyError / ValueError loudly on a malformed payload —
no silent fill-defaults; a broken record should fail loud, not deserialize
as a partial.
from_json
classmethod
¶
Load a proof record from a JSON file written by to_json.
Claim
dataclass
¶
Section 2 — the falsifiable claim, as a five-tuple.
Threshold
dataclass
¶
A falsifiable pass/fail boundary — there is no claim without one.
Verdict
dataclass
¶
Section 6 — VALIDATED / REFUTED / INCONCLUSIVE against the threshold.
decide
classmethod
¶
decide(observed: float, threshold: Threshold, *, inconclusive: bool = False, reasoning: str = '') -> 'Verdict'
Decide the verdict by comparing observed against threshold.
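A toy version of this decision may help fix the idea. The fields of the real Threshold are not shown in this reference, so `ToyThreshold` (with hypothetical `op` and `value` fields) and the `decide` function below are assumptions for illustration only.

```python
import operator
from dataclasses import dataclass

_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class ToyThreshold:
    op: str       # hypothetical field: comparison the observation must satisfy
    value: float  # hypothetical field: the pre-registered boundary


def decide(observed: float, threshold: ToyThreshold, *, inconclusive: bool = False) -> str:
    if inconclusive:
        # The caller signals that the run could not support either outcome.
        return "INCONCLUSIVE"
    passed = _OPS[threshold.op](observed, threshold.value)
    return "VALIDATED" if passed else "REFUTED"


gate = ToyThreshold("<=", 50.0)
```

With this sketch, `decide(42.0, gate)` validates and `decide(51.0, gate)` refutes; both are results, in line with the statement that a REFUTED record is a valid proof.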
PillarEvidence
dataclass
¶
Section 5 — one pillar's measured evidence, attributed to its library.
The cross_check field is constrained to
:data:_CROSS_CHECK_VALUES — passing prose into it fires a loud
ValueError at construction time. Long-form context belongs in
detail (free-form dict) instead.
PreRegistration
dataclass
¶
Section 3 — claim + plan hashed BEFORE the run.
preregistered_at must precede the record's created_at; validate
enforces it. Build this object before the experiment runs.
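The two ideas in this entry, hashing the claim and plan up front and checking timestamp order, can be sketched as follows. `prereg_hash` and `ordering_problems` are hypothetical helpers, and treating equal timestamps as a violation is an assumption about what "precede" means.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def prereg_hash(claim: dict, analysis_plan: str) -> str:
    # Hash a canonical form of claim + plan so any later edit changes the digest.
    payload = json.dumps({"claim": claim, "plan": analysis_plan}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ordering_problems(preregistered_at: datetime, created_at: datetime) -> list[str]:
    # Assumption: "precede" is strict, so equal timestamps are reported too.
    if preregistered_at >= created_at:
        return ["preregistered_at must precede created_at"]
    return []


t0 = datetime(2026, 5, 16, 12, 0, tzinfo=timezone.utc)
digest = prereg_hash({"metric": "phi", "op": ">=", "value": 0.3}, "one-sided test")
moved = prereg_hash({"metric": "phi", "op": ">=", "value": 0.2}, "one-sided test")
```

Here `digest` and `moved` differ, so a threshold quietly relaxed after the run would no longer match the digest recorded beforehand.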
DatasetRef
dataclass
¶
Section 4 — one real dataset, content-addressed.
Reproduction
dataclass
¶
Section 7 — exact reproduction command, environment lock, lineage chain.
ophamin.measuring.proof.codec ¶
Format codec for :class:EmpiricalProofRecord — the single canonical
load / validate / verify / ingest surface.
The proof-record component dataclasses already declare to_dict /
from_dict round-trip pairs, and :class:EmpiricalProofRecord ships
to_json / from_json file shortcuts. This module collects them
into one loud-failure interface that bundles:
- JSON-Schema validation against proof/schema.json;
- the structural record.validate() checklist (falsifiable + pre-registered + traceable + reproducible + attributed);
- optional HMAC-SHA256 signature verification under a caller-provided key;
- one-call ingest that runs all three and raises loud on the first failure;
- directory-walking iter_proofs / list_proofs so the proof corpus on disk has a first-class Python interface.
Per the framework's no-fallback rule: every failure mode raises a
typed :class:ProofCodecError subclass with a descriptive message; the
codec does not return None or empty dicts on error and does not
swallow exceptions.
Read alongside :class:EmpiricalProofRecord itself
(src/ophamin/measuring/proof/record.py) and the JSON Schema
(src/ophamin/measuring/proof/schema.json).
dump ¶
Write record to path as canonical JSON. Returns the path.
Creates the parent directory if it doesn't already exist (the same
convenience callers typically wrap around pathlib.Path.write_text).
Raises :class:OSError if the write fails — codec does NOT swallow
file-system errors. Use the higher-level CLI / orchestration layer
if structured error handling is wanted.
load ¶
Load + reconstruct an :class:EmpiricalProofRecord from path.
Raises :class:ProofDecodeError on file-system errors, malformed
JSON, or a structurally incomplete payload. The chained exception
preserves the underlying error for forensic debugging.
validate ¶
Run schema + record + (optional) signature validation in one call.
Returns a :class:ValidationReport capturing every layer's result.
Does NOT raise on any validation failure — caller inspects the
report's all_ok property and schema_errors /
record_problems tuples to decide what to do.
Use :func:ingest for the raise-on-any-failure variant.
The JSON-Schema check runs first; if the schema is broken, the
structural record.validate is skipped (a malformed-at-schema
payload can't be safely reconstructed into a record). When
schema-ok, the record is loaded and record.validate is
invoked; if a key was provided, signature verification runs
too. The signature_ok field is None when no key was
provided (i.e. the check was skipped), True / False when
a key was provided.
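The layered, non-raising flow described above might look roughly like this. `Report` and `validate_payload` are hypothetical stand-ins, the "schema" is reduced to a required-keys check, and the rule that a skipped signature check does not fail `all_ok` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Report:
    schema_errors: tuple = ()
    record_problems: tuple = ()
    signature_ok: Optional[bool] = None  # None means "no key given, check skipped"

    @property
    def all_ok(self) -> bool:
        # Assumption: a skipped signature check does not count as a failure.
        return (not self.schema_errors
                and not self.record_problems
                and self.signature_ok is not False)


def validate_payload(payload: dict, *, required: tuple,
                     signature_check: Optional[Callable[[dict], bool]] = None) -> Report:
    schema_errors = tuple(f"missing key: {k}" for k in required if k not in payload)
    if schema_errors:
        # Schema layer failed: the deeper layers are skipped entirely.
        return Report(schema_errors=schema_errors)
    problems = () if payload.get("claim") else ("empty claim",)
    sig = None if signature_check is None else bool(signature_check(payload))
    return Report(record_problems=problems, signature_ok=sig)


REQUIRED = ("claim", "verdict")
broken = validate_payload({"claim": "x"}, required=REQUIRED)
clean = validate_payload({"claim": "x", "verdict": "REFUTED"}, required=REQUIRED)
bad_sig = validate_payload({"claim": "x", "verdict": "REFUTED"},
                           required=REQUIRED, signature_check=lambda p: False)
```

The tri-state `signature_ok` is the design point: callers can distinguish "not checked" from "checked and failed" without a second flag.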
verify_signature ¶
Load the record at path + verify its HMAC-SHA256 signature.
Returns True iff the signature matches the record body under
key. False if signature is empty or doesn't match.
Raises :class:ProofDecodeError if the file itself can't be
loaded. Does NOT raise on signature mismatch — caller decides
whether to escalate (use :func:ingest with
strict_signature=True for the loud-failure variant).
ingest ¶
ingest(path: str | Path, *, key: bytes | None = None, strict_signature: bool = False, require_schema_version: str | None = SCHEMA_VERSION) -> EmpiricalProofRecord
Single-call load + full-validate + optional signature-verify.
The boundary function for accepting third-party proof records.
After a successful call, the returned :class:EmpiricalProofRecord
is guaranteed:
- structurally well-formed (JSON-Schema validated),
- record-validate-clean (no internal contradictions),
- schema-version matches require_schema_version (unless that's None, which opts out of the version gate),
- signature-verified IFF strict_signature=True was passed AND a key was provided.
On any failure, raises the matching :class:ProofCodecError
subclass with a descriptive message. The caller never has to
inspect a partial / fallback record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | file path to the proof JSON. | required |
| key | bytes \| None | optional HMAC-SHA256 key for signature verification. | None |
| strict_signature | bool | when True, require | False |
| require_schema_version | str \| None | required schema version. Default is the current :data: | SCHEMA_VERSION |
Raises:
| Type | Description |
|---|---|
| ProofSchemaError | JSON-Schema validation failed. |
| ProofValidationError | structural |
| ProofSchemaVersionMismatchError | |
| ProofSignatureError | |
| ProofDecodeError | file couldn't be read or JSON couldn't be decoded (raised by underlying :func: |
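For contrast with the report-returning validator, a raise-on-first-failure boundary can be sketched like this. It is a simplified assumption-laden toy: the exception names, `ingest_sketch`, and the `verify` callable (standing in for key-based signature verification) are hypothetical, and no JSON-Schema layer is included.

```python
import json
import tempfile
from pathlib import Path


class CodecError(Exception):
    """Hypothetical base class for every failure at the boundary."""


class DecodeError(CodecError): ...
class VersionMismatch(CodecError): ...
class SignatureError(CodecError): ...


def ingest_sketch(path, *, verify=None, strict_signature=False, require_version="1.0"):
    try:
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        # Chain the original error so the root cause stays inspectable.
        raise DecodeError(f"cannot load {path}") from exc
    if require_version is not None and payload.get("schema_version") != require_version:
        raise VersionMismatch(f"expected {require_version!r}")
    # The signature gate applies only when strict mode AND a verifier are given.
    if strict_signature and verify is not None and not verify(payload):
        raise SignatureError("signature does not match")
    return payload


demo_dir = Path(tempfile.mkdtemp())
good = demo_dir / "good.json"
good.write_text(json.dumps({"schema_version": "1.0", "verdict": "REFUTED"}), encoding="utf-8")
record = ingest_sketch(good)
```

Because every failure is a `CodecError` subclass, a caller can catch the base class at the trust boundary and still branch on the specific type when it matters.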
list_proofs ¶
Walk directory recursively and return one summary entry per JSON.
A file that fails to decode produces an entry with error set
and the other content fields None — the walk does NOT stop on
a bad file. Use :func:ingest against an individual path when you
need loud-failure semantics for a single record.
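A walk that keeps going past bad files can be sketched with `pathlib` and `json`. The entry shape below (`path`, `verdict`, `error`) is an assumption chosen for the example, and `list_json_summaries` is a hypothetical helper, not the library's `list_proofs`.

```python
import json
import tempfile
from pathlib import Path


def list_json_summaries(directory):
    entries = []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
            entries.append({"path": str(path), "verdict": payload["verdict"], "error": None})
        except (OSError, ValueError, KeyError, TypeError) as exc:
            # Record the failure on the entry and keep walking.
            entries.append({"path": str(path), "verdict": None,
                            "error": f"{type(exc).__name__}: {exc}"})
    return entries


root = Path(tempfile.mkdtemp())
(root / "nested").mkdir()
(root / "good.json").write_text('{"verdict": "REFUTED"}', encoding="utf-8")
(root / "nested" / "bad.json").write_text("{not json", encoding="utf-8")
summaries = list_json_summaries(root)
```

The listing is meant for surveying a corpus; an entry with `error` set is a prompt to run the strict single-record path on that file.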
Audit Record¶
ophamin.auditing.audit_record ¶
AuditRecord — the signed, content-addressable artefact of one audit run.
Parallel to EmpiricalProofRecord but descriptive by default — audits
don't require a falsifiable claim because the value is in the findings
distribution itself, not in passing a threshold. A separate threshold-mode
wrapper can pre-register pass/fail criteria for CI gating; that's a follow-on.
Nine logical sections, mirroring the proof record shape so the two can be processed by the same downstream tooling (reporting, drift, etc.):
- Identity: ophamin version + commit, captured_at, schema version
- Target: path being audited + its content hash (for forensics)
- Pillars: which pillars ran, which were unavailable, versions
- Findings: the union of every pillar's findings (already in PillarResult, but flattened here for cross-pillar hotspot detection)
- Summary: aggregate counts + severity histogram + file hotspots
- (no verdict): audits are descriptive; if a claim is wanted, wrap this record in an Empirical Proof Record with a threshold on a chosen statistic
- Reproduction: command + env-lock (later)
- Provenance: (optional) PROV-O graph of the run
- Signature: HMAC-SHA256 over the body
AuditRecord
dataclass
¶
One audit run's full artefact — signed, content-addressable.
As of schema audit/1.1 (Move L, 2026-05-16), an AuditRecord MAY
carry an optional :class:PreRegistration + chosen statistic
metric + :class:Verdict, turning the descriptive record into a
falsifiable artefact for CI gating. Records written under
schema audit/1.0 (no pre-registration fields) load cleanly under
the v1.1 codec — the optional fields default to None.
from_dict
classmethod
¶
Reconstruct an AuditRecord from its to_dict payload.
Accepts both schema audit/1.0 and audit/1.1 payloads. v1.0
records have no preregistration / verdict fields; v1.1
records may have one or both. Loud-fails on malformed shapes
rather than silent partial deserialisation.
attach_pre_registration ¶
attach_pre_registration(*, claim: Any, observed_value: float, metric: str = 'total_findings', analysis_plan: str = 'audit-side pre-registration: gate on a chosen audit statistic') -> 'AuditRecord'
Stamp an in-place pre-registration + verdict onto this record.
Per Move L's full universalization of the pre-registration
discipline: convert this descriptive audit into a falsifiable
artefact by attaching a Claim's threshold + a decided Verdict.
Returns self for chaining; bumps the record's schema_version to
audit/1.1 if it wasn't already there.
sign() must be called again after attaching to refresh the signature (the body changed, so the old signature is invalid).
from_json
classmethod
¶
Load an AuditRecord from a JSON file written by :meth:to_json.
wrap_as_proof ¶
wrap_as_proof(*, claim: Any, observed_value: float, pillar_name: str = 'audit', statistic_name: str = 'total_findings', library: str = 'ophamin', library_version: str = '', analysis_plan: str = 'wrap-audit-as-proof: pre-register a threshold over an audit statistic for CI gating', sign_key: bytes | None = None) -> Any
Wrap this AuditRecord into a pre-registered EmpiricalProofRecord.
Per Move I: audits are descriptive by default, but a caller that
wants CI gating (e.g. total_findings <= 50) can wrap the
record in a proof record that carries the falsifiable claim +
threshold. The wrapping is lossless — the audit's forensic detail
(target path + content hash + per-pillar findings + summary) is
kept in the proof's reproduction + evidence sections.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| claim | Any | an :class: | required |
| observed_value | float | the statistic to evaluate the claim against (typically | required |
| pillar_name | str | PillarEvidence pillar identifier in the wrapped proof. Default | 'audit' |
| statistic_name | str | PillarEvidence statistic name. Default | 'total_findings' |
| library | str | PillarEvidence library attribution. Default | 'ophamin' |
| library_version | str | PillarEvidence library version. Default empty; caller fills if known. | '' |
| analysis_plan | str | PreRegistration analysis plan. Default explains the wrap-shape. | 'wrap-audit-as-proof: pre-register a threshold over an audit statistic for CI gating' |
| sign_key | bytes \| None | HMAC-SHA256 sign key. If | None |
Returns:
| Type | Description |
|---|---|
| Any | A signed (or unsigned, if |
| Any | class: |
to_markdown ¶
Render the audit as a human-readable Markdown report.
AuditSummary
dataclass
¶
Cross-pillar aggregate — every pillar's findings rolled up.
from_dict
classmethod
¶
Reconstruct an AuditSummary from its to_dict payload.
top_files round-trips as a list of [path, count] pairs (JSON
doesn't carry tuples natively); we coerce back into the (str, int)
tuple shape this dataclass declares.
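The tuple-to-list drift through JSON, and the coercion back, can be seen in a short sketch. `coerce_top_files` is a hypothetical helper written for this example; raising on a malformed pair is an assumption in keeping with the loud-failure stance.

```python
import json


def coerce_top_files(raw):
    out = []
    for pair in raw:
        if not isinstance(pair, (list, tuple)) or len(pair) != 2:
            # Fail loudly on a malformed entry instead of guessing.
            raise ValueError(f"top_files entry must be a [path, count] pair, got {pair!r}")
        path, count = pair
        out.append((str(path), int(count)))
    return out


original = [("src/a.py", 7), ("src/b.py", 3)]
# JSON has no tuple type, so tuples come back as lists after a round trip.
round_tripped = json.loads(json.dumps({"top_files": original}))["top_files"]
restored = coerce_top_files(round_tripped)
```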
ophamin.auditing.codec ¶
Format codec for :class:AuditRecord — Move H parallel to Move B's
proof codec.
Mirrors :mod:ophamin.measuring.proof.codec exactly so audit records
get the same load / validate / verify / ingest treatment proof records
do. The differences:
- No JSON-Schema validation today (audit records don't ship a schema.json alongside; structural validation is via the AuditRecord.from_dict parser).
- No threshold/verdict: audits are descriptive by default. The validation layer focuses on signature + body-roundtrip integrity + schema_version compatibility.
Per the framework's no-fallback rule, every failure raises a typed
:class:AuditCodecError subclass with a descriptive message.
dump ¶
Write record to path as canonical JSON. Returns the path.
Creates the parent directory if it doesn't already exist. Raises
:class:OSError if the write fails — codec does NOT swallow
file-system errors.
load ¶
Load + reconstruct an :class:AuditRecord from path.
Raises :class:AuditDecodeError on file-system errors, malformed
JSON, or a structurally incomplete payload. The chained exception
preserves the underlying error for forensic debugging.
validate ¶
Run structural + (optional) signature validation in one call.
Returns a :class:AuditValidationReport. Does NOT raise on any
validation failure — caller inspects the report's all_ok
property + record_problems tuple to decide what to do. Use
:func:ingest for the raise-on-any-failure variant.
The structural check (via :meth:AuditRecord.from_dict + the
in-module _structural_problems helper) runs first; if the
record can't be loaded, the report carries the decode error as
a single record-problem string. When loaded, the in-module shape
check augments with cross-section consistency (e.g. pillars in
record vs summary).
verify_signature ¶
Load the record at path + verify its HMAC-SHA256 signature.
Returns True iff the signature matches the record body under
key. False if signature is empty or doesn't match.
Raises :class:AuditDecodeError if the file itself can't be
loaded.
list_audits ¶
Walk directory recursively; emit one summary entry per JSON.
A file that fails to decode produces an entry with error set
and the other content fields None. Mirrors
:func:ophamin.measuring.proof.codec.list_proofs.
Campaign Record¶
ophamin.campaign ¶
CampaignRecord + the 6-phase composite-run orchestrator (Move F).
Closes Deficit 2 from
docs/ARCHITECTURE_EXTENDED_AUDIT_2026_05_16.md — the "6 phases"
the owner named are the six wheels of Ophamin's architecture
operating as a single coordinated pass against a substrate:
- seeing: discover the substrate's surface
- measuring: run the requested scenarios; collect signed proof records
- comparing: synthesize the measuring output into a campaign summary; detect verdict flips
- instrumenting: collect per-cycle resource cost (when the substrate was wrapped in InstrumentedSubstrate)
- auditing: static-analysis sweep over the substrate's source (when a source-code path is available)
- reporting: collate every preceding phase's output into one rolled-up Markdown report
Each phase produces a :class:CampaignPhase aggregate; the
:class:CampaignRecord collects them into a signed,
content-addressed aggregate. A phase can be ok / skipped /
failed; skipped is the sanctioned outcome when the substrate
doesn't expose what a phase needs (e.g. auditing against a
MockSubstrate is skipped because there's no source code to audit).
The orchestrator never silently swallows phase failures — a failed phase carries its error message into the record so the operator sees exactly what broke. This is the framework's "loud-failure" stance applied at the campaign level.
CANONICAL_PHASE_ORDER
module-attribute
¶
CANONICAL_PHASE_ORDER: tuple[str, ...] = ('seeing', 'measuring', 'comparing', 'instrumenting', 'auditing', 'reporting')
CampaignPhase
dataclass
¶
One wheel's contribution to a composite run.
Three possible terminal status values:
- "ok": phase completed; artifact_paths + summary carry the output.
- "skipped": phase didn't apply (e.g. auditing against a Mock substrate); error carries the reason.
- "failed": phase raised; error carries the exception string. The campaign continues to the next phase (loud-failure at the campaign level, not at the per-phase level; the operator sees every phase's outcome).
CampaignRecord
dataclass
¶
Signed, content-addressed aggregate of one full-pass run.
Schema 2.0 (current) adds two strictly-additive fields:
- corrected_verdicts: {claim_id → corrected_verdict} after multiplicity correction (FWER or FDR). Empty dict when no correction was applied or when no records carried a p_value.
- multiplicity_correction_method: "holm" / "bh" / "none". The method the writer used when populating corrected_verdicts.
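The "holm" option presumably refers to Holm's step-down procedure for family-wise error control. The sketch below is a textbook version of that procedure, not Ophamin's code; `holm_reject` and the claim ids are hypothetical, and how rejections map onto corrected verdicts is left out because the source does not specify it.

```python
def holm_reject(p_values: dict, alpha: float = 0.05) -> dict:
    """Holm step-down: return {claim_id: rejected?} controlling FWER at alpha."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    decisions = {claim_id: False for claim_id in p_values}
    for rank, (claim_id, p) in enumerate(ordered):
        # The k-th smallest p-value (rank = k - 1) is tested at alpha / (m - rank).
        if p > alpha / (m - rank):
            break  # step-down: once one test fails, every larger p-value fails too
        decisions[claim_id] = True
    return decisions


demo = holm_reject({"claim-a": 0.01, "claim-b": 0.04, "claim-c": 0.03})
```

All three demo p-values are below 0.05 individually, yet only `claim-a` survives: 0.01 clears 0.05/3, but 0.03 exceeds 0.05/2, so the procedure stops there. This is the kind of verdict change the additive field is meant to record.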
Schema 1.0 records remain readable: missing additive fields default
to empty dict / "none" respectively. Signature verification is
version-aware — :meth:_body includes the additive fields only
when schema_version != "1.0", so a 1.0 signature still
re-canonicalises bit-equal to the original wire form.
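The version-aware signing rule can be illustrated with a small sketch. The record layout, the canonical JSON form, and the helper names (`body_for_signing`, `sign_record`) are assumptions; only the rule itself, that additive fields enter the signed body when schema_version is not "1.0", comes from the text above.

```python
import hashlib
import hmac
import json


def body_for_signing(record: dict) -> dict:
    body = {"schema_version": record["schema_version"], "phases": record["phases"]}
    if record["schema_version"] != "1.0":
        # Additive fields are part of the signed body only for newer schemas.
        body["corrected_verdicts"] = record.get("corrected_verdicts", {})
        body["multiplicity_correction_method"] = record.get(
            "multiplicity_correction_method", "none")
    return body


def sign_record(record: dict, key: bytes) -> str:
    raw = json.dumps(body_for_signing(record), sort_keys=True,
                     separators=(",", ":")).encode("utf-8")
    return hmac.new(key, raw, hashlib.sha256).hexdigest()


KEY = b"demo-key"
v1_on_disk = {"schema_version": "1.0", "phases": ["seeing"]}
v1_signature = sign_record(v1_on_disk, KEY)
# A newer reader fills defaults for the additive fields after loading.
v1_loaded = {**v1_on_disk, "corrected_verdicts": {}, "multiplicity_correction_method": "none"}
```

Re-signing `v1_loaded` reproduces `v1_signature`, because the defaults filled in at load time never reach the signed bytes for a 1.0 record.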
run_campaign ¶
run_campaign(*, substrate: SubstrateUnderTest, target_name: str | None = None, target_git_commit: str | None = None, scenarios: list[type[Scenario]] | None = None, enable_phases: set[str] | None = None, out_dir: str | Path = 'campaigns/latest', sign_key: bytes = DEFAULT_SIGN_KEY, fwer_method: str = 'holm', fwer_alpha: float = 0.05) -> CampaignRecord
Run the six wheels in canonical order; emit a signed CampaignRecord.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| substrate | SubstrateUnderTest | the substrate the measuring phase will run scenarios against. Required. | required |
| target_name | str \| None | human-facing name for the target (default: the substrate's | None |
| target_git_commit | str \| None | the target's git commit hash (default: the substrate's | None |
| scenarios | list[type[Scenario]] \| None | list of Scenario classes to run in the measuring phase. Default: every default-instantiable scenario in :data: | None |
| enable_phases | set[str] \| None | set of phase names to run. Default: all six. | None |
| out_dir | str \| Path | directory under which per-phase artifacts are written. | 'campaigns/latest' |
| sign_key | bytes | HMAC-SHA256 key for signing the final record. | DEFAULT_SIGN_KEY |
| fwer_method | str | multiplicity-correction method to apply during the comparing phase. One of :data: | 'holm' |
| fwer_alpha | float | family-wise / FDR threshold used when applying the correction. Default 0.05. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CampaignRecord | A signed :class: |
| CampaignRecord | class: |
| CampaignRecord | |
| CampaignRecord | |
| CampaignRecord | populated from the FWER pass. |
The orchestrator NEVER raises on a per-phase failure — it captures
the error string into the phase's error field and continues
to the next phase. The caller inspects record.any_failed to
surface to a non-zero exit code if appropriate.
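The capture-and-continue behaviour can be sketched as a loop over the canonical phase order. Everything here is a hypothetical simplification: `SkipPhase`, `PhaseOutcome`, and `run_phases` are invented names, phases are plain callables, and omitting disabled phases from the output is an assumption.

```python
from dataclasses import dataclass

PHASE_ORDER = ("seeing", "measuring", "comparing", "instrumenting", "auditing", "reporting")


class SkipPhase(Exception):
    """Hypothetical signal a phase raises when it does not apply."""


@dataclass
class PhaseOutcome:
    name: str
    status: str  # "ok" | "skipped" | "failed"
    error: str = ""


def run_phases(handlers, enabled=None):
    outcomes = []
    for name in PHASE_ORDER:
        if enabled is not None and name not in enabled:
            continue  # assumption: disabled phases are simply left out
        try:
            handlers[name]()
            outcomes.append(PhaseOutcome(name, "ok"))
        except SkipPhase as exc:
            outcomes.append(PhaseOutcome(name, "skipped", str(exc)))
        except Exception as exc:  # captured, never re-raised; the loop continues
            outcomes.append(PhaseOutcome(name, "failed", f"{type(exc).__name__}: {exc}"))
    return outcomes


def _boom():
    raise RuntimeError("scenario crashed")


def _skip():
    raise SkipPhase("no source code to audit")


handlers = {name: (lambda: None) for name in PHASE_ORDER}
handlers["measuring"] = _boom
handlers["auditing"] = _skip
outcomes = run_phases(handlers)
any_failed = any(o.status == "failed" for o in outcomes)
```

A CLI wrapper would then typically map `any_failed` to a non-zero exit code, keeping the decision about what counts as fatal with the caller.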
dump_campaign ¶
Write a CampaignRecord to disk as canonical JSON. Returns the path.
Regression Alert Record¶
ophamin.comparing.regression_alert ¶
Regression-alert daemon (Move J) — closes gap F from the prior audit.
Detects verdict regressions across two snapshots of a proof corpus
(typically: proofs/ at the prior Kimera commit vs proofs/ at the
new commit). A "regression" is a scenario whose verdict moved from
VALIDATED (or INCONCLUSIVE) to REFUTED — the substrate started failing
a claim it previously satisfied. The detector also flags the inverse
("recovery": REFUTED → VALIDATED) and the lateral cases (any other
verdict change, i.e. one that is neither a regression nor a recovery).
The pipeline:
- Snapshot a proof corpus at commit A (e.g. via :func:scan_proof_directory).
- Snapshot the proof corpus at commit B.
- Run :func:compute_regression_alert on the pair.
- Inspect the resulting :class:RegressionAlert: a list of :class:VerdictTransition rows + headline counts.
The pairing key is the scenario's stable identifier (the proof's
filename family or — when present — the underlying scenario name via
the proof's claim-statement signature). For two proofs of the same
family at two different substrate commits to be paired, both must
resolve to the same family under this heuristic; entries without a partner are surfaced as
unmatched_in_a / unmatched_in_b for operator inspection.
CLI:
ophamin watch-proofs --before
Output is a signed :class:RegressionAlertRecord (HMAC-SHA256 +
content-addressed alert_id), mirroring the shape of every other
Ophamin artifact. A REGRESSION-class alert is exit-code 1; a quiet
(no-change) alert is exit-code 0; a recovery-class alert is exit-code
0 with a notable summary line.
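The transition rules and exit codes above can be condensed into a small sketch. Snapshots are modelled as plain `{scenario_key: verdict}` dicts, which is an assumption, as are the helper names `classify` and `compare_snapshots` and the reading of unmatched_in_a as "present in A, absent from B".

```python
def classify(before: str, after: str) -> str:
    if before == after:
        return "unchanged"
    if after == "REFUTED" and before in ("VALIDATED", "INCONCLUSIVE"):
        return "regression"
    if before == "REFUTED" and after == "VALIDATED":
        return "recovery"
    return "lateral"  # any other change of verdict


def compare_snapshots(a: dict, b: dict):
    transitions = {k: classify(a[k], b[k]) for k in sorted(a.keys() & b.keys())}
    unmatched_in_a = sorted(a.keys() - b.keys())  # assumed: in A with no partner in B
    unmatched_in_b = sorted(b.keys() - a.keys())
    # Only a regression makes the run exit non-zero.
    exit_code = 1 if "regression" in transitions.values() else 0
    return transitions, unmatched_in_a, unmatched_in_b, exit_code


before = {"s1": "VALIDATED", "s2": "REFUTED", "s3": "INCONCLUSIVE", "s4": "VALIDATED"}
after = {"s1": "REFUTED", "s2": "VALIDATED", "s3": "VALIDATED", "s5": "VALIDATED"}
transitions, only_a, only_b, exit_code = compare_snapshots(before, after)
```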
Substrate base¶
ophamin.seeing.substrate.base ¶
The substrate-under-test abstraction.
Ophamin is independent of any particular system. Whatever it tests is a
SubstrateUnderTest (SUT): something that can be reset, run for one cycle on
a stimulus, and asked for its git commit and state. MockSubstrate implements
this with no dependencies (so the framework is fully runnable on its own);
KimeraAdapter implements it over a subprocess boundary to Kimera-SWM.
The cycle boundary is deliberate. Per the leak-free probe shape established
empirically (a fresh interpreter per cycle removes process-level state carry),
run_cycle is the unit of measurement and reset is honoured between runs.
SubstrateUnderTest ¶
Bases: ABC
Abstract system under test. Implement this to plug a system into Ophamin.
git_commit
abstractmethod
¶
Return the substrate's source revision.
This is the data_git_commit_id end of the provenance bridge: every
recorded run is tethered to the exact substrate revision that produced it.
Return "" only if the substrate genuinely has no version anchor.
run_cycle
abstractmethod
¶
Exercise the substrate for exactly one cycle on stimulus.
params carries the swept configuration for this run. Implementations
must not silently degrade: if the cycle cannot run, return a
CycleResult with success=False and a populated error, or
raise — never fabricate a plausible-looking result.
run_batch ¶
Exercise the substrate over a batch of stimuli.
The default implementation simply loops run_cycle — correct, but one
boundary crossing per cycle. Adapters that can run a whole batch inside a
single process (the leak-tolerant density path) should override this;
cycle_index is renumbered sequentially across the batch.
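A stripped-down version of this abstract-base pattern is sketched below. `ToySubstrate`, `ToyCycleResult`, and `Echo` are hypothetical and far smaller than the real classes; the sketch only shows a default run_batch that loops run_cycle and renumbers cycle_index, plus an explicit failure result.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyCycleResult:
    cycle_index: int
    success: bool
    raw: dict
    error: str = ""


class ToySubstrate(ABC):
    @abstractmethod
    def run_cycle(self, stimulus, params) -> ToyCycleResult: ...

    def run_batch(self, stimuli, params):
        # Default: one run_cycle call per stimulus, renumbered 0..n-1.
        return [replace(self.run_cycle(s, params), cycle_index=i)
                for i, s in enumerate(stimuli)]


class Echo(ToySubstrate):
    def run_cycle(self, stimulus, params):
        if stimulus is None:
            # Report the failure explicitly instead of inventing a result.
            return ToyCycleResult(0, False, {}, error="empty stimulus")
        return ToyCycleResult(0, True, {"echo": stimulus})


batch = Echo().run_batch(["a", None, "c"], params={})
```

An adapter with a cheaper in-process path would override `run_batch`; the default stays correct for any subclass that only implements `run_cycle`.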
capture_state ¶
Return a serialisable snapshot of substrate state (for provenance).
Default is empty; substrates with inspectable state should override.
CycleResult
dataclass
¶
The outcome of one substrate cycle.
raw is whatever the substrate emitted, untouched. success and
halt_mode are the two cross-substrate fields every adapter must fill.
A pre-built metric_bundle may be attached by the adapter; otherwise
to_metric_bundle does best-effort extraction from raw.
to_metric_bundle ¶
Return the attached bundle, or build one with best-effort extraction.
The default extraction recognises a small set of conventional field
names. Adapters that know their substrate should attach an explicit
metric_bundle rather than relying on this.
ophamin.seeing.substrate.mock ¶
MockSubstrate — a self-contained substrate under test.
This is what makes Ophamin runnable and testable with no external system. It is
a deterministic, seedable stand-in that produces plausible cycle results: a
phi-like cognitive signal that drifts as state accumulates, an energy gauge
that depletes, latency timers, and a tunable collapse mode so the diagnostics
have something to find.
It is not a model of any real substrate — it exists so the framework's pillars
and orchestration can be exercised and verified end-to-end. Real systems plug in
through their own SubstrateUnderTest adapter (see kimera_adapter).
MockSubstrate ¶
Bases: SubstrateUnderTest
A deterministic, seedable substrate stand-in.
Behaviour responds to swept parameters so the framework can be exercised:
- injection_rate: raises ``phi``, lowers cross-modal overlap & energy
- immune_threshold: low + high injection -> occasional overwhelm collapse
- variant: "treatment"-like labels add a small positive effect
- entropy_coefficient: in a ``collapse_cell``, low entropy triggers collapse
- cell: topological cell id (for the kernel-coupling probe)
State (cycle count, energy, accumulated phi) carries across cycles and is
cleared by reset — the seed makes the whole sequence reproducible.
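The seed-plus-reset reproducibility described above can be illustrated with a toy stand-in. Everything here (the class name `TinyMockSubstrate`, the `run_cycle` method, and the arithmetic for phi and energy) is hypothetical and is not MockSubstrate's actual implementation.

```python
import random


class TinyMockSubstrate:
    """Illustrative only: a seeded stand-in whose state is cleared by reset."""

    def __init__(self, seed: int = 0) -> None:
        self._seed = seed
        self.reset()

    def reset(self) -> None:
        # Re-seeding here is what makes the whole sequence reproducible.
        self._rng = random.Random(self._seed)
        self.cycle_count = 0
        self.energy = 1.0

    def run_cycle(self, injection_rate: float = 0.1) -> dict:
        self.cycle_count += 1
        # Energy depletes faster under higher injection (toy rule).
        self.energy = max(0.0, self.energy - 0.01 * injection_rate)
        # A phi-like signal that drifts as state accumulates, plus seeded noise.
        phi = injection_rate + 0.001 * self.cycle_count + self._rng.gauss(0.0, 0.01)
        return {"cycle_index": self.cycle_count, "phi": phi, "energy": self.energy}
```

Running the same cycles before and after `reset()` yields identical results, which is the property the pillars rely on when exercised against a mock.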
ophamin.seeing.substrate.kimera_adapter ¶
KimeraAdapter — plug Kimera-SWM into Ophamin as a multi-component substrate.
This is the central Kimera-coupling point in the framework — it adapts the
substrate-under-test surface so the rest of Ophamin
(measuring/ / comparing/ / auditing/ / reporting/) operates
against the abstract SubstrateUnderTest protocol. A small number of
seeing-wheel-internal helpers (seeing/discovery, seeing/wiring,
seeing/telemetry) also reach into Kimera shapes — those are the same
conceptual layer as KimeraAdapter itself. Models Kimera-SWM as what
it is: a multi-component entity, not a single cognitive cycle.
An experiment targets either the whole entity (target="entity" — the
integrated Takwin cycle) or a named component ("walker", "gwf",
"rosetta", "arachne", "ouroboros", "pentecost", "piovra",
"astrolabe" …) — each invoked through its own verified entry point.
Two modes:
- mode="subprocess": a fresh interpreter per cycle. Leak-free, slow; the precision path.
- mode="batch": one interpreter, the component constructed once, the whole batch looped in-process. Fast; the density path. State accumulates across the batch, which for most components is the substrate working as designed (memory-as-deformation), not a leak.
Performance is measured, never assumed: measure_throughput runs a
bounded batch and reports real cycles/sec on this vessel. probe verifies
which targets are actually reachable in the connected repo.
KimeraAdapter ¶
Bases: SubstrateUnderTest
Subprocess adapter for the Kimera-SWM substrate — entity or any component.
write_runner_template
staticmethod
¶
Dump the bundled runner so it can be edited and reused.
probe ¶
Verify which targets are reachable in the connected repo.
Returns a structured report — run this before wiring scenarios. It is how the adapter checks the substrate, not the docs.
measure_throughput ¶
Measure real cycles/sec for this target — performance is measured, not assumed.
Runs the stimuli as one in-process batch and times it. The result is the empirical basis for choosing subprocess-vs-batch and for a throughput proof record — there is no assumed performance figure anywhere.
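As a rough sketch of what "time a bounded batch and report cycles/sec" can look like, assuming a hypothetical `run_cycle` callable that stands in for one substrate cycle (the real method's signature and return type are not shown in this reference):

```python
import time
from typing import Callable, Iterable


def measure_cycles_per_second(
    run_cycle: Callable[[str], object], stimuli: Iterable[str]
) -> float:
    """Time one in-process batch and return the observed cycles per second."""
    count = 0
    start = time.perf_counter()
    for stimulus in stimuli:
        run_cycle(stimulus)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")
```

The returned figure is specific to the machine and batch it was measured on, which is why it is recorded rather than assumed.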
Corpus base¶
ophamin.seeing.corpus.base ¶
Massive-dataset corpus layer — base abstraction.
A Corpus locates a downloaded open-source dataset on disk, content-addresses
it, counts its records, and streams them as CorpusRecord objects — so a
catastrophic-testing scenario can feed real data through the substrate in
concentrated batches.
The four connectors (connectors.py):
- EnronCorpus: ~500k real executive emails (organisational dissonance)
- LinuxKernelCorpus: ~1.4M commit messages (logic / topology siege)
- CyberPayloadCorpus: Metasploit modules + injection sets (concentrated immune siege)
- FloresCorpus: FLORES-200, 200 parallel languages (Rosetta scaling limit)
Content hashes and record counts are computed once and cached to disk
(.ophamin_<name>_content_hash / _count) so a 1.7 GB archive is not
re-hashed on every run.
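A minimal sketch of the compute-once, cache-to-disk idea, assuming SHA-256 and a plain-text cache file. The function name `cached_content_hash`, the hash algorithm, and the cache file format are assumptions for illustration, not Ophamin's actual implementation.

```python
import hashlib
from pathlib import Path


def cached_content_hash(archive: Path, cache_file: Path) -> str:
    """Return the archive's hex digest, reading it from cache_file when present."""
    if cache_file.exists():
        return cache_file.read_text().strip()
    digest = hashlib.sha256()
    with archive.open("rb") as handle:
        # Hash in 1 MiB blocks so a multi-gigabyte archive never sits in memory.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    value = digest.hexdigest()
    cache_file.write_text(value)
    return value
```

One consequence of this design is that the cache must be deleted if the archive on disk is replaced, because a cached digest is trusted without re-reading the data.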
Corpus ¶
Bases: ABC
A downloaded open-source dataset, content-addressed and streamable.
records
abstractmethod
¶
Stream every record. Must be a generator — corpora do not fit in memory.
content_hash ¶
Content-addressed hash of the corpus, cached in-memory and on disk.
sample ¶
A deterministic reservoir sample of n records (single streaming pass).
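For readers unfamiliar with reservoir sampling, the classic single-pass algorithm (often called Algorithm R) looks roughly like this. The function name and the `seed` parameter are illustrative assumptions; whether `sample` uses this exact variant is not stated in this reference.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Pick n items uniformly from a stream of unknown length in one pass."""
    rng = random.Random(seed)  # a fixed seed makes the sample deterministic
    reservoir: List[T] = []
    for index, item in enumerate(stream):
        if index < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability n / (index + 1).
            slot = rng.randint(0, index)
            if slot < n:
                reservoir[slot] = item
    return reservoir
```

Only `n` records are held in memory at any time, which is what makes the approach suitable for corpora that do not fit in memory.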
chunks ¶
Yield records in batches of size — concentrated-batch density feeding.
limit caps the total number of records emitted across all batches.
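The batching-with-a-cap behaviour can be sketched with `itertools.islice`. This standalone function mirrors the described `size` and `limit` parameters but is an illustration, not the method's actual body.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def chunks(
    stream: Iterable[T], size: int, limit: Optional[int] = None
) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, emitting at most `limit` items overall."""
    # Apply the cap once, up front, so it spans all batches.
    source = iter(stream) if limit is None else islice(stream, limit)
    while True:
        batch = list(islice(source, size))
        if not batch:
            return
        yield batch
```

For example, `list(chunks(range(10), 4, limit=5))` yields `[[0, 1, 2, 3], [4]]`: the final batch is short because the cap applies to the total, not to each batch.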
CorpusRecord
dataclass
¶
One item from a corpus — an email, a commit message, a payload, a sentence.
Scenario base¶
ophamin.measuring.scenarios.base ¶
The catastrophic-scenario layer.
A Scenario binds a real corpus + a substrate target + a pre-registered
falsifiable claim. It streams the corpus through the substrate, scores the run,
and emits a signed EmpiricalProofRecord.
The harness is substrate-agnostic — it runs identically against MockSubstrate
(tests) or KimeraAdapter (real catastrophic runs). Pre-registration is
captured before the run; the proof record is content-addressed and signed.
Scenario registration¶
Every concrete subclass of :class:Scenario that sets a name attribute
distinct from the base sentinel "scenario" is automatically
registered in the module-level :data:SCENARIOS mapping via the
:meth:Scenario.__init_subclass__ hook. There is no manual editing of an
__init__.py dict required; the registry is built by class-definition
side effect.
Registration is loud-failure:
- A duplicate name across two subclasses raises :class:DuplicateScenarioNameError at class-definition time.
- A subclass that sets name = "scenario" (the unchanged base default) raises :class:ScenarioNameNotOverriddenError.
- A subclass that opts out via register=False (e.g. an abstract intermediate parent in a class hierarchy) is skipped silently. This is the only sanctioned skip path.
Third-party / out-of-tree scenarios reach the same registry by simply
inheriting from :class:Scenario in their own package; importing their
module fires the registration hook.
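The registration mechanism described in this section can be approximated with a short, self-contained sketch. The names `BaseScenario`, `REGISTRY`, and the two exception classes are stand-ins for illustration; metadata validation and the reload-idempotency rule are omitted.

```python
from typing import Dict


class DuplicateNameError(Exception):
    """Stand-in for DuplicateScenarioNameError."""


class NameNotOverriddenError(Exception):
    """Stand-in for ScenarioNameNotOverriddenError."""


REGISTRY: Dict[str, type] = {}


class BaseScenario:
    name = "scenario"  # base sentinel; concrete subclasses must override it

    def __init_subclass__(cls, register: bool = True, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        if not register:
            return  # explicit opt-out, e.g. an abstract intermediate parent
        if cls.name == "scenario":
            raise NameNotOverriddenError(cls.__qualname__)
        if cls.name in REGISTRY:
            raise DuplicateNameError(cls.name)
        REGISTRY[cls.name] = cls
```

Defining `class Demo(BaseScenario): name = "demo"` is enough to make `REGISTRY["demo"]` resolve to `Demo`, while `class Abstract(BaseScenario, register=False)` is skipped. Because the hook runs at class-definition time, an out-of-tree package only needs to be imported for its scenarios to appear.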
Scenario ¶
Bases: ABC
Binds corpus + target + pre-registered claim -> a signed proof record.
Every concrete subclass declares a metadata block (name, tier,
family, goal, explanation, and optionally method +
falsification_consequence) that classifies the experiment and
explains its intent without requiring the reader to chase docstrings.
The metadata is validated at class-definition time by
:meth:__init_subclass__ (loud-failure on omission) and surfaces into
every signed EmpiricalProofRecord produced by the scenario.
__init_subclass__ ¶
Auto-register concrete subclasses in :data:SCENARIOS.
Skips registration when register=False (abstract intermediate
parents, test-internal scenarios). Otherwise:
- raises :class:ScenarioNameNotOverriddenError if the subclass kept the base sentinel name;
- raises :class:ScenarioMetadataMissingError if any of tier / family / goal / explanation is unset or empty;
- raises :class:DuplicateScenarioNameError if another subclass already registered the same name.
Re-registration of the same class object under the same name is
idempotent — this is necessary so module reloads (e.g. test
fixtures, importlib.reload) don't trip the duplicate guard.
build_claim
abstractmethod
¶
The pre-registered falsifiable claim this scenario tests.
score
abstractmethod
¶
Read the completed run into an observed value + pillar evidence.
field_contract ¶
The OrchestratorResult fields this scenario depends on.
Default None means no contract — scenarios that don't override
this run exactly as before (back-compat). Scenarios that DO override
get loud-failure on the first cycle if a required field is missing
or has the wrong type. This catches Kimera-side renames at experiment
setup time instead of silently breaking downstream.
Returning a contract is purely additive — the scenario still reads
cycle.raw["..."] ad-hoc in :meth:score. The contract is the
gate, not the projection.
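To make the "gate, not the projection" idea concrete, here is a hedged sketch of a contract check. The representation of a contract as a mapping from field name to expected type, the function name `check_field_contract`, and `FieldContractError` are all assumptions; the real contract object is not described in this reference.

```python
from typing import Any, Mapping, Optional


class FieldContractError(Exception):
    """Raised when a required field is missing or has the wrong type."""


def check_field_contract(
    raw: Mapping[str, Any], contract: Optional[Mapping[str, type]]
) -> None:
    """Fail loudly if raw violates the contract; do nothing when there is none."""
    if contract is None:
        return  # no contract declared: behave exactly as before
    for field_name, expected_type in contract.items():
        if field_name not in raw:
            raise FieldContractError(f"missing required field: {field_name!r}")
        if not isinstance(raw[field_name], expected_type):
            raise FieldContractError(
                f"field {field_name!r} has type {type(raw[field_name]).__name__}, "
                f"expected {expected_type.__name__}"
            )
```

The check only validates and returns nothing, so scoring code continues to read the raw mapping directly after the gate has passed.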
select_records ¶
Which corpus records to use — default is the corpus stream; override to filter.
run ¶
run(substrate: SubstrateUnderTest, *, data_root: str | 'Path' | None = None, sign_key: bytes = DEFAULT_SIGN_KEY) -> EmpiricalProofRecord
Run the scenario end-to-end and return a signed Empirical Proof Record.
ScenarioScore
dataclass
¶
A scenario's read of a completed run — the observed value + the evidence.
Tier ¶
Bases: str, Enum
The experimentation tier a scenario lives in.
Tiers carry epistemic shape, not just bookkeeping:
- SCIENTIFIC: claims about substrate behaviour (does the substrate do X under condition Y?).
- ENGINEERING: claims about substrate cost (does X stay under threshold T?).
- PHILOSOPHICAL: claims about substrate self-model (does the substrate respond differently to self-referential vs neutral input?).
- EMPIRICAL_DEEP: substrate-physics characterisation scenarios that target Kimera's prime apparatus / Φ / cross-channel behaviour and mirror Family A-V claims in Kimera's EMPIRICAL_VALIDATION.md.
- MEASUREMENT_MACHINERY: validation of the upstream libraries Ophamin itself depends on (e.g. CRDT laws against pycrdt + y-py as cross-check oracle).
Inheriting from str makes a Tier serialise as its value
string in JSON; the JSON proof-record schema sees a plain string,
not a Python-specific enum encoding.
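The serialisation point can be checked with a few lines of standard-library Python. The enum below is a stand-in named `TierSketch`, and its member values (lowercase strings) are assumptions; only the `str, Enum` base combination comes from this reference.

```python
import json
from enum import Enum


class TierSketch(str, Enum):
    """Stand-in for Tier; member values here are illustrative guesses."""

    SCIENTIFIC = "scientific"
    ENGINEERING = "engineering"


# Because members are also str instances, json encodes them as plain strings.
encoded = json.dumps({"tier": TierSketch.SCIENTIFIC})
print(encoded)  # {"tier": "scientific"}

# Reading the value back requires an explicit lookup by value.
decoded = TierSketch(json.loads(encoded)["tier"])
```

A plain `Enum` without the `str` base would make `json.dumps` raise a `TypeError`, which is the situation the mixed-in base avoids.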
Audit pillar base¶
ophamin.auditing.base ¶
The audit-pillar contract — Finding, FindingSeverity, PillarResult, AuditPillar.
A pillar wraps one external static-analysis tool. The contract is small:
- name and tool_name identify the pillar (e.g. "ruff", "bandit")
- is_available() reports whether the underlying binary is installed
- run(target_path) returns a PillarResult carrying the findings + raw output
Findings are normalised across tools — every tool's output is parsed into the
same Finding dataclass — so downstream code (aggregation, reporting,
threshold-mode claims) doesn't need to know which pillar produced what.
AuditPillar ¶
Bases: ABC
Wraps one external static-analysis tool as an audit pillar.
A subclass implements tool_binary (the CLI name to look up on PATH),
tool_version (a way to ask the tool its version), and run (the
actual invocation + parse). Tool absence is reported as
status="unavailable" — never silently skipped.
resolved_binary
classmethod
¶
Resolve the tool binary — venv-local first, then PATH.
is_available
classmethod
¶
Is the wrapped tool resolvable (venv-local OR on PATH)?
tool_version ¶
Best-effort <tool> --version capture; empty string on failure.
unavailable_result ¶
Standard unavailable result for when the tool isn't installed.
run
abstractmethod
¶
Run the tool against target_path and return a PillarResult.
Pillars MUST handle missing-tool cleanly via unavailable_result
and runtime failures via error_result. Never silently swallow
a failure.
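The "never silently swallow" rule can be sketched as a small runner that always returns a result with an explicit status. `PillarResultSketch`, `run_tool`, and the `"ok"` / `"error"` status strings are hypothetical; only `"unavailable"` is named in this reference, and the venv-local lookup step is left out.

```python
import shutil
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class PillarResultSketch:
    """Illustrative stand-in for PillarResult."""

    pillar: str
    status: str
    findings: List[dict] = field(default_factory=list)
    raw_output: str = ""


def run_tool(binary: str, args: List[str]) -> PillarResultSketch:
    path = shutil.which(binary)  # PATH lookup only in this sketch
    if path is None:
        # A missing tool is reported, not skipped.
        return PillarResultSketch(pillar=binary, status="unavailable")
    try:
        proc = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Runtime failures surface as an explicit error result.
        return PillarResultSketch(pillar=binary, status="error", raw_output=str(exc))
    return PillarResultSketch(pillar=binary, status="ok", raw_output=proc.stdout)
```

Parsing `raw_output` into normalised findings would follow in a real pillar; this sketch stops at the availability and failure handling.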
Finding
dataclass
¶
One static-analysis finding, normalised across tools.
Every field except path and message may be empty if the producing
tool doesn't carry it — but the dataclass shape is stable so downstream
code can rely on it.
FindingSeverity ¶
Bases: str, Enum
Normalised severity across heterogeneous tools.
Each pillar maps its tool's native severity scale onto these five buckets; the mapping is documented per-pillar.