Authoring a new Ophamin scenario
This guide is for the case where a new Kimera discovery — a field that emerged, a behavior that shifted, a property worth testing — has surfaced and you want to land it as a pre-registered, signed Empirical Proof Record.
Ophamin's scenarios are deliberately code, not config, because each one has
its own scoring shape. The shared infrastructure
(ophamin.measuring.scenarios.helpers) makes the per-scenario file a thin
layer over reusable building blocks.
The shape of a scenario
A scenario is a Scenario subclass that:
- names a corpus (one of the registered names in ophamin.corpus)
- names a Kimera target (one of the registered targets in ophamin.substrate.KimeraAdapter)
- builds a pre-registered claim with a falsifiable threshold
- selects records from the corpus (optional filter / re-sampling)
- scores the resulting CycleResult list against the threshold
The harness in Scenario.run does the rest: pre-registration capture,
substrate batch run, verdict, signed proof record, provenance graph.
Boilerplate elimination
Before this layer existed, each scenario re-implemented:
- shape-aware extraction of gwf_verdict, dissonance_events, canonical, prime, halt_mode, phi_value;
- Wilson 95% CI computation;
- inconclusive guards (too few cycles / majority adapter errors);
- descriptive distribution summaries (median, mean, p10/p90).
All four are now in ophamin.measuring.scenarios.helpers. Use them rather
than rolling your own: when Kimera renames a field, the helpers are updated
in one place rather than in each scenario.
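To make two of these building blocks concrete, here is a minimal standalone sketch of a Wilson 95% interval and an inconclusive guard. It is an illustration under stated assumptions, not Ophamin's implementation: the function names `wilson_ci` and `inconclusive_guard`, the `min_denominator=30` cutoff, and the exact guard conditions are hypothetical stand-ins for whatever the real helpers do.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959963984540054) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    if n <= 0:
        # No trials: the interval carries no information, so return the full range.
        return 0.0, 1.0
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


def inconclusive_guard(
    n_cycles: int,
    adapter_errors: int,
    n_denominator: int,
    min_denominator: int = 30,  # hypothetical cutoff, chosen only for illustration
) -> tuple[bool, str]:
    """Return (inconclusive, reason); reason is empty when the run is usable."""
    if n_cycles == 0:
        return True, "no cycles ran"
    if adapter_errors * 2 > n_cycles:
        return True, "majority of cycles were adapter errors"
    if n_denominator < min_denominator:
        return True, f"only {n_denominator} usable cycles (< {min_denominator})"
    return False, ""


if __name__ == "__main__":
    print(wilson_ci(50, 100))
    print(inconclusive_guard(n_cycles=1000, adapter_errors=600, n_denominator=400))
```

The Wilson interval stays inside [0, 1] and behaves sensibly at 0 or n successes, which is why it is a common choice for proportion claims with small denominators.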
The minimal scenario
```python
from ophamin.measuring.proof import Claim, PillarEvidence, Threshold
from ophamin.measuring.scenarios import helpers
from ophamin.measuring.scenarios.base import Scenario, ScenarioScore, Tier


class MyNewScenario(Scenario):
    name = "my-new-scenario"

    # --- required metadata (Move A, 2026-05-16) ---
    tier = Tier.SCIENTIFIC  # SCIENTIFIC / ENGINEERING / PHILOSOPHICAL
                            # / EMPIRICAL_DEEP / MEASUREMENT_MACHINERY
    family = "<family-tag>"  # e.g. "immune", "prime", "phi"; scenarios in the
                             # same family probe the same substrate aspect from
                             # different angles
    goal = (
        "<one-sentence statement of what question this scenario answers>"
    )
    explanation = (
        "<paragraph: why this scenario is interesting, what substrate "
        "property a verdict would tell us about, what is at stake>"
    )

    # --- optional metadata ---
    method = "<scoring-shape-tag>"  # e.g. "wilson_ci_proportion",
                                    # "jaccard_floor", "cohens_d_paired"
    falsification_consequence = (
        "<one-line of what a REFUTED verdict would mean concretely>"
    )
    # --- end metadata ---

    corpus_name = "<corpus-name>"  # enron, linux, flores, cyber, financial, the-well
    target = "<target-name>"  # entity, pentecost, ouroboros, rosetta, arachne,
                              # walker, gwf, piovra, astrolabe, atlas, spde

    def __init__(self, n_cycles: int = 1000, threshold: float = 0.50) -> None:
        self.n_cycles = int(n_cycles)
        self.threshold = float(threshold)

    def build_claim(self) -> Claim:
        return Claim(
            statement=(
                f"On <corpus> records that <filter>, Kimera's <subsystem> "
                f"<property> in >= {self.threshold:.0%} of cycles."
            ),
            operationalization="fraction of cycles for which <predicate>",
            threshold=Threshold(
                "<metric_name>", ">=", self.threshold, "fraction"
            ),
            h0=f"P(<event> | <filter>) < {self.threshold}",
            h1=f"P(<event> | <filter>) >= {self.threshold}",
        )

    def score(self, cycle_results, records):
        cleared_total = 0  # denominator: cycles that pass the filter
        cleared_event = 0  # numerator: filtered cycles where the event fired
        adapter_errors = sum(1 for r in cycle_results if helpers.is_adapter_error(r))

        for result in cycle_results:
            if helpers.is_adapter_error(result):
                continue
            if helpers.gwf_cleared(result):  # adjust the filter
                cleared_total += 1
                # Example event condition; replace with your own predicate.
                if helpers.dissonance_count(result) >= 1:
                    cleared_event += 1

        rate = cleared_event / cleared_total if cleared_total else 0.0
        lo, hi = helpers.wilson_95_ci(cleared_event, cleared_total)
        inconclusive, reason = helpers.is_inconclusive(
            n_cycles=len(cycle_results),
            adapter_errors=adapter_errors,
            n_denominator=cleared_total,
        )

        evidence = [
            PillarEvidence(
                pillar="<O.subsystem.metric>",
                statistic_name="<metric_name>",
                statistic_value=rate,
                library="statsmodels",
                library_version="0.14",
                ci_low=lo,
                ci_high=hi,
                cross_check="n/a",
                detail={
                    "cleared_event": cleared_event,
                    "cleared_total": cleared_total,
                    "ci_method": "wilson_95",
                },
            ),
        ]

        return ScenarioScore(
            observed_value=rate,
            evidence=evidence,
            inconclusive=inconclusive,
            reasoning=(
                f"event fired in {cleared_event}/{cleared_total} "
                f"cleared cycles ({rate:.1%}); "
                f"{len(cycle_results)} cycles, {adapter_errors} adapter errors"
                + (f"; {reason}" if reason else "")
            ),
        )
```
A simple-proportion scenario in this shape lands in ~80 lines. Empirical-deep scenarios (Bayesian / causal / cross-channel-MI / prime-structure) are typically 300–500 LOC because they orchestrate multiple library backends.
Register the class in src/ophamin/measuring/scenarios/__init__.py's
SCENARIOS dict so it's reachable from the ophamin scenario <name>
CLI surface. Write a runner in examples/; add tests in
tests/test_scenario_*.py using the _SyntheticSubstrate pattern (one
test file per scenario is the current convention).
When you need more than the simple proportion shape
The 19 shipped scenarios cover several distinct scoring shapes:
| Shape | Example | What's different |
|---|---|---|
| Simple proportion (filter + event) | Organizational Dissonance, Logic-Topology Siege | The minimal scenario above is sufficient. |
| Labelled paired sample | Immune Siege | Computes false-positive AND detection rates on benign vs malicious labels; needs select_records for balanced interleave. |
| Group-by + all-K-agree | Rosetta Scaling | Each "group" is a set of K stimuli; score is the fraction of groups where all K landed on the same canonical. |
| Distribution-floor (Jaccard / divisibility / coverage) | Memory-As-Deformation, Prime Structure, Substrate Completeness | Score is the minimum or aggregate over many measurements (e.g. concept-Jaccard floor across re-exposure pairs). |
| Bayesian posterior contraction | Bayesian Φ Posterior | Score is the HDI contraction ratio at N vs theoretical-frequentist bound; depends on arviz + pymc. |
| Causal-graph recovery | Causal Discovery | Score is the count of significant directed links recovered by PCMCI; depends on tigramite. |
| Cross-channel mutual information | Cross-Channel MI | Score is a count of pairs above a MI floor; depends on pyitlib + ennemi (cross-check oracle). |
| Cross-instance determinism | Prime Cross-Instance | Score is an invariance fraction across N fresh Takwin processes. |
For the more elaborate shapes, look at the existing scenario files for
the pattern. The helpers library still applies — it just composes with
custom group / pair logic. Bayesian / causal / MI scenarios also use the
matching bayesian_helpers.py / causal_helpers.py / analytic_helpers.py
modules in src/ophamin/measuring/.
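As a rough illustration of how two of the non-proportion shapes in the table reduce to small pure functions, the sketch below computes a group-by all-K-agree fraction and a Jaccard floor. The function names and input layouts (`(group_id, canonical)` tuples, pairs of concept sets) are assumptions made for this example; the shipped scenarios may structure their inputs differently.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def all_k_agree_fraction(pairs: Iterable[tuple[Hashable, Hashable]]) -> float:
    """Fraction of groups whose members all landed on the same canonical.

    `pairs` yields one (group_id, canonical) tuple per stimulus.
    """
    groups: dict[Hashable, set] = defaultdict(set)
    for group_id, canonical in pairs:
        groups[group_id].add(canonical)
    if not groups:
        return 0.0
    # A group agrees when exactly one distinct canonical was observed in it.
    agreeing = sum(1 for values in groups.values() if len(values) == 1)
    return agreeing / len(groups)


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_floor(set_pairs: Iterable[tuple[set, set]]) -> float:
    """Minimum Jaccard similarity across all pairs (the 'floor')."""
    scores = [jaccard(a, b) for a, b in set_pairs]
    if not scores:
        raise ValueError("jaccard_floor needs at least one pair")
    return min(scores)


if __name__ == "__main__":
    stimuli = [("g1", "x"), ("g1", "x"), ("g2", "x"), ("g2", "y")]
    print(all_k_agree_fraction(stimuli))
    print(jaccard_floor([({"a", "b"}, {"a", "b"}), ({"a", "b", "c"}, {"a"})]))
```

Either value can then be compared against the pre-registered threshold in the same way the proportion is in the minimal scenario.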
The pre-registration discipline (load-bearing)
- The claim is registered before the substrate runs.
- The threshold is falsifiable — a value Kimera could fail to meet.
- The proof record is signed with DEFAULT_SIGN_KEY (HMAC-SHA256).
- The provenance graph captures both the substrate's and Ophamin's git commits.
- A REFUTED verdict is the framework working, not failing. Don't soften thresholds post-hoc to land VALIDATED.
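For readers unfamiliar with HMAC-SHA256 signing, the sketch below shows the general technique using only the Python standard library. It is not Ophamin's signing code: the record layout, the canonical-JSON serialization, the function names, and the demo key are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # Sorting keys and fixing separators makes the bytes independent of dict order.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), signature)


if __name__ == "__main__":
    demo_key = b"demo-key-not-a-real-secret"  # hypothetical key for the example
    record = {"scenario": "my-new-scenario", "verdict": "REFUTED", "observed": 0.41}
    signature = sign_record(record, demo_key)
    print(verify_record(record, signature, demo_key))
    # Softening the verdict after the fact invalidates the signature.
    print(verify_record({**record, "verdict": "VALIDATED"}, signature, demo_key))
```

The point of signing is the one the list makes: once a record is signed, editing a threshold or verdict after the run is detectable.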
If your discovery doesn't fit a pre-registrable claim shape — e.g. you want
to characterise a distribution rather than test a threshold — that's
descriptive evidence, not a claim. Capture it as PillarEvidence with
cross_check="n/a" and no surrounding Threshold. It still lands in the
proof record but is not part of the primary verdict.
Layer A discovery: don't author blindly
Before writing a new scenario against a field you read about in Kimera's
docs, run ophamin discover <kimera-repo> to confirm the field actually
exists in CycleResult.raw on the current Kimera commit. Checking the
resulting schema doc first saves you from writing a 300-line scenario
against a renamed or removed field.
Layer C drift (when it ships)
Once Layer C is online, every scenario's secondary descriptive evidence becomes a tracked trend across Kimera commits. Two practical implications:
- Add a distribution PillarEvidence for the underlying signal even when your primary claim is a single proportion. (This is the convention across the shipped scenarios.) Layer C reads the distribution stats, not just the proportion.
- Use stable statistic names across versions of the same scenario. If a scenario renames its statistic, drift detection treats it as a new metric with no history, so real drift in the underlying signal goes undetected (a false negative).
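A distribution summary of the kind described above can be sketched with the standard library alone. The function name, the returned key names, and the choice of the inclusive quantile method are assumptions for this example rather than Ophamin's conventions; what matters is that the keys stay fixed from one scenario version to the next.

```python
import statistics
from typing import Iterable

# Fixed key names: renaming any of these would look like a brand-new metric
# to a drift tracker, so treat them as part of the scenario's public contract.
SUMMARY_KEYS = ("n", "mean", "median", "p10", "p90")


def distribution_summary(values: Iterable[float]) -> dict:
    """Descriptive summary (n, mean, median, p10, p90) of a numeric signal."""
    data = sorted(float(v) for v in values)
    if len(data) < 2:
        raise ValueError("need at least two values to compute percentiles")
    # quantiles(n=10) returns the 9 decile cut points; index 0 is p10, index 8 is p90.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    return {
        "n": len(data),
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "p10": deciles[0],
        "p90": deciles[8],
    }


if __name__ == "__main__":
    print(distribution_summary(range(11)))
```

The resulting dict could be attached as the detail of a secondary PillarEvidence alongside the primary proportion.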