
SonarQube → Kimera-SWM — empirical validation (0.55.0)

TL;DR: The mandatory SonarQube stack shipped in 0.50.0 and the 4-phase integration roadmap (0.51.0–0.54.0) were empirically validated on 2026-05-19 by running a real scan against the Kimera-SWM checkout. The pipeline works end-to-end. One Kimera-side limit surfaced: takwin.py is too large for SonarQube's Python analyzer, so it is now excluded by default in the bundled sonar-project.kimera-swm.properties.

What was validated

The exact recipe documented in docs/SONARQUBE.md was executed verbatim from cold-start:

  1. bash scripts/sonar_up.sh — SonarQube stack reached healthy in ~30s (volumes persisted from 0.50.0 validation)
  2. Forced admin password change via SonarQube REST API (POST /api/users/change_password) — replicates the operator's first-login flow without needing a browser
  3. Generated a user token via POST /api/user_tokens/generate — replicates the operator's /account/security UI step
  4. SONAR_TOKEN=<token> bash scripts/sonar_scan.sh "/path/to/Kimera_SWM" — ran the bundled scanner via Dockerized sonarsource/sonar-scanner-cli against the actual Kimera-SWM checkout (3,818 source files + 1,459 test files at this scan's snapshot)

Every step is one an operator following the docs would run; no debugging intervention was needed.
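Steps 2 and 3 can also be scripted without curl. The following is a minimal Python sketch, assuming the instance listens on http://localhost:9000 and accepts the same parameters as form-encoded POST bodies. The helper names are hypothetical and not part of the Ophamin scripts. The functions only build the requests; sending them is shown in the trailing comment.

```python
import base64
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9000"  # assumption: default local instance


def _post(path: str, user: str, password: str, params: dict) -> urllib.request.Request:
    """Build a form-encoded POST with HTTP Basic auth (nothing is sent yet)."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        BASE_URL + path,
        data=urllib.parse.urlencode(params).encode(),
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


def change_password_request(old: str, new: str) -> urllib.request.Request:
    """Step 2: replace the admin password, authenticating with the old one."""
    return _post(
        "/api/users/change_password",
        "admin",
        old,
        {"login": "admin", "previousPassword": old, "password": new},
    )


def generate_token_request(password: str, name: str) -> urllib.request.Request:
    """Step 3: ask for a user token with the given display name."""
    return _post("/api/user_tokens/generate", "admin", password, {"name": name})


# To run against a live instance (not executed here):
#   import json
#   urllib.request.urlopen(change_password_request("admin", "your-password"))
#   with urllib.request.urlopen(
#       generate_token_request("your-password", "ophamin-scan")
#   ) as response:
#       token = json.load(response)["token"]
```

Sending the parameters in the request body rather than the query string keeps the password out of the URL, and urlencode handles special characters in the password.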

Empirical findings

Finding #1: takwin.py exceeds SonarQube's Python analyzer single-file capacity

The first scan attempt stalled on kimera_swm/domain/cognitive/takwin.py, the 34,666-line orchestrator file. The Sonar Python analyzer spent 19:45 (mm:ss) of wall-clock time on this single file before exiting with EXECUTION FAILURE. No other file in Kimera-SWM exhibits this behavior; the analyzer processed the preceding 4,496 of 4,497 files normally.

This is a known limit of the bundled SonarQube CE Python analyzer's static-analysis path for files of this magnitude — the AST + symbol-table working set on a 34k-line module overflows the default Compute Engine working set.

Resolution: the bundled sonar/sonar-project.kimera-swm.properties now excludes takwin.py from the default scan. Operators who want to scan it anyway need to do three things: override the exclusions on the scanner CLI, raise the SonarQube Compute Engine heap to 8g or more, and budget 30+ minutes of wall-clock time (see the notes pinned in the CHANGELOG).

Finding #1.5: Two-step exclusion-pattern fix

The first attempt to exclude takwin.py used the pattern **/kimera_swm/domain/cognitive/takwin.py. The scanner ignored it and stalled on takwin.py again. Root cause: sonar.sources=kimera_swm already makes kimera_swm/ the source root, and Sonar evaluates paths relative to that root, so a pattern that includes the kimera_swm/ prefix never matches. The correct pattern is **/domain/cognitive/takwin.py, which is relative to the source root rather than the repo root. This load-bearing detail is documented inline in sonar/sonar-project.kimera-swm.properties so future operators don't repeat the mistake.
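The difference between the two patterns can be reproduced with a small matcher. This is a simplified approximation of Ant-style wildcards (`**`, `*`, `?`) written for illustration; it is not SonarQube's actual implementation, and the function names are hypothetical. It assumes, as described above, that the path is made relative to the source root before matching.

```python
import re
from pathlib import PurePosixPath


def ant_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified Ant-style pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(repo_path: str, source_root: str, pattern: str) -> bool:
    """Match the pattern against the path relative to the source root."""
    relative = PurePosixPath(repo_path).relative_to(source_root).as_posix()
    return ant_pattern_to_regex(pattern).fullmatch(relative) is not None


TAKWIN = "kimera_swm/domain/cognitive/takwin.py"

# Pattern that repeats the source-root prefix: no match.
wrong = is_excluded(TAKWIN, "kimera_swm", "**/kimera_swm/domain/cognitive/takwin.py")

# Pattern written relative to the source root: match.
right = is_excluded(TAKWIN, "kimera_swm", "**/domain/cognitive/takwin.py")

print(f"prefixed pattern excludes takwin.py: {wrong}")
print(f"root-relative pattern excludes takwin.py: {right}")
```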

Finding #2: Everything else scans cleanly in under 6 minutes

After applying the correct exclusion, the third scan completed successfully:

15:08:37.097 INFO  EXECUTION SUCCESS
15:08:37.100 INFO  Total time: 5:40.108s

Empirical numbers captured via SonarQube REST API (/api/measures/component?component=kimera-swm&metricKeys=...):

| Metric | Value |
|---|---|
| Wall-clock duration | 5:40.108s (under 6 minutes) |
| Files (Python source) | 3,158 |
| Source files submitted to text/secrets sensor | 4,520 |
| Non-comment lines of code (ncloc) | 571,610 |
| Total lines (incl. comments + blanks) | 990,250 |
| Functions | 31,658 |
| Classes | 9,008 |
| Bugs | 667 |
| Vulnerabilities | 2 |
| Security hotspots | 236 |
| Code smells | 7,827 |
| Duplication | 9.0% |
| Technical debt (SQALE index) | 59,322 minutes (≈ 989 hours, or about 25 working weeks at 40 h/week) |
| Reliability rating | 5.0 (E, the worst; SonarQube sets this from the severity of the worst open bug among the 667, not from the count) |
| Security rating | 3.0 (C; SonarQube sets this from the severity of the worst of the 2 vulnerabilities, while the 236 hotspots are tracked separately) |
| Maintainability rating | 1.0 (A, the best; the technical debt is large in absolute terms but small relative to the 571,610 ncloc base) |
| Quality Gate status | OK |
| Clean-as-You-Code status | compliant |

The Quality Gate passes because the default "Sonar way" gate applies to NEW code only (the project's new-code reference defaults to "previous version" — and there's no previous version yet on this first-ever scan). Historical findings are reported but not gated. A future scan with a real baseline will start gating any new-code regressions.
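The maintainability rating in the table can be sanity-checked from two of the reported measures. The sketch below reads them from a trimmed sample shaped like a /api/measures/component response and computes the debt ratio. It assumes SonarQube's default development cost of 30 minutes per line and the default rating grid (A up to 5%, B up to 10%, C up to 20%, D up to 50%); an instance with customized settings would give different thresholds. The function names are hypothetical.

```python
# Trimmed sample shaped like a /api/measures/component response.
# The values are the ones reported in the table above.
sample_response = {
    "component": {
        "key": "kimera-swm",
        "measures": [
            {"metric": "ncloc", "value": "571610"},
            {"metric": "sqale_index", "value": "59322"},  # minutes
        ],
    }
}

# ASSUMPTION: SonarQube defaults, not values read from this instance.
DEV_COST_MINUTES_PER_LINE = 30
RATING_GRID = [(0.05, "A"), (0.10, "B"), (0.20, "C"), (0.50, "D")]


def measures_by_metric(response: dict) -> dict:
    """Flatten the measures list into {metric_key: numeric_value}."""
    return {m["metric"]: float(m["value"]) for m in response["component"]["measures"]}


def debt_ratio(measures: dict) -> float:
    """Remediation cost divided by the estimated cost of writing the code."""
    development_cost = DEV_COST_MINUTES_PER_LINE * measures["ncloc"]
    return measures["sqale_index"] / development_cost


def maintainability_rating(ratio: float) -> str:
    """Map a debt ratio onto the assumed default A-E grid."""
    for upper_bound, letter in RATING_GRID:
        if ratio <= upper_bound:
            return letter
    return "E"


measures = measures_by_metric(sample_response)
ratio = debt_ratio(measures)
debt_hours = measures["sqale_index"] / 60

print(f"debt: {debt_hours:.1f} hours")
print(f"debt ratio: {ratio:.2%}")
print(f"rating: {maintainability_rating(ratio)}")
```

Under these assumptions the debt ratio comes out well below the 5% threshold, which is consistent with the reported A rating despite the large absolute debt.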

Finding #3: Empirical-validation footprint (Sonar volumes)

After one Kimera-SWM scan, the persistent volumes on disk (docker system df -v):

| Volume | Disk size |
|---|---|
| ophamin_sonardb_data | 533.3 MB |
| ophamin_sonarqube_data | 364.3 MB |
| ophamin_sonarqube_extensions | 1.0 KB |
| ophamin_sonarqube_logs | 480.8 KB |

Total: ~898 MB of disk for one scan, plus the in-memory state of the running services (SonarQube, PostgreSQL, and the bundled Elasticsearch). Subsequent scans add to the dataset incrementally; operators wanting a fresh start should run bash scripts/sonar_down.sh --wipe.
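The total can be re-derived from the per-volume sizes listed above. This short sketch assumes the sizes use decimal units (1 MB = 1,000,000 bytes); with binary units the result would differ slightly.

```python
# ASSUMPTION: decimal units for the reported sizes.
UNIT_BYTES = {"KB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}

# Sizes as reported for the four persistent volumes.
volumes = {
    "ophamin_sonardb_data": "533.3 MB",
    "ophamin_sonarqube_data": "364.3 MB",
    "ophamin_sonarqube_extensions": "1.0 KB",
    "ophamin_sonarqube_logs": "480.8 KB",
}


def to_bytes(size: str) -> float:
    """Convert a string such as '533.3 MB' into a byte count."""
    value, unit = size.split()
    return float(value) * UNIT_BYTES[unit]


total_mb = sum(to_bytes(size) for size in volumes.values()) / UNIT_BYTES["MB"]
print(f"total: {total_mb:.1f} MB")
```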

What this empirically validates

Per the original 0.50.0 directive ("a proper SonarQube instance, running for kimera swm, mandatory"), the validation confirms:

  • SonarQube stack reaches healthy on cold-start (~30s after fix-then-confirm cycle in 0.50.0; ~10s on warm-start with persisted volumes)
  • Admin token generation works via REST API (no UI required; sonar_scan.sh accepts the token directly)
  • Scanner runs against Kimera-SWM without Ophamin-side configuration changes (the bundled properties file is the Kimera-SWM-specific config; with the takwin.py exclusion and the path-pattern fix from 0.55.0 in place, operators don't need to tune anything to run the scan)
  • Scan completes in 5:40 wall-clock on the dev machine (Apple M4 Max, 16 CPU / 7.75 GiB allocated to Docker), well within an operator's coffee-break attention window
  • Dashboard populated at http://localhost:9000/dashboard?id=kimera-swm with 571,610 non-comment lines of Python source mapped, classified, and measured
  • Quality Gate OK (no NEW-code regressions vs the empty baseline; the first scan establishes the baseline for future scans)
  • Empirical limits surface honestly: both the takwin.py finding and the exclusion-pattern-relative-to-source-root finding are real constraints that operators need to know. Both are now baked into the bundled config and documented in the CHANGELOG, so the next operator's first scan doesn't hit the same walls

Coverage caveat

This validation scan did NOT generate coverage.xml first; doing so would require a full pytest --cov run against Kimera-SWM, which has its own substantial test infrastructure. The Sonar dashboard accordingly reports 0% coverage for this scan.

To produce a scan WITH coverage:

# In the Kimera-SWM checkout root:
pytest --cov=kimera_swm --cov-report=xml:coverage.xml

# Then the next Ophamin scan picks it up automatically via
# sonar.python.coverage.reportPaths=coverage.xml in the bundled
# properties file.
SONAR_TOKEN=<token> bash scripts/sonar_scan.sh "/path/to/Kimera_SWM"
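Before launching the scan it can be worth confirming that coverage.xml is a usable report. The sketch below is one way to do that in Python, assuming the Cobertura-style XML that pytest --cov-report=xml writes, whose root `coverage` element carries a `line-rate` attribute. The function name is hypothetical and the sample document is made up for illustration; it is not a measured Kimera-SWM value.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage (0-100) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    if root.tag != "coverage" or "line-rate" not in root.attrib:
        raise ValueError("not a Cobertura-style coverage report")
    return float(root.attrib["line-rate"]) * 100.0


# Made-up sample; a real report would be read from coverage.xml instead.
sample_report = (
    '<coverage line-rate="0.8125" branch-rate="0" version="7.0">'
    "<packages/>"
    "</coverage>"
)

print(f"line coverage: {line_coverage_percent(sample_report):.2f}%")

# Against the real file (not executed here):
#   from pathlib import Path
#   line_coverage_percent(Path("coverage.xml").read_text())
```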

Operator quick-reference

The validation recipe in command form:

# 1. Bring up SonarQube
cd /path/to/Ophamin
bash scripts/sonar_up.sh

# 2. First-login flow (replace your-password with something strong):
curl -s -X POST -u admin:admin \
    "http://localhost:9000/api/users/change_password?login=admin&previousPassword=admin&password=your-password"

# 3. Generate a token:
TOKEN=$(curl -s -X POST -u "admin:your-password" \
    "http://localhost:9000/api/user_tokens/generate?name=ophamin-scan" \
    | python3 -c "import json,sys; print(json.loads(sys.stdin.read())['token'])")
export SONAR_TOKEN="$TOKEN"

# 4. Scan Kimera-SWM
bash scripts/sonar_scan.sh "/path/to/Kimera_SWM"

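# 5. Browse the dashboard (`open` is macOS; use xdg-open on Linux).
#    The URL is quoted so zsh does not treat the ? as a glob character.
open "http://localhost:9000/dashboard?id=kimera-swm"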

The whole thing from cold-start to dashboard is ~10-15 minutes wall-clock on the dev machine (Apple M4 Max, 16 CPU / 7.75 GiB allocated to Docker), with the scan itself dominating once SonarQube reaches healthy.

See also