feat: kOREInference — Algorithm 4 iDRegEx with MDL scoring + ensemble integration #1
What
Implements `kOREInference` following arXiv 1004.2372 Algorithm 4 (iDRegEx) exactly.
Changes
- `bex/kore.py`: `ikoa` + `rwr_sq` + MDL
- `bex/ensemble.py`: kORE integrated into `infer_ensemble` alongside CRX and iDRegEx. Refactored prefer logic into a clean dispatch table
- `bex/__init__.py`: exports `kOREInference` and `validate_k_ore`
- `tests/test_kore.py`
- `tests/test_ensemble.py`
Test count: 79 total (28 existing + 32 kORE + 19 ensemble), all passing.
Key finding
Applied to 3,926 real opencode step-boundary tool sequences, `kOREInference` returns `None`: rwr0 cannot handle the interconnectivity. The tool graph (read→bash, read→grep, read→glob, bash→read, etc.) is not a SORE (Single Occurrence Regular Expression). This is a genuine empirical result: the agent's behavior within steps is probabilistic, not grammatically structured.
Next steps
- `examples/` directory (untracked) has exploratory scripts for the session-data analysis
New: `min_coverage` core + outlier analysis
`infer_ensemble(sequences, min_coverage=0.8)` now also finds the tightest core via iterative CRX + outlier removal:
- outliers are removed until only `min_coverage` (default 1.0 = no filter) of the sequences remains
- `result['core']` holds `{grammar, coverage, outlier_count, outliers}`
Example: 15 Ansible roles
The outliers are the roles with the rarest symbols (`npm`, `pip`, `lineinfile`). An LLM sees: "10/15 roles follow the core. Only those that need specific tools add extras."
Tests
85 tests in total (+6 new for `min_coverage`), all green.
Latest commit: MCP tool + README updated.
- `infer_best_grammar` now has a `min_coverage` parameter (default 1.0 = disabled)
- when it is set (e.g. `min_coverage=0.8`), the response includes the core + outlier analysis
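For readers who want a feel for the core + outlier idea, here is a minimal, self-contained sketch. It is not the `bex` implementation: `find_core` is a hypothetical helper, the "grammar" is only the set of symbols left in the core (a stand-in for the CRX inference), and the rule "drop the sequences that contain the rarest symbol" is an assumption based on the observation that outliers carry the rarest symbols. The demo sequences are made up.

```python
from collections import Counter


def find_core(sequences, min_coverage=1.0):
    """Hypothetical sketch: shrink the input to a core by dropping outliers.

    Repeatedly removes the sequences that contain the rarest symbol, as long
    as at least `min_coverage` of the original sequences stays in the core.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    total = len(sequences)
    core = list(sequences)
    outliers = []
    while True:
        # Support = number of core sequences in which a symbol appears.
        support = Counter(sym for seq in core for sym in set(seq))
        if not support:
            break  # only empty sequences left
        rarest = min(support, key=lambda s: (support[s], s))
        if support[rarest] == len(core):
            break  # every symbol is shared by all core sequences
        victims = [seq for seq in core if rarest in seq]
        if (len(core) - len(victims)) / total < min_coverage:
            break  # removing them would fall below the coverage floor
        core = [seq for seq in core if rarest not in seq]
        outliers.extend(victims)
    return {
        # Stand-in only: the symbol set, where a real run would hold a grammar.
        "grammar": sorted({sym for seq in core for sym in seq}),
        "coverage": len(core) / total,
        "outlier_count": len(outliers),
        "outliers": outliers,
    }


# Made-up demo data: five roles, one of which needs an extra tool.
demo = [
    ["apt", "template", "service"],
    ["apt", "template", "service"],
    ["apt", "service"],
    ["apt", "template", "service"],
    ["apt", "npm", "template", "service"],
]
result = find_core(demo, min_coverage=0.8)
print(result["coverage"], result["outlier_count"], result["outliers"])
```

With the default `min_coverage=1.0` the loop stops before removing anything, which matches the "1.0 = disabled" behaviour described above.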