docs: update README and SHOWCASE for kOREInference + core/outlier analysis
parent 036a84cc76
commit 0886e5f3bc
2 changed files with 38 additions and 6 deletions
README.md (10 changes)
@@ -79,6 +79,11 @@ Agent: Let me check what pattern the existing community roles follow.

 **With Dervish:** one MCP call returns a ~60-token grammar known to match 15/15 existing roles. The agent follows it reliably.

+**Core+outlier mode:** When generating a new role, the agent can call with
+`min_coverage=0.8` to learn the mainstream pattern while seeing which roles
+deviate and why — useful when the user's case resembles an outlier
+(e.g., a PHP app like phpmyadmin that needs raw `lineinfile`).
+
 ## Quick Start

 ```bash
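The `min_coverage` option reads most naturally as a partition rule. A candidate core grammar is accepted if it matches at least that fraction of the input sequences, and the sequences it misses are reported as outliers. The following is a minimal sketch under that assumption. `split_core_outliers` and its `matches` predicate are hypothetical and not part of Dervish. If these are the same 15 roles as in SHOWCASE.md, which lists 3 outliers, coverage is 12/15 = 0.8, exactly the threshold.

```python
from typing import Callable, List, Sequence, Tuple

def split_core_outliers(
    sequences: Sequence[List[str]],
    matches: Callable[[List[str]], bool],
    min_coverage: float = 0.8,
) -> Tuple[bool, float, List[int]]:
    """Hypothetical helper: partition sequences by a core-grammar predicate.

    Returns (accepted, coverage, outlier_indices), where `accepted` is True
    when the fraction of matching sequences reaches `min_coverage`.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    # Indices of sequences the core grammar does not accept.
    outliers = [i for i, seq in enumerate(sequences) if not matches(seq)]
    coverage = (len(sequences) - len(outliers)) / len(sequences)
    return coverage >= min_coverage, coverage, outliers
```

Suppose there are 12 toy sequences starting with `package`, plus three that start with something else. With the predicate `lambda s: s[0] == "package"`, the call returns `(True, 0.8, [12, 13, 14])`.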
@@ -125,6 +130,8 @@ Dervish has been tested against public datasets from Ansible Galaxy, Helm, and G

 The sweet spot: **multiple implementations of the same abstract task** with a shared but undocumented pattern. Not everything works — Dockerfiles, pre-commit configs, and schema-enforced formats are too rigid or too diverse to yield a convention.

+> **kOREInference note:** Algorithm 4 (iDRegEx with MDL, arXiv 1004.2372) is included for paper-faithful correctness. On real tool-sequence data, its rwr₀ repair step returns ∅ because the k-OA is rarely SORE (interconnected symbols). The ensemble falls back to CRX or iDRegEx automatically.
+
 ## Algorithm Selection Guide

 | When | Use | Why |
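The automatic fallback in the note amounts to treating an empty grammar (∅) as a failed attempt and moving on to the next learner. The following is a minimal sketch, assuming learners are plain callables tried in a fixed order. The helper name, the learner stubs and the order are illustrative only. Dervish's ensemble may instead score all non-empty candidates.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A learner maps token sequences to a grammar string, or None/"" for ∅.
Learner = Callable[[Sequence[List[str]]], Optional[str]]

def first_nonempty_grammar(
    sequences: Sequence[List[str]],
    learners: Sequence[Tuple[str, Learner]],
) -> Tuple[str, str]:
    """Hypothetical helper: try learners in order, return (name, grammar) of the first non-empty result."""
    for name, learn in learners:
        grammar = learn(sequences)
        if grammar:  # None or "" stands for ∅, e.g. rwr0 failing on non-SORE data
            return name, grammar
    raise RuntimeError("every learner returned an empty grammar")
```

In use, a stub kOREInference learner that returns `None` is skipped, and the next learner in the list (for example CRX) supplies the grammar.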
@@ -139,8 +146,9 @@ The sweet spot: **multiple implementations of the same abstract task** with a sh

 | Data property | Winner | Why |
 |---------------|--------|-----|
-| Diverse patterns, full vocabulary needed | CRX | Captures all symbols. iDRegEx/kOREInference return ∅. |
+| Diverse patterns, full vocabulary needed | CRX | Captures all symbols. iDRegEx returns ∅. |
 | Clean sequences with clear core | iDRegEx | Extracts minimal common subsequence. CRX buries it in optional noise. |
+| Interconnected (non-SORE) data | CRX | kOREInference (rwr₀) returns ∅ when k-OA is not SORE. CRX handles it. |
 | Single sequence | iDRegEx (+ RWR₀) | RWR₀ repair produces a grammatical regex from one example. |
 | 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
 | Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
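The selection table can be encoded as a rule-based helper. This is a sketch: `DataProfile` and `pick_algorithm` are hypothetical, and "many sequences" is taken to mean four or more. The table does not say which row wins when several apply, so the checks below run in an order chosen for illustration. Sequence count is checked first, then vocabulary and SORE-ness, then a clear core.

```python
from dataclasses import dataclass

@dataclass
class DataProfile:
    """Hypothetical dataset summary; each field mirrors a row of the selection table."""
    n_sequences: int
    needs_full_vocabulary: bool = False  # "Diverse patterns, full vocabulary needed"
    is_sore: bool = True                 # False for interconnected (non-SORE) data
    has_clear_core: bool = False         # "Clean sequences with clear core"

def pick_algorithm(p: DataProfile) -> str:
    """Return the table's recommended learner; rule precedence is an assumption."""
    if p.n_sequences < 1:
        raise ValueError("need at least one sequence")
    if p.n_sequences == 1:
        return "iDRegEx (+ RWR₀)"
    if p.n_sequences <= 3:
        return "iDRegEx"  # CRX overfits on 2-3 sequences
    if p.needs_full_vocabulary or not p.is_sore:
        return "CRX"
    if p.has_clear_core:
        return "iDRegEx"
    return "CRX"  # many sequences, tight pattern
```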
SHOWCASE.md (34 changes)
@@ -34,6 +34,25 @@ All 15/15 match. **~29× compression** (7200+ modules → ~250 chars).
 exact structure: fail-check first, then vars, then packages, then config/svc.
 No guessing.

+### Bonus: core+outlier analysis
+
+Set `min_coverage=0.8` to find the tight grammar for the majority while
+flagging outlier roles with unusual module usage:
+
+```
+Core CRX (80% coverage, 3 outliers):
+fail?.(include_vars+set_fact+package+file+template+service+...)+
+
+Outlier sequences:
+1. phpmyadmin: include_vars → set_fact → include → include → lineinfile
+2. composer: fail → set_fact → stat → uri → get_url → command
+3. pip: package → file → pip
+```
+
+phpmyadmin uses raw `lineinfile` instead of templates; composer needs
+a `stat` check + `uri` download; pip is purely `pip` — all three deviate
+from the mainstream install → configure → enable pattern.
+
 ## 2. Helm chart (kube-prometheus-stack, 6 configs)

 6 different `values.yaml` files rendered through the same chart:
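To check a role against the core grammar by hand, the CRX notation can be translated into a Python regex. This sketch reads `.` as concatenation, `+` between symbols as alternation, `?` as optional and a trailing `+` as one-or-more; that reading is our assumption about the notation. The showcase elides part of the module set with `...`, so the sketch uses only the six modules it lists. With that partial set, all three listed outliers fail to match. Roles that use the elided modules would be wrongly flagged as outliers.

```python
import re
from typing import List

# Modules listed in the showcase's core grammar; the elided "..." is NOT filled in.
LISTED_CORE_MODULES = ["include_vars", "set_fact", "package", "file", "template", "service"]

def core_regex(modules: List[str]) -> "re.Pattern[str]":
    """Build a regex for fail?.(m1+m2+...)+ over space-terminated module names."""
    alternation = "|".join(re.escape(m) for m in modules)
    # Optional leading "fail", then one or more core modules, each followed by a space.
    return re.compile(rf"(?:fail )?(?:(?:{alternation}) )+")

def matches_core(sequence: List[str], pattern: "re.Pattern[str]") -> bool:
    """True if the whole module sequence is accepted by the core pattern."""
    # Terminating every token with a space keeps token boundaries unambiguous,
    # so "include" cannot match as a prefix of "include_vars".
    text = "".join(token + " " for token in sequence)
    return pattern.fullmatch(text) is not None
```

Under these assumptions, `matches_core("package file pip".split(), core_regex(LISTED_CORE_MODULES))` is `False`, because `pip` is not in the listed core set.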
@@ -77,10 +96,15 @@ with a shared but undocumented pattern.
 ## Usage

 ```python
-from bex.mcp_server import infer_best_grammar
+from bex import infer_ensemble

-output = infer_best_grammar(
-    sequences=role_sequences,
-    prefer="crx",
-)
+# Pick best across all 3 algorithms (CRX + iDRegEx + kOREInference)
+result = infer_ensemble(role_sequences)
+print(f"Best: {result['best']['algorithm']}")
+print(f"Grammar: {result['best']['grammar']}")
+
+# Or: find the tight core + flag outliers
+result = infer_ensemble(role_sequences, min_coverage=0.8)
+print(f"Core: {result['core']['grammar']}")
+print(f"Outliers: {result['core']['outliers']}")
 ```
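The usage snippet relies on a `role_sequences` variable that it never defines. One plausible shape is one list of module names per role; this is assumed here, not taken from the `bex` documentation. The sketch derives such lists from already-parsed Ansible tasks by dropping common task keywords. The keyword set is a hand-picked partial subset, and `module_of` and `role_sequences_from_tasks` are hypothetical helpers.

```python
from typing import Any, Dict, List

# Common Ansible task-level keywords (partial, hand-picked subset).
TASK_KEYWORDS = {
    "name", "when", "tags", "register", "become", "become_user", "notify",
    "loop", "with_items", "vars", "ignore_errors", "changed_when",
    "failed_when", "delegate_to", "environment", "no_log", "args",
}

def module_of(task: Dict[str, Any]) -> str:
    """Return the single non-keyword key of a parsed task, i.e. the module it calls."""
    modules = [key for key in task if key not in TASK_KEYWORDS]
    if len(modules) != 1:
        raise ValueError(f"cannot identify module in task keys {sorted(task)}")
    return modules[0]

def role_sequences_from_tasks(roles: Dict[str, List[Dict[str, Any]]]) -> List[List[str]]:
    """Map each role's parsed task list (in file order) to its ordered module names."""
    return [[module_of(task) for task in tasks] for tasks in roles.values()]
```

For example, a parsed `pip` role with `package`, `file` and `pip` tasks yields `[["package", "file", "pip"]]`. That matches the outlier sequence shown in the core+outlier example.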