# Grammar Inference Engine — Showcase
Infer the unwritten convention from existing examples. Given N example
sequences, the engine produces a grammar of roughly 100 characters that
captures their shared structure in far fewer tokens than the examples
themselves.
## How it works
Your agent calls the MCP tool `infer_best_grammar` with a list of
existing sequences. It returns a compressed grammar written in the
following notation:
```
a.b → a then b (concatenation)
(a+b) → a or b (disjunction)
r? → optional (zero or one)
r+ → one or more (iteration)
r+? → zero or more
```
Pass `prefer='crx'` when the grammar must accept every example (full
coverage). Otherwise the ensemble picks between CRX and iDRegEx by MDL
score, where lower is better.
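
The selection rule described above can be sketched in a few lines. This is only an illustration of "honor `prefer`, otherwise take the lowest MDL": `Candidate`, `pick_grammar`, and the numbers are hypothetical and not part of the tool's API, and the real `infer_best_grammar` may weigh coverage or other factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    learner: str   # hypothetical label, e.g. "crx" or "idregex"
    grammar: str   # grammar in the notation shown above
    mdl: float     # Minimum Description Length score; lower is better
    matched: int   # number of example sequences the grammar accepts


def pick_grammar(candidates: List[Candidate],
                 prefer: Optional[str] = None) -> Candidate:
    """Return the preferred learner's candidate, else the lowest-MDL one."""
    if not candidates:
        raise ValueError("no candidate grammars to choose from")
    if prefer is not None:
        for cand in candidates:
            if cand.learner == prefer:
                return cand
    return min(candidates, key=lambda cand: cand.mdl)


# Illustrative values only; these are not real tool output.
candidates = [
    Candidate("crx", "a.(b+c)+", mdl=12.0, matched=3),
    Candidate("idregex", "a.(b|c)+", mdl=10.0, matched=2),
]
print(pick_grammar(candidates).learner)                # idregex
print(pick_grammar(candidates, prefer="crx").learner)  # crx
```
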
## Ansible Galaxy roles — 15 geerlingguy roles
Jeff Geerling maintains 100+ of the most popular Ansible roles on
Galaxy. As far as we know, their shared task structure is not written
down anywhere. The grammar below, inferred from 15 of those roles, makes
it explicit:
```
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
include+?.(npm+pip)+?.lineinfile?
CRX MDL= 596.64 match=15/15
```
Every role follows the same arc: check prerequisites, load OS-specific
vars, install packages, configure with templates, start services, and
optionally run sub-tasks. It works because all 15 roles converged on the
same unwritten convention.

**Compression: 15 roles (~5,000 tokens) → 60 tokens.**
## Notation reference
| Symbol | Meaning |
|--------|---------|
| `a.b` | a then b |
| `(a+b)` | a or b (CRX disjunction) |
| `(a\|b)` | a or b (iDRegEx disjunction) |
| `r?` | zero or one |
| `r+` | one or more |
| `r+?` | zero or more |
| `MDL` | Minimum Description Length — lower is better |
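
To check a grammar against a sequence by hand, the notation in the table can be translated into an ordinary Python regular expression. The sketch below is not part of the engine; `grammar_to_regex` and `accepts` are hypothetical helpers. It assumes symbols are plain identifiers, and that a `+` followed by a symbol or `(` is a disjunction while any other `+` is iteration.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[().+?|]")
PUNCT = set("().+?|")


def grammar_to_regex(grammar: str) -> str:
    """Translate the showcase notation into a Python regex over 'name ' units."""
    tokens = TOKEN.findall(grammar)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "(":
            out.append("(?:")
        elif tok == ")":
            out.append(")")
        elif tok == ".":
            pass                    # concatenation is implicit in a regex
        elif tok == "|":
            out.append("|")         # iDRegEx disjunction
        elif tok == "?":
            out.append("?")         # zero or one
        elif tok == "+":
            if nxt == "(" or (nxt is not None and nxt not in PUNCT):
                out.append("|")     # infix "+": CRX disjunction
            elif nxt == "?":
                out.append("*")     # postfix "+?": zero or more
                i += 1              # consume the "?" as well
            else:
                out.append("+")     # postfix "+": one or more
        else:
            # A symbol: match the name followed by a single space.
            out.append("(?:" + re.escape(tok) + " )")
        i += 1
    return "".join(out)


def accepts(grammar: str, sequence) -> bool:
    """True if the whole sequence of names matches the grammar."""
    text = "".join(name + " " for name in sequence)
    return re.fullmatch(grammar_to_regex(grammar), text) is not None


grammar = "fail?.(package+template)+.service+?"
print(accepts(grammar, ["package", "template", "service"]))  # True
print(accepts(grammar, ["service"]))                         # False
```

Note that `r+?` is mapped to `*` rather than copied verbatim, because in Python's `re` syntax `+?` means a lazy one-or-more, not zero-or-more.
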
## Usage
```python
from bex.mcp_server import infer_best_grammar

# role_sequences holds the example sequences, one per role.
output = infer_best_grammar(
    sequences=role_sequences,
    prefer="crx",  # CRX grammar: accepts all examples
)
```
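
This file does not show how `role_sequences` is built. One plausible shape, assuming each sequence is the list of module names used by a role's tasks in order, is sketched below. `task_sequence`, `TASK_KEYWORDS`, and the example role are hypothetical, and the keyword set is a partial subset chosen for illustration.

```python
# Task-level keywords that are not modules. Partial, illustrative subset;
# Ansible recognises more keywords than these.
TASK_KEYWORDS = {
    "name", "when", "tags", "become", "register", "notify",
    "with_items", "loop", "vars", "changed_when", "failed_when",
}


def task_sequence(tasks):
    """Map a list of parsed task dicts to the module name used by each task."""
    sequence = []
    for task in tasks:
        modules = [key for key in task if key not in TASK_KEYWORDS]
        if len(modules) == 1:  # skip tasks this sketch cannot classify
            sequence.append(modules[0])
    return sequence


# Hypothetical role, already parsed from tasks/main.yml into Python dicts.
example_tasks = [
    {"name": "Load OS-specific vars",
     "include_vars": "{{ ansible_os_family }}.yml"},
    {"name": "Install packages",
     "package": {"name": "nginx", "state": "present"}},
    {"name": "Write config",
     "template": {"src": "nginx.conf.j2", "dest": "/etc/nginx/nginx.conf"},
     "notify": "restart nginx"},
    {"name": "Start service",
     "service": {"name": "nginx", "state": "started"}},
]

role_sequences = [task_sequence(example_tasks)]
print(role_sequences)  # [['include_vars', 'package', 'template', 'service']]
```
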