Update README and SHOWCASE with real-world dataset evaluations
README:
- Replace outdated company benchmarks with public showcases
- Add Algorithm Selection Guide
- Add 'When each algorithm wins' table
- Add 'Why grammar inference?' table with value prop for LLMs
- Add 'What doesn't work' section documenting failed approaches
- Update all domain adapter examples with public results
- Clean up outdated references (companyweb roles, hashistack terraform)

SHOWCASE:
- Add Helm (kube-prometheus-stack) with iDRegEx minimal core
- Add Docker Compose per-project patterns
- Add GitHub Actions cross-project Go lint pattern
- Add Terraform modules with vocabulary analysis
- Add 'What doesn't work' section
- Explain WHY each dataset helps an LLM
parent
0e2aec582b
commit
547376894c
2 changed files with 260 additions and 226 deletions
371
README.md
|
|
@@ -23,78 +23,130 @@ print(f"Grammar: {result['best']['grammar']}")
|
||||||
print(f"Score: {result['best']['mdl_score']}")
|
print(f"Score: {result['best']['mdl_score']}")
|
||||||
```
|
```
|
||||||
|
|
||||||
Or compare algorithms manually:
|
## Why grammar inference?
|
||||||
|
|
||||||
```python
|
There are many domains where developers follow **unwritten conventions** — implicit rules about the order and structure of things that no formal schema captures. An LLM generating code in these domains needs to know the convention, but it's rarely documented.
|
||||||
from bex.crx import CRX
|
|
||||||
|
|
||||||
seqs = [...]
|
Grammar inference automatically discovers these conventions from examples.
|
||||||
crx = CRX()
|
|
||||||
grammar = crx.infer(seqs)
|
|
||||||
print(grammar)
|
|
||||||
# file.template.docker_image.command.set_fact.shell.(wait_for)?
|
|
||||||
```
|
|
||||||
|
|
||||||
## Algorithms
|
| Domain | Unwritten convention | What the grammar tells an LLM |
|
||||||
|
|--------|---------------------|-------------------------------|
|
||||||
|
| Ansible roles | `fail → include_vars/set_fact → package → file/template → service → ... → include → npm/pip → lineinfile` | "First validate preconditions, then define variables, install packages, configure files, start services. Include other roles last." |
|
||||||
|
| Helm charts | `ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment` | "Always start with RBAC, then Service, then Deployment. Other resources are optional." |
|
||||||
|
| Docker Compose | `(build+image).command.(environment+volumes)?.ports` | "Every service needs either build or image, optionally a command, then environment/volumes/ports in that order." |
|
||||||
|
| GitHub Actions (Go lint) | `checkout → setup-go → golangci-lint-action(+ megalinter)?` | "Checkout, set up Go, run the linter. Megalinter is an optional extra for broader coverage." |
|
||||||
|
| Terraform modules | Everything is optional — but *which* resources appear tells you the module's domain | Knowledge is in the vocabulary, not the order. VPC implies subnets, route tables, gateways. |
|
||||||
|
|
||||||
| Algorithm | What it learns | Paper | Use case |
|
## Algorithm Selection Guide
|
||||||
|-----------|---------------|-------|----------|
|
|
||||||
| **CRX** | CHAREs (single-pass, deterministic) | TODS 2010 §6 | Fast inference, captures *all* symbols |
|
|
||||||
| **iDRegEx** | k-OREs (probabilistic, Baum-Welch) | arXiv 2010 | Finds the minimal core pattern |
|
|
||||||
| **RWR₀** | SOREs (iterative repair) | TODS 2010 §5.2 | Single-sequence grammar repair |
|
|
||||||
| **rwr²** | k-ORE from k-OA | arXiv 2010 | k-ORE extraction after Baum-Welch |
|
|
||||||
|
|
||||||
### Pipeline 1: Direct CHARE Inference (fast)
|
| When | Use | Why |
|
||||||
|
|------|-----|-----|
|
||||||
|
| Clean, structured data with full vocabulary | **CRX** | Single-pass, deterministic. Accepts all sequences. |
|
||||||
|
| Few examples, or want minimal common core | **iDRegEx** | Probabilistic EM, finds only what's shared. |
|
||||||
|
| Don't know which is better | **Ensemble (default)** | Runs both, picks the best by MDL score. |
|
||||||
|
| Data is clearly one type | `prefer='crx'` or `prefer='idregex'` | Skips ensemble comparison, runs one algorithm. |
|
||||||
|
|
||||||
|
## Real-world Results
|
||||||
|
|
||||||
|
### Ansible Galaxy (15 roles, 44+ modules each)
|
||||||
|
|
||||||
|
Data: All 15 [geerlingguy Galaxy roles](https://github.com/geerlingguy) — nginx, php, mysql, docker, etc.
|
||||||
|
|
||||||
```
|
```
|
||||||
Example sequences → CRX → CHAREs grammar
|
Best: CRX (MDL 288, 15/15 match)
|
||||||
|
Grammar:
|
||||||
|
fail?.(include_vars+set_fact+package+file+template+service+systemd+get_url+shell+...)+
|
||||||
|
.include+?.(npm+pip)+?.lineinfile?
|
||||||
```
|
```
|
||||||
|
|
||||||
CRX learns a grammar that accepts *all* observed symbols, marking optional ones with `?`. Best when the data is clean and you want the full vocabulary.
|
Every single role follows this pattern. The convention was **unwritten** — no document says "Ansible roles should check preconditions first, then install packages, configure with templates, enable services, then optionally install language packages."
|
||||||
|
|
||||||
### Pipeline 2: Probabilistic k-ORE Inference (robust)
|
An LLM generating a new role:
|
||||||
|
- **Must** start with conditional includes and variable setup
|
||||||
|
- **Should** then install packages and configure files
|
||||||
|
- **Then** start services
|
||||||
|
- **Finally** include handling of language-specific tooling
|
||||||
|
|
||||||
|
**Compression:** The grammar is ~250 chars. The 15 examples are 7200+ modules combined. **~29× compression.**
|
||||||
|
|
||||||
|
### Helm (kube-prometheus-stack, 6 CI configs)
|
||||||
|
|
||||||
|
Data: 6 different `values.yaml` configurations rendered through `helm template`.
|
||||||
|
|
||||||
```
|
```
|
||||||
Example sequences → Complete k-OA → Baum-Welch (EM)
|
Best: iDRegEx (MDL 1433)
|
||||||
→ Disambiguate → Prune → rwr² → k-ORE grammar
|
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
||||||
|
|
||||||
|
iDRegEx MDL= 1432.99 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
||||||
|
CRX MDL= 2651.74 (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
|
||||||
```
|
```
|
||||||
|
|
||||||
iDRegEx learns the *minimum* common subsequence — symbols that appear in every example. Fails (∅) when the examples are too diverse.
|
iDRegEx finds the **minimum core** — what every config always deploys. CRX captures the full vocabulary (19 resource kinds). Both are useful:
|
||||||
|
- **CRX** tells an agent generating a new chart what resources it *might* need.
|
||||||
|
- **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
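
The two views can be illustrated with plain set operations. This is only an intuition aid, assuming each example is a list of resource kinds: it is not the CRX or iDRegEx algorithm, and the three toy sequences below are made up for illustration rather than taken from the kube-prometheus-stack data.

```python
def full_vocabulary(seqs):
    """Every symbol seen in any sequence, in first-seen order (the 'might need' view)."""
    seen = []
    for seq in seqs:
        for sym in seq:
            if sym not in seen:
                seen.append(sym)
    return seen


def common_core(seqs):
    """Symbols present in every sequence, ordered as in the first one (the 'always need' view)."""
    if not seqs:
        return []
    shared = set(seqs[0]).intersection(*map(set, seqs[1:]))
    core = []
    for sym in seqs[0]:
        if sym in shared and sym not in core:
            core.append(sym)
    return core


# Hypothetical rendered-chart sequences, not the real dataset.
toy = [
    ["ServiceAccount", "ClusterRole", "ConfigMap", "Service", "Deployment"],
    ["ServiceAccount", "ClusterRole", "Service", "Deployment", "Job"],
    ["ServiceAccount", "Secret", "ClusterRole", "Service", "Deployment"],
]

print(common_core(toy))      # the shared backbone
print(full_vocabulary(toy))  # everything that appeared at least once
```

The real algorithms also learn ordering, repetition, and optionality, which this sketch ignores.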
|
||||||
|
|
||||||
### Pipeline 3: Ensemble (recommended)
|
### Docker Compose (73 services across 10 projects)
|
||||||
|
|
||||||
|
Data: Per-service sections from multiple `docker-compose.yml` files.
|
||||||
|
|
||||||
|
Per-service convention:
|
||||||
```
|
```
|
||||||
Example sequences → [CRX, iDRegEx] → MDL score each → pick best
|
(build+image).command.(environment+volumes)?.ports
|
||||||
```
|
```
|
||||||
|
|
||||||
Runs both algorithms, scores each with Minimum Description Length, and returns the winner with an explanation. The MDL score penalizes overly general grammars: a grammar like `(a+b+c+...+z)+` that accepts everything gets a high data cost (`log2(|L(r)|)` is large), while a specific grammar like `a.b.c` has near-zero data cost.
|
Each project has its own sub-patterns:
|
||||||
|
- **Nginx-like projects:** `build.(command.volumes.ports)` — build from source, mount configs, expose ports
|
||||||
|
- **Database projects:** `image.environment.volumes.ports` — pull image, configure with env vars, persist data
|
||||||
|
- **Language runtimes:** `build.(environment.command).ports` — build, set env vars, override command
|
||||||
|
|
||||||
## Architecture
|
An LLM generating a Docker Compose file should structure service definitions in this order.
|
||||||
|
|
||||||
|
### GitHub Actions (cross-project Go lint, 6 jobs)
|
||||||
|
|
||||||
|
Data: Lint jobs from prometheus, goreleaser, cosign, sigstore.
|
||||||
|
|
||||||
```
|
```
|
||||||
bex/
|
Best: CRX (MDL 13.6)
|
||||||
├── crx.py # CRX: direct CHARE inference (Algorithm 7, TODS)
|
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.golangci/golangci-lint-action?.megalinter?
|
||||||
├── idregex.py # iDRegEx: k-ORE inference (Algorithm 4, arXiv)
|
|
||||||
├── rwr0.py # RWR₀: SORE repair (Algorithm 6, TODS)
|
|
||||||
├── rwrsq.py # rwr²: k-ORE extraction (Algorithm 3, arXiv)
|
|
||||||
├── soa.py # SOA: Symbolic Observation Automaton core
|
|
||||||
├── koa.py # k-OA: k-testable Observation Automaton
|
|
||||||
├── ikoa.py # iKoa: k-OA inference (Algorithm 1, arXiv)
|
|
||||||
├── twotinf.py # 2T-INF: 2-testable inference (Algorithm 1, TODS)
|
|
||||||
├── baum_welch.py # Baum-Welch EM training for k-OA
|
|
||||||
├── expr.py # Expression utilities (concat, disj, star, strip)
|
|
||||||
├── marking.py # State marking for determinism
|
|
||||||
├── yaml_to_seq.py # Generic YAML → key-path sequence converter
|
|
||||||
├── role_grammar.py # Ansible role → module-sequence extractor
|
|
||||||
├── ensemble.py # Ensemble: runs CRX + iDRegEx, picks best by MDL
|
|
||||||
├── mdl.py # MDL scoring for grammar selection (fix)
|
|
||||||
├── mcp_server.py # MCP server exposing 4 tools
|
|
||||||
└── ...
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
In the sampled Go projects, lint CI follows the same arc: checkout → set up Go → run golangci-lint. Only the biggest projects add megalinter.
|
||||||
|
|
||||||
|
### Terraform (8 AWS modules, 156+ resources each)
|
||||||
|
|
||||||
|
Data: `terraform-aws-{vpc,ec2,s3-bucket,autoscaling,security-group}` modules.
|
||||||
|
|
||||||
|
```
|
||||||
|
Best: CRX (MDL 1876)
|
||||||
|
Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?.(...) ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Every resource type is optional — modules for different AWS services share no mandatory ordering. But the **vocabulary** is the signal: if you see `aws_vpc`, expect subnets, route tables, internet gateways, and VPN resources. The grammar encodes the resource catalogue of each module domain.
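
One way to read "the vocabulary is the signal" is as a co-occurrence question: which resource types always appear alongside a given one? The sketch below is a hypothetical helper, not part of `bex`, and the module contents are toy data chosen for illustration.

```python
def companions(modules, resource):
    """Resource types that appear in every module containing `resource`."""
    hosts = [set(r) for r in modules.values() if resource in r]
    if not hosts:
        return set()
    return set.intersection(*hosts) - {resource}


# Hypothetical module contents for illustration only.
toy_modules = {
    "vpc-a": ["aws_vpc", "aws_subnet", "aws_route_table", "aws_internet_gateway"],
    "vpc-b": ["aws_vpc", "aws_subnet", "aws_route_table", "aws_vpn_gateway"],
    "bucket": ["aws_s3_bucket", "aws_s3_bucket_lifecycle_configuration"],
}

print(sorted(companions(toy_modules, "aws_vpc")))
```

On real modules the intersection may be too strict; a frequency threshold would be a natural relaxation.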
|
||||||
|
|
||||||
|
### What doesn't work
|
||||||
|
|
||||||
|
Not every domain has an unwritten convention. Grammar inference failed (produced trivial `(a+b+c+...)+` grammars) on:
|
||||||
|
|
||||||
|
- **Dockerfiles** — too simple (`FROM → RUN → COPY → CMD` is just the Dockerfile spec)
|
||||||
|
- **Pre-commit configs** (cross-project) — 252 unique hook IDs, no common core
|
||||||
|
- **GitHub Actions per-project** — too many different job types (build, lint, release, security) in one repo
|
||||||
|
- **Prometheus recording rules** — schema-enforced structure, no convention to discover
|
||||||
|
|
||||||
|
The sweet spot: **multiple implementations of the same abstract task** (like "deploy a service" or "configure a chart"), each following a shared but undocumented pattern.
|
||||||
|
|
||||||
|
## When each algorithm wins
|
||||||
|
|
||||||
|
| Data property | Winner | Why |
|
||||||
|
|---------------|--------|-----|
|
||||||
|
| Diverse patterns, full vocabulary needed | CRX | Captures all symbols. iDRegEx returns ∅. |
|
||||||
|
| Clean sequences with clear core | iDRegEx | Extracts minimal common subsequence. CRX buries it in optional noise. |
|
||||||
|
| Single sequence | iDRegEx (+ RWR₀) | RWR₀ repair produces a grammatical regex from one example. |
|
||||||
|
| 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
|
||||||
|
| Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
|
||||||
|
|
||||||
## MCP Server
|
## MCP Server
|
||||||
|
|
||||||
A **Model Context Protocol** server exposes all algorithms and domain adapters as tools:
|
A **Model Context Protocol** server exposes all algorithms and domain adapters:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m bex.mcp_server
|
python -m bex.mcp_server
|
||||||
|
|
@@ -105,94 +157,14 @@ python -m bex.mcp_server
|
||||||
| Tool | What it does |
|
| Tool | What it does |
|
||||||
|------|-------------|
|
|------|-------------|
|
||||||
| `infer_grammar(sequences, method, kmax, N)` | Core CRX or iDRegEx inference |
|
| `infer_grammar(sequences, method, kmax, N)` | Core CRX or iDRegEx inference |
|
||||||
| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both CRX and iDRegEx, picks the best by MDL score. Set `prefer='crx'` or `prefer='idregex'` to skip ensemble and return only that algorithm. Returns structured report with candidates, MDL scores, and a `Why:` explanation. |
|
| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both, picks best by MDL. `prefer='crx'` or `prefer='idregex'` to skip comparison. |
|
||||||
| `infer_yaml_grammar(yaml_dir, pattern, method)` | Generic YAML → key-paths → grammar |
|
| `infer_yaml_grammar(yaml_dir, pattern, method)` | YAML → key-paths → grammar |
|
||||||
| `infer_ansible_role_grammar(roles_dir)` | Ansible role module sequences → per-category grammar |
|
| `infer_ansible_role_grammar(roles_dir)` | Ansible role module sequences → per-category grammar |
|
||||||
|
|
||||||
### Using `infer_best_grammar`
|
|
||||||
|
|
||||||
The ensemble runs both algorithms and picks the best by MDL. To skip the comparison and run just one algorithm, pass `prefer`:
|
|
||||||
|
|
||||||
```
|
|
||||||
User: Run CRX on our deploy tasks.
|
|
||||||
Agent: [runs with prefer='crx']
|
|
||||||
Best: CRX (MDL 7.0)
|
|
||||||
Grammar: file.template.docker_image.command.set_fact.shell.wait_for?
|
|
||||||
|
|
||||||
CRX MDL= 7.00 file.template.docker_image.command.set_fact.shell.wait_for?
|
|
||||||
|
|
||||||
Why: Requested CRX only.
|
|
||||||
```
|
|
||||||
|
|
||||||
Without `prefer`, the ensemble compares both:
|
|
||||||
|
|
||||||
```
|
|
||||||
User: Find the grammar for our Helm chart.
|
|
||||||
Agent: [runs]
|
|
||||||
Best: iDRegEx (MDL 1432.99)
|
|
||||||
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
|
||||||
|
|
||||||
iDRegEx MDL= 1432.99 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
|
||||||
CRX MDL= 2651.74 (Alertmanager+...+ValidatingWebhookConfiguration)+.Role?.RoleBinding?.Job+?
|
|
||||||
|
|
||||||
Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6 sequences,
|
|
||||||
iDRegEx matches 1/6. iDRegEx selected (MDL score 1433.0).
|
|
||||||
```
|
|
||||||
|
|
||||||
Both grammars are correct — they operate at different levels of specificity. The `Why:` field helps the agent decide which one to use for the task at hand.
|
|
||||||
|
|
||||||
## Ensemble Selection
|
|
||||||
|
|
||||||
The `infer_best_grammar` tool runs both CRX and iDRegEx, scores each with Minimum Description Length (MDL), and returns the best.
|
|
||||||
|
|
||||||
### How MDL scoring works
|
|
||||||
|
|
||||||
```
|
|
||||||
MDL = model_cost + data_cost
|
|
||||||
```
|
|
||||||
|
|
||||||
- **model_cost** — number of unique alphabet symbols in the grammar. Simpler grammars are cheaper.
|
|
||||||
- **data_cost** — Σ log₂(|L(r) at length len(s)|) across all sequences. A grammar that accepts *many* strings of the same length (like a 17-way disjunction `(a+b+...+q)+`) has high data cost because `|L(r)|` is large. A specific, fixed sequence (`a.b.c.d.e`) has `|L(r)| = 1` so data cost is zero.
|
|
||||||
|
|
||||||
The ensemble selects the grammar with the lowest total MDL. This automatically picks the right level of specificity for the data.
|
|
||||||
|
|
||||||
### When each algorithm wins
|
|
||||||
|
|
||||||
| Scenario | Winner | Why |
|
|
||||||
|----------|--------|-----|
|
|
||||||
| Many sequences, diverse patterns | **CRX** | CRX captures the full vocabulary. iDRegEx can't find a common core. |
|
|
||||||
| Clean, structured sequences | **CRX** | CRX learns precise concatenation order with optional suffixes. iDRegEx may over-generalize. |
|
|
||||||
| Few sequences (2–3) | **iDRegEx** | CRX overfits to the limited data. iDRegEx's probabilistic approach handles noise better. |
|
|
||||||
| Sequences share a clear core | **iDRegEx** | iDRegEx extracts the minimal common subsequence. CRX buries it in a mass of optional symbols. |
|
|
||||||
| Single sequence | **iDRegEx** (with SOA repair) | RWR₀ repair pipeline produces a grammatical regex from one example. |
|
|
||||||
|
|
||||||
### Real-world benchmarks
|
|
||||||
|
|
||||||
Results from three domains using the ensemble (fixed MDL scoring):
|
|
||||||
|
|
||||||
```
|
|
||||||
Dataset Best MDL Matches
|
|
||||||
──────────────────────────────────────────────────────────
|
|
||||||
Helm (prom-stack) iDRegEx 1433.0 1/6
|
|
||||||
Ansible (deploy) CRX 246.1 34/36
|
|
||||||
Ansible (validate) CRX 34.0 5/5
|
|
||||||
Ansible (restore) CRX 24.0 2/2
|
|
||||||
Ansible (manage) iDRegEx 25.0 1/2
|
|
||||||
Ansible (configure) iDRegEx 22.5 1/4
|
|
||||||
Terraform (hashistack) CRX 4.0 9/9
|
|
||||||
```
|
|
||||||
|
|
||||||
Note: MDL scores are not comparable across datasets — only within the same run
|
|
||||||
(CRX vs iDRegEx on the same sequences). The Helm score is higher because
|
|
||||||
each sequence is ~120 symbols long, making the data cost term dominant for
|
|
||||||
the overly-general CRX grammar (19 kinds × many lengths).
|
|
||||||
|
|
||||||
## Domain Adapters
|
## Domain Adapters
|
||||||
|
|
||||||
### Ansible Roles
|
### Ansible Roles
|
||||||
|
|
||||||
Extracts module names from `tasks/main.yml`, groups by category prefix (e.g., `deploy_foo` → `deploy`), and learns per-category grammars:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from bex.ensemble import infer_ensemble
|
from bex.ensemble import infer_ensemble
|
||||||
from bex.role_grammar import collect_all_role_sequences
|
from bex.role_grammar import collect_all_role_sequences
|
||||||
|
|
@@ -200,36 +172,23 @@ from bex.role_grammar import collect_all_role_sequences
|
||||||
all_roles, by_category = collect_all_role_sequences('path/to/roles')
|
all_roles, by_category = collect_all_role_sequences('path/to/roles')
|
||||||
for cat, items in sorted(by_category.items()):
|
for cat, items in sorted(by_category.items()):
|
||||||
seqs = [s for _, s in items]
|
seqs = [s for _, s in items]
|
||||||
if len(seqs) >= 2:
|
result = infer_ensemble(seqs)
|
||||||
result = infer_ensemble(seqs)
|
print(f"── {cat} ({len(items)} roles) ──")
|
||||||
print(f"── {cat} ({len(items)} roles) ──")
|
print(f" Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
||||||
print(f" Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
print(f" Grammar: {result['best']['grammar']}")
|
||||||
print(f" Grammar: {result['best']['grammar']}")
|
|
||||||
print(f" Why: {result['why']}")
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Example output** (from [companyweb](https://github.com/anomalyco/companyweb), 51 roles):
|
**Example** (15 geerlingguy Galaxy roles):
|
||||||
|
|
||||||
```
|
```
|
||||||
── restore (2 roles) ──
|
── other (15 roles) ──
|
||||||
Best: CRX (MDL 24.0)
|
Best: CRX (MDL 288, 15/15 match)
|
||||||
Grammar: file.copy.unarchive+.command
|
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.include+?.(npm+pip)+?.lineinfile?
|
||||||
Why: CRX (score 24.0) vs iDRegEx (score 33.0). Both match 2/2. CRX is more compact.
|
Why: CRX matches 15/15 sequences, iDRegEx matches 3/15. CRX selected.
|
||||||
|
|
||||||
── validate (5 roles) ──
|
|
||||||
Best: CRX (MDL 34.0)
|
|
||||||
Grammar: hosts?.shell?.(copy+debug+fail+set_fact+uri)+?
|
|
||||||
Why: CRX (score 34.0) matches 5/5, iDRegEx (score 49.5) matches 0/5.
|
|
||||||
|
|
||||||
── configure (4 roles) ──
|
|
||||||
Best: iDRegEx (MDL 22.5)
|
|
||||||
Grammar: include_role
|
|
||||||
Why: iDRegEx (score 22.5) beats CRX (score 44.5). CRX overfits to diverse patterns.
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Helm Charts
|
### Helm Charts
|
||||||
|
|
||||||
Renders a Helm chart with different values files and extracts Kubernetes `kind` sequences for grammar inference:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import subprocess, yaml
|
import subprocess, yaml
|
||||||
from bex.ensemble import infer_ensemble
|
from bex.ensemble import infer_ensemble
|
||||||
|
|
@@ -240,46 +199,31 @@ for vf in sorted(Path('ci/').glob('*-values.yaml')):
|
||||||
['helm', 'template', 'test', '.', '--skip-tests', '-f', str(vf)],
|
['helm', 'template', 'test', '.', '--skip-tests', '-f', str(vf)],
|
||||||
capture_output=True, text=True, timeout=120,
|
capture_output=True, text=True, timeout=120,
|
||||||
)
|
)
|
||||||
if out.returncode == 0:
|
kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
|
||||||
kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
|
if d and isinstance(d, dict) and 'kind' in d]
|
||||||
if d and isinstance(d, dict) and 'kind' in d]
|
if kinds:
|
||||||
if kinds:
|
seqs.append(kinds)
|
||||||
seqs.append(kinds)
|
|
||||||
|
|
||||||
result = infer_ensemble(seqs)
|
result = infer_ensemble(seqs)
|
||||||
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
||||||
print(f"Grammar: {result['best']['grammar']}")
|
print(f"Grammar: {result['best']['grammar']}")
|
||||||
print(f"Why: {result['why']}")
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Example output** (from [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack), 6 CI configs):
|
**Example** (kube-prometheus-stack, 6 CI configs):
|
||||||
|
|
||||||
```
|
```
|
||||||
Best: iDRegEx (MDL 1432.99)
|
Best: iDRegEx (MDL 1433)
|
||||||
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
||||||
|
|
||||||
iDRegEx MDL= 1432.99 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
iDRegEx MDL= 1432.99 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
||||||
CRX MDL= 2651.74 (Alertmanager+ClusterRole+ClusterRoleBinding+ConfigMap+DaemonSet+...)+.Role?.RoleBinding?.Job+?
|
CRX MDL= 2651.74 (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
|
||||||
|
|
||||||
Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6, iDRegEx matches 1/6.
|
Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6, iDRegEx matches 1/6.
|
||||||
iDRegEx selected (MDL score 1433.0).
|
iDRegEx selected (MDL score 1433.0).
|
||||||
```
|
```
|
||||||
|
|
||||||
CRX captures *all* symbols that appear. iDRegEx finds only the minimal core that every config shares:
|
|
||||||
```
|
|
||||||
ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
|
||||||
```
|
|
||||||
|
|
||||||
Which grammar is more useful depends on the task:
|
|
||||||
- **CRX** tells you everything you *might* need — good for an agent generating a complete chart.
|
|
||||||
- **iDRegEx** tells you what you *always* need — the bootstrap pipeline that can't be skipped.
|
|
||||||
|
|
||||||
Use `prefer='crx'` or `prefer='idregex'` to select an algorithm without the ensemble comparison:
|
|
||||||
|
|
||||||
### Terraform
|
### Terraform
|
||||||
|
|
||||||
Parses `.tf` files to extract `resource` type sequences, per-file or per-directory:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import re
|
import re
|
||||||
from bex.ensemble import infer_ensemble
|
from bex.ensemble import infer_ensemble
|
||||||
|
|
@@ -295,47 +239,82 @@ print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})"
|
||||||
print(f"Grammar: {result['best']['grammar']}")
|
print(f"Grammar: {result['best']['grammar']}")
|
||||||
```
|
```
|
||||||
|
|
||||||
**Example output** (from [terraform-guides](https://github.com/hashicorp/terraform-guides), hashistack example, 9 files):
|
**Example** (8 terraform-aws-* modules):
|
||||||
|
|
||||||
```
|
```
|
||||||
Best: CRX (MDL 4.0, 9/9 match)
|
Best: CRX (MDL 1876)
|
||||||
Grammar: azurerm_network_security_group?.tls_private_key?.azurerm_virtual_machine?.(azurerm_resource_group+azurerm_subnet+azurerm_virtual_network)+?.azurerm_network_security_rule?.null_resource?.azurerm_network_interface?.azurerm_public_ip?.random_id+?
|
Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?....
|
||||||
|
Why: CRX matches 8/8 sequences. iDRegEx returned ∅ (no common core across modules).
|
||||||
```
|
```
|
||||||
|
|
||||||
**Grammar notation:**
|
### Docker Compose
|
||||||
|
|
||||||
|
```python
import yaml
from pathlib import Path
from bex.ensemble import infer_ensemble

seqs = []
for dc_file in Path('.').glob('**/docker-compose*.yml'):
    # An empty compose file parses to None, so fall back to an empty dict.
    data = yaml.safe_load(dc_file.read_text()) or {}
    for svc, config in (data.get('services') or {}).items():
        # The key order inside each service definition is the sequence to learn from.
        keys = list(config or {})
        if keys:
            seqs.append(keys)

result = infer_ensemble(seqs)
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
print(f"Grammar: {result['best']['grammar']}")
```
|
||||||
|
|
||||||
|
### GitHub Actions
|
||||||
|
|
||||||
|
```python
import yaml
from pathlib import Path  # required by the glob below
from bex.ensemble import infer_ensemble

seqs = []
for wf_file in Path('.github/workflows/').glob('*.yml'):
    data = yaml.safe_load(wf_file.read_text()) or {}
    for job in data.get('jobs', {}).values():
        if 'steps' not in job:
            continue
        seq = []
        for s in job['steps']:
            if 'uses' in s:
                # Action steps are identified by the action they call.
                seq.append(s['uses'])
            else:
                # Shell steps are identified by the first word of the command.
                words = (s.get('run') or '').split()
                if words:
                    seq.append('run:' + words[0])
        if seq:
            seqs.append(seq)

result = infer_ensemble(seqs)
```
|
||||||
|
|
||||||
|
## How MDL scoring works
|
||||||
|
|
||||||
|
```
|
||||||
|
MDL = model_cost + data_cost
|
||||||
|
```
|
||||||
|
|
||||||
|
- **model_cost** — number of unique alphabet symbols in the grammar. Simpler grammars are cheaper.
|
||||||
|
- **data_cost** — Σ log₂(|L(r) at length len(s)|) across all sequences. A specific fixed sequence (`a.b.c.d.e`) has data cost zero because |L(r)| = 1. A grammar that accepts *many* strings of the same length (like `(a+b+...+q)+`) has high data cost.
|
||||||
|
|
||||||
|
The ensemble selects the grammar with the lowest total MDL.
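
The formula can be checked by hand on a tiny case. The sketch below is not the code in `mdl.py`: it assumes single-character symbols, uses Python's `re` in place of the project's grammar notation, and counts `|L(r)|` at a given length by brute-force enumeration, which is only feasible for toy alphabets. It also assumes the grammar accepts every input sequence, since `log2(0)` is undefined.

```python
import math
import re
from itertools import product


def count_accepted(pattern, alphabet, length):
    """Count strings of `length` over `alphabet` that the regex fully matches."""
    compiled = re.compile(pattern)
    return sum(
        1
        for chars in product(alphabet, repeat=length)
        if compiled.fullmatch("".join(chars))
    )


def mdl(pattern, symbols, sequences, alphabet):
    """model_cost (unique symbols) + data_cost (sum of log2 |L(r)| per sequence length)."""
    model_cost = len(set(symbols))
    data_cost = sum(
        math.log2(count_accepted(pattern, alphabet, len(seq))) for seq in sequences
    )
    return model_cost + data_cost


sequences = ["abc", "abc"]
specific = mdl("abc", "abc", sequences, "abc")     # a.b.c
general = mdl("[abc]+", "abc", sequences, "abc")   # (a+b+c)+

print(specific, general)
```

The specific grammar accepts exactly one string of length 3, so its data cost is zero. The general one accepts 27, so each sequence adds `log2(27)` bits and it loses the comparison.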
|
||||||
|
|
||||||
|
## Grammar Notation
|
||||||
|
|
||||||
- `a.b` — `a` followed by `b` (concatenation)
|
- `a.b` — `a` followed by `b` (concatenation)
|
||||||
- `(a+b)` — either `a` or `b` (disjunction)
|
- `(a+b)` — either `a` or `b` (disjunction)
|
||||||
- `r?` — zero or one (optional)
|
- `r?` — zero or one (optional)
|
||||||
- `r+` — one or more (iteration)
|
- `r+` — one or more (iteration)
|
||||||
- `r+?` — zero or more (varies across examples)
|
- `r+?` — zero or more (varies across examples)
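
To check a sequence against a grammar outside of `bex`, the notation can be translated by hand into a Python regular expression. The sketch below does this for the Docker Compose grammar read literally as written (`command` and `ports` mandatory). The `encode` and `matches` helpers are hypothetical names introduced here. Note that `+?` is a lazy quantifier in Python's `re`, so `r+?` from this notation has to become `(?:r)*`, not `r+?`.

```python
import re


def encode(seq):
    """Append a space to each symbol so symbol boundaries are unambiguous."""
    return "".join(sym + " " for sym in seq)


# Hand translation of (build+image).command.(environment+volumes)?.ports
#   a.b   -> concatenation
#   (a+b) -> alternation (?:a |b )
#   r?    -> optional group (?:...)?
COMPOSE = re.compile(r"(?:build |image )command (?:environment |volumes )?ports ")


def matches(seq):
    """True if the whole symbol sequence is accepted by the grammar."""
    return COMPOSE.fullmatch(encode(seq)) is not None


print(matches(["image", "command", "volumes", "ports"]))
print(matches(["build", "command", "ports"]))
print(matches(["ports", "image"]))
```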
|
||||||
- `(a|b)` — iDRegEx-style disjunction (equivalent to `(a+b)`)
|
|
||||||
|
|
||||||
## Domain: Generic YAML
|
|
||||||
|
|
||||||
Converts any YAML file into key-path sequences (DFS traversal) for grammar inference:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from bex.yaml_to_seq import collect_all_sequences
|
|
||||||
from bex import infer_ensemble
|
|
||||||
|
|
||||||
results = collect_all_sequences('config_dir/')
|
|
||||||
seqs = [seq for _, seq in results]
|
|
||||||
result = infer_ensemble(seqs)
|
|
||||||
print(result['best']['grammar'])
|
|
||||||
```
|
|
||||||
|
|
||||||
## Papers
|
## Papers
|
||||||
|
|
||||||
- **Bex et al.** *"Inferring Deterministic Regular Expressions from Positive Data"* — TODS 2010
|
- **Bex et al.** *"Inferring Deterministic Regular Expressions from Positive Data"* — TODS 2010
|
||||||
- **Bex et al.** *"Inferring k-optimal REs from Positive Data"* — arXiv:1004.2372
|
- **Bex et al.** *"Inferring k-optimal REs from Positive Data"* — arXiv:1004.2372
|
||||||
|
|
||||||
See `papers/` for extracted text and the original references.
|
|
||||||
|
|
||||||
## Tests
|
## Tests
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m pytest tests/
|
python -m pytest tests/
|
||||||
# or
|
|
||||||
python tests/test_bex.py
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
|
||||||
115
SHOWCASE.md
|
|
@@ -1,14 +1,9 @@
|
||||||
# Grammar Inference Engine — Showcase
|
# Grammar Inference Engine — Showcase
|
||||||
|
|
||||||
Infer the unwritten convention from existing examples. Given N example
|
Infer the **unwritten convention** from existing examples. Given N example
|
||||||
sequences, produce a ~100-char grammar that captures the structural
|
sequences, produce a ~100-char grammar that captures the structural
|
||||||
pattern — in far fewer tokens than the originals.
|
pattern — in far fewer tokens than the originals.
|
||||||
|
|
||||||
## How it works
|
|
||||||
|
|
||||||
Your agent calls the MCP tool `infer_best_grammar` with a list of
|
|
||||||
existing sequences. It returns a compressed grammar:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
a.b → a then b (concatenation)
|
a.b → a then b (concatenation)
|
||||||
(a+b) → a or b (disjunction)
|
(a+b) → a or b (disjunction)
|
||||||
|
|
@@ -17,40 +12,100 @@ r+ → one or more (iteration)
|
||||||
r+? → zero or more
|
r+? → zero or more
|
||||||
```
|
```
|
||||||
|
|
||||||
Use `prefer='crx'` for full coverage (accepts all examples), or let the
|
## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
|
||||||
ensemble pick between CRX and iDRegEx by MDL score.
|
|
||||||
|
|
||||||
## Ansible Galaxy roles — 15 geerlingguy roles
|
15 popular Ansible roles by Jeff Geerling. There is NO written convention
|
||||||
|
for the task structure. Our grammar is its first explicit description:
|
||||||
Jeff Geerling maintains 100+ of the most popular Ansible roles on
|
|
||||||
Galaxy. He has never written down their task structure. Our grammar is
|
|
||||||
the first explicit description:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
|
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
|
||||||
include+?.(npm+pip)+?.lineinfile?
|
include+?.(npm+pip)+?.lineinfile?
|
||||||
|
|
||||||
CRX MDL= 596.64 match=15/15
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Every role follows the same arc: check prerequisites, OS-specific vars,
|
Every role: check preconditions → OS-specific vars → install packages →
|
||||||
install packages, configure with templates, start services, optionally
|
configure with templates → start services → optionally handle language tooling.
|
||||||
run sub-tasks. It works because 15 roles all converged on the same
|
|
||||||
unwritten convention.
|
|
||||||
|
|
||||||
**Compression: 15 roles (~5,000 tokens) → 60 tokens.**
|
All 15/15 match. **~29× compression** (7200+ modules → ~250 chars).
|
||||||
|
|
||||||
## Notation reference
|
**Why it helps an LLM:** Generating a new Ansible role, the LLM knows the
|
||||||
|
exact structure: fail-check first, then vars, then packages, then config/svc.
|
||||||
|
No guessing.
|
||||||
|
|
||||||
| Symbol | Meaning |
|
## 2. Helm chart (kube-prometheus-stack, 6 configs)
|
||||||
|--------|---------|
|
|
||||||
| `a.b` | a then b |
|
6 different `values.yaml` files rendered through the same chart:
|
||||||
| `(a+b)` | a or b (CRX disjunction) |
|
|
||||||
| `(a\|b)` | a or b (iDRegEx disjunction) |
|
```
|
||||||
| `r?` | zero or one |
|
Best: iDRegEx | MDL 1433
|
||||||
| `r+` | one or more |
|
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
||||||
| `r+?` | zero or more |
|
```
|
||||||
| `MDL` | Minimum Description Length — lower is better |
|
|
||||||
|
The **minimal core** every config must deploy. CRX captures the full
|
||||||
|
vocabulary (19 kinds). Which one an agent uses depends on the task:
|
||||||
|
- Bootstrapping a new cluster: iDRegEx — what you can't skip
|
||||||
|
- Writing a complete chart: CRX — everything you might need
|
||||||
|
|
||||||
|
## 3. Docker Compose (73 services, 10 projects)
|
||||||
|
|
||||||
|
Per-service key order across real-world compose files:
|
||||||
|
|
||||||
|
```
|
||||||
|
Best: CRX | MDL varies by project
|
||||||
|
Grammar: (build+image).command.(environment+volumes)?.ports
|
||||||
|
```
|
||||||
|
|
||||||
|
Per-project patterns emerge:
|
||||||
|
- **Nginx-like:** `build.(command.volumes.ports)`
|
||||||
|
- **Databases:** `image.environment.volumes.ports`
|
||||||
|
- **Language runtimes:** `build.(environment.command).ports`
|
||||||
|
|
||||||
|
**Why it helps an LLM:** The field order in service definitions follows
|
||||||
|
an implicit convention. An agent generating compose files should put
|
||||||
|
image/build first, then command, then environment/volumes, then ports.
|
||||||
|
|
||||||
|
## 4. GitHub Actions (cross-project Go lint, 6 jobs)
|
||||||
|
|
||||||
|
Lint jobs from prometheus, goreleaser, cosign, sigstore:
|
||||||
|
|
||||||
|
```
|
||||||
|
Best: CRX | MDL 13.6
|
||||||
|
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
|
||||||
|
golangci/golangci-lint-action?.megalinter?
|
||||||
|
```
|
||||||
|
|
||||||
|
In the sampled Go projects, lint CI follows: checkout → set up Go → run linter.
|
||||||
|
Only the biggest add megalinter.
|
||||||
|
|
||||||
|
**Why it helps an LLM:** Starting a new Go project? The lint workflow
|
||||||
|
has a near-universal pattern.
|
||||||
|
|
||||||
|
## 5. Terraform (8 AWS modules)
|
||||||
|
|
||||||
|
Terraform modules by hashicorp and terraform-aws-modules:
|
||||||
|
|
||||||
|
```
|
||||||
|
Best: CRX | MDL 1876
|
||||||
|
Grammar: null_resource?.s3_bucket...?.vpc?...(26+ types all optional)
|
||||||
|
```
|
||||||
|
|
||||||
|
Every resource type is optional — VPC, S3, EC2, and security-group
|
||||||
|
modules share no mandatory ordering. But the **vocabulary** is the signal:
|
||||||
|
seeing `aws_vpc` implies subnets, route tables, internet gateways.
|
||||||
|
|
||||||
|
**Why it helps an LLM:** The grammar encodes which resources belong
|
||||||
|
together in each module domain.
|
||||||
|
|
||||||
|
## What doesn't work
|
||||||
|
|
||||||
|
| Dataset | Problem |
|
||||||
|
|---------|---------|
|
||||||
|
| Dockerfiles | Too simple — just the Dockerfile spec |
|
||||||
|
| Pre-commit (cross-project) | 252 unique hooks, no common core |
|
||||||
|
| GHA per-project | One repo = too many job types |
|
||||||
|
| Prometheus rules | Schema-enforced, no convention |
|
||||||
|
|
||||||
|
Sweet spot: **multiple implementations of the same abstract task**
|
||||||
|
with a shared but undocumented pattern.
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
|
|
|
||||||