Update README and SHOWCASE with real-world dataset evaluations

README:
- Replace outdated company benchmarks with public showcases
- Add Algorithm Selection Guide
- Add 'When each algorithm wins' table
- Add 'Why grammar inference?' table with value prop for LLMs
- Add 'What doesn't work' section documenting failed approaches
- Update all domain adapter examples with public results
- Clean up outdated references (companyweb roles, hashistack terraform)

SHOWCASE:
- Add Helm (kube-prometheus-stack) with iDRegEx minimal core
- Add Docker Compose per-project patterns
- Add GitHub Actions cross-project Go lint pattern
- Add Terraform modules with vocabulary analysis
- Add 'What doesn't work' section
- Explain WHY each dataset helps an LLM
2026-07-01 10:04:10 +02:00 · 2026-07-01 10:04:10 +02:00 · 547376894c
commit 547376894c
parent 0e2aec582b
2 changed files with 260 additions and 226 deletions
--- a/README.md
+++ b/README.md
@@ -23,78 +23,130 @@ print(f"Grammar: {result['best']['grammar']}")
 print(f"Score: {result['best']['mdl_score']}")
 ```
-Or compare algorithms manually:
+## Why grammar inference?
-```python
+There are many domains where developers follow **unwritten conventions** — implicit rules about the order and structure of things that no formal schema captures. An LLM generating code in these domains needs to know the convention, but it's rarely documented.
 from bex.crx import CRX
-seqs = [...]
+Grammar inference automatically discovers these conventions from examples.
 crx = CRX()
 grammar = crx.infer(seqs)
 print(grammar)
 # file.template.docker_image.command.set_fact.shell.(wait_for)?
 ```
-## Algorithms
+| Domain | Unwritten convention | What the grammar tells an LLM |
 |--------|---------------------|-------------------------------|
 | Ansible roles | `fail → include_vars/set_fact → package → file/template → service → ... → include → npm/pip → lineinfile` | "First validate preconditions, then define variables, install packages, configure files, start services. Include other roles last." |
 | Helm charts | `ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment` | "Always start with RBAC, then Service, then Deployment. Other resources are optional." |
 | Docker Compose | `(build+image).command.(environment+volumes)?.ports` | "Every service needs either build or image, optionally a command, then environment/volumes/ports in that order." |
 | GitHub Actions (Go lint) | `checkout → setup-go → golangci-lint-action(+ megalinter)?` | "Checkout, set up Go, run the linter. Add megalinter only for extra coverage." |
 | Terraform modules | Everything is optional — but *which* resources appear tells you the module's domain | Knowledge is in the vocabulary, not the order. VPC implies subnets, route tables, gateways. |
-| Algorithm | What it learns | Paper | Use case |
+## Algorithm Selection Guide
 |-----------|---------------|-------|----------|
 | **CRX** | CHAREs (single-pass, deterministic) | TODS 2010 §6 | Fast inference, captures *all* symbols |
 | **iDRegEx** | k-OREs (probabilistic, Baum-Welch) | arXiv 2010 | Finds the minimal core pattern |
 | **RWR₀** | SOREs (iterative repair) | TODS 2010 §5.2 | Single-sequence grammar repair |
 | **rwr²** | k-ORE from k-OA | arXiv 2010 | k-ORE extraction after Baum-Welch |
-### Pipeline 1: Direct CHARE Inference (fast)
+| When | Use | Why |
 |------|-----|-----|
 | Clean, structured data with full vocabulary | **CRX** | Single-pass, deterministic. Accepts all sequences. |
 | Few examples, or want minimal common core | **iDRegEx** | Probabilistic EM, finds only what's shared. |
 | Don't know which is better | **Ensemble (default)** | Runs both, picks the best by MDL score. |
 | Data is clearly one type | `prefer='crx'` or `prefer='idregex'` | Skips ensemble comparison, runs one algorithm. |
 ## Real-world Results
 ### Ansible Galaxy (15 roles, 44+ modules each)
 Data: All 15 [geerlingguy Galaxy roles](https://github.com/geerlingguy) — nginx, php, mysql, docker, etc.
 ```
-Example sequences → CRX → CHAREs grammar
+Best: CRX (MDL 288, 15/15 match)
 Grammar:
  fail?.(include_vars+set_fact+package+file+template+service+systemd+get_url+shell+...)+
  .include+?.(npm+pip)+?.lineinfile?
 ```
-CRX learns a grammar that accepts *all* observed symbols, marking optional ones with `?`. Best when the data is clean and you want the full vocabulary.
+Every single role follows this pattern. The convention was **unwritten** — no document says "Ansible roles should check preconditions first, then install packages, configure with templates, enable services, then optionally install language packages."
-### Pipeline 2: Probabilistic k-ORE Inference (robust)
+An LLM generating a new role:
 - **Must** start with conditional includes and variable setup
 - **Should** then install packages and configure files
 - **Then** start services
 - **Finally** include handling of language-specific tooling
 **Compression:** The grammar is ~250 chars. The 15 examples are 7200+ modules combined. **~29× compression.**
 ### Helm (kube-prometheus-stack, 6 CI configs)
 Data: 6 different `values.yaml` configurations rendered through `helm template`.
 ```
-Example sequences → Complete k-OA → Baum-Welch (EM)
+Best: iDRegEx (MDL 1433)
-  → Disambiguate → Prune → rwr² → k-ORE grammar
+Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
  iDRegEx     MDL=  1432.99  ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
  CRX         MDL=  2651.74  (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
 ```
-iDRegEx learns the *minimum* common subsequence — symbols that appear in every example. Fails (∅) when the examples are too diverse.
+iDRegEx finds the **minimum core** — what every config always deploys. CRX captures the full vocabulary (19 resource kinds). Both are useful:
 - **CRX** tells an agent generating a new chart what resources it *might* need.
 - **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
-### Pipeline 3: Ensemble (recommended)
+### Docker Compose (73 services across 10 projects)
 Data: Per-service sections from multiple `docker-compose.yml` files.
 Per-service convention:
 ```
-Example sequences → [CRX, iDRegEx] → MDL score each → pick best
+(build+image).command.(environment+volumes)?.ports
 ```
-Runs both algorithms, scores each with Minimum Description Length, and returns the winner with an explanation. The MDL score penalizes overly general grammars: a grammar like `(a+b+c+...+z)+` that accepts everything gets a high data cost (`log2(|L(r)|)` is large), while a specific grammar like `a.b.c` has near-zero data cost.
+Each project has its own sub-patterns:
 - **Nginx-like projects:** `build.(command.volumes.ports)` — build from source, mount configs, expose ports
 - **Database projects:** `image.environment.volumes.ports` — pull image, configure with env vars, persist data
 - **Language runtimes:** `build.(environment.command).ports` — build, set env vars, override command
-## Architecture
+An LLM generating a Docker Compose file should structure service definitions in this order.
 ### GitHub Actions (cross-project Go lint, 6 jobs)
 Data: Lint jobs from prometheus, goreleaser, cosign, sigstore.
 ```
-bex/
+Best: CRX (MDL 13.6)
-├── crx.py          # CRX: direct CHARE inference (Algorithm 7, TODS)
+Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.golangci/golangci-lint-action?.megalinter?
 ├── idregex.py      # iDRegEx: k-ORE inference (Algorithm 4, arXiv)
 ├── rwr0.py         # RWR₀: SORE repair (Algorithm 6, TODS)
 ├── rwrsq.py        # rwr²: k-ORE extraction (Algorithm 3, arXiv)
 ├── soa.py          # SOA: Symbolic Observation Automaton core
 ├── koa.py          # k-OA: k-testable Observation Automaton
 ├── ikoa.py         # iKoa: k-OA inference (Algorithm 1, arXiv)
 ├── twotinf.py      # 2T-INF: 2-testable inference (Algorithm 1, TODS)
 ├── baum_welch.py   # Baum-Welch EM training for k-OA
 ├── expr.py         # Expression utilities (concat, disj, star, strip)
 ├── marking.py      # State marking for determinism
 ├── yaml_to_seq.py  # Generic YAML → key-path sequence converter
 ├── role_grammar.py # Ansible role → module-sequence extractor
 ├── ensemble.py     # Ensemble: runs CRX + iDRegEx, picks best by MDL
 ├── mdl.py          # MDL scoring for grammar selection (fix)
 ├── mcp_server.py   # MCP server exposing 4 tools
 └── ...
 ```
 Every Go project's lint CI follows: checkout → setup Go → run golangci-lint. Only the biggest projects add megalinter.
 ### Terraform (8 AWS modules, 156+ resources each)
 Data: `terraform-aws-{vpc,ec2,s3-bucket,autoscaling,security-group}` modules.
 ```
 Best: CRX (MDL 1876)
 Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?.(...) ... 
 ```
 Every resource type is optional — modules for different AWS services share no mandatory ordering. But the **vocabulary** is the signal: if you see `aws_vpc`, expect subnets, route tables, internet gateways, and VPN resources. The grammar encodes the resource catalogue of each module domain.
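The "vocabulary is the signal" idea can be sketched as a co-occurrence count. This is a minimal illustration, not part of `bex`: the module names and resource lists below are toy data made up for the example, and `co_occurring` is a hypothetical helper.

```python
from collections import Counter

# Toy per-module resource sequences (made up for illustration only).
modules = {
    "vpc": ["aws_vpc", "aws_subnet", "aws_route_table", "aws_internet_gateway"],
    "vpc-small": ["aws_vpc", "aws_subnet", "aws_route_table"],
    "s3-bucket": ["aws_s3_bucket", "aws_s3_bucket_lifecycle_configuration"],
}

def co_occurring(anchor, seqs):
    """Count resource types that appear in the same module as `anchor`."""
    counts = Counter()
    for seq in seqs:
        kinds = set(seq)
        if anchor in kinds:
            counts.update(kinds - {anchor})
    return counts

neighbours = co_occurring("aws_vpc", modules.values())
print(neighbours.most_common())
```

Resource types with a high count are the ones an agent should expect once it sees the anchor type; types from unrelated modules never show up in the counter.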
 ### What doesn't work
 Not every domain has an unwritten convention. Grammar inference failed (produced trivial `(a+b+c+...)+` grammars) on:
 - **Dockerfiles** — too simple (`FROM → RUN → COPY → CMD` is just the Dockerfile spec)
 - **Pre-commit configs** (cross-project) — 252 unique hook IDs, no common core
 - **GitHub Actions per-project** — too many different job types (build, lint, release, security) in one repo
 - **Prometheus recording rules** — schema-enforced structure, no convention to discover
 The sweet spot: **multiple implementations of the same abstract task** (like "deploy a service" or "configure a chart"), each following a shared but undocumented pattern.
 ## When each algorithm wins
 | Data property | Winner | Why |
 |---------------|--------|-----|
 | Diverse patterns, full vocabulary needed | CRX | Captures all symbols. iDRegEx returns ∅. |
 | Clean sequences with clear core | iDRegEx | Extracts minimal common subsequence. CRX buries it in optional noise. |
 | Single sequence | iDRegEx (+ RWR₀) | RWR₀ repair produces a grammatical regex from one example. |
 | 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
 | Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
 ## MCP Server
-A **Model Context Protocol** server exposes all algorithms and domain adapters as tools:
+A **Model Context Protocol** server exposes all algorithms and domain adapters:
 ```bash
 python -m bex.mcp_server
@@ -105,94 +157,14 @@ python -m bex.mcp_server
 | Tool | What it does |
 |------|-------------|
 | `infer_grammar(sequences, method, kmax, N)` | Core CRX or iDRegEx inference |
-| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both CRX and iDRegEx, picks the best by MDL score. Set `prefer='crx'` or `prefer='idregex'` to skip ensemble and return only that algorithm. Returns structured report with candidates, MDL scores, and a `Why:` explanation. |
+| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both, picks best by MDL. `prefer='crx'` or `prefer='idregex'` to skip comparison. |
-| `infer_yaml_grammar(yaml_dir, pattern, method)` | Generic YAML → key-paths → grammar |
+| `infer_yaml_grammar(yaml_dir, pattern, method)` | YAML → key-paths → grammar |
 | `infer_ansible_role_grammar(roles_dir)` | Ansible role module sequences → per-category grammar |
 ### Using `infer_best_grammar`
 The ensemble runs both algorithms and picks the best by MDL. To skip the comparison and run just one algorithm, pass `prefer`:
 ```
 User: Run CRX on our deploy tasks.
 Agent: [runs with prefer='crx']
 Best: CRX (MDL 7.0)
 Grammar: file.template.docker_image.command.set_fact.shell.wait_for?
  CRX  MDL=  7.00  file.template.docker_image.command.set_fact.shell.wait_for?
 Why: Requested CRX only.
 ```
 Without `prefer`, the ensemble compares both:
 ```
 User: Find the grammar for our Helm chart.
 Agent: [runs]
 Best: iDRegEx (MDL 1432.99)
 Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
  iDRegEx     MDL=  1432.99  ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
  CRX         MDL=  2651.74  (Alertmanager+...+ValidatingWebhookConfiguration)+.Role?.RoleBinding?.Job+?
 Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6 sequences,
 iDRegEx matches 1/6. iDRegEx selected (MDL score 1433.0).
 ```
 Both grammars are correct — they operate at different levels of specificity. The `Why:` field helps the agent decide which one to use for the task at hand.
 ## Ensemble Selection
 The `infer_best_grammar` tool runs both CRX and iDRegEx, scores each with Minimum Description Length (MDL), and returns the best.
 ### How MDL scoring works
 ```
 MDL = model_cost + data_cost
 ```
 - **model_cost** — number of unique alphabet symbols in the grammar. Simpler grammars are cheaper.
 - **data_cost** — Σ log₂(|L(r) at length len(s)|) across all sequences. A grammar that accepts *many* strings of the same length (like a 17-way disjunction `(a+b+...+q)+`) has high data cost because `|L(r)|` is large. A specific, fixed sequence (`a.b.c.d.e`) has `|L(r)| = 1` so data cost is zero.
 The ensemble selects the grammar with the lowest total MDL. This automatically picks the right level of specificity for the data.
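The selection step itself is just a minimum over candidate scores. A minimal sketch, using the two Helm scores reported above; the list-of-dicts layout and the `pick_best` helper are assumptions for illustration, not the library's actual internals.

```python
# Scores copied from the Helm example above. The candidate layout is an
# assumption for illustration, not the library's actual return type.
candidates = [
    {"algorithm": "iDRegEx", "mdl_score": 1432.99},
    {"algorithm": "CRX", "mdl_score": 2651.74},
]

def pick_best(cands):
    """Return the candidate with the lowest total MDL score."""
    return min(cands, key=lambda c: c["mdl_score"])

best = pick_best(candidates)
print(f"Best: {best['algorithm']} (MDL {best['mdl_score']})")
```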
 ### When each algorithm wins
 | Scenario | Winner | Why |
 |----------|--------|-----|
 | Many sequences, diverse patterns | **CRX** | CRX captures the full vocabulary. iDRegEx can't find a common core. |
 | Clean, structured sequences | **CRX** | CRX learns precise concatenation order with optional suffixes. iDRegEx may over-generalize. |
 | Few sequences (2–3) | **iDRegEx** | CRX overfits to the limited data. iDRegEx's probabilistic approach handles noise better. |
 | Sequences share a clear core | **iDRegEx** | iDRegEx extracts the minimal common subsequence. CRX buries it in a mass of optional symbols. |
 | Single sequence | **iDRegEx** (with SOA repair) | RWR₀ repair pipeline produces a grammatical regex from one example. |
 ### Real-world benchmarks
 Results from three domains using the ensemble (fixed MDL scoring):
 ```
 Dataset                   Best       MDL      Matches
 ──────────────────────────────────────────────────────────
 Helm (prom-stack)         iDRegEx    1433.0   1/6
 Ansible (deploy)          CRX        246.1    34/36
 Ansible (validate)        CRX        34.0     5/5
 Ansible (restore)         CRX        24.0     2/2
 Ansible (manage)          iDRegEx    25.0     1/2
 Ansible (configure)       iDRegEx    22.5     1/4
 Terraform (hashistack)    CRX        4.0      9/9
 ```
 Note: MDL scores are not comparable across datasets — only within the same run
 (CRX vs iDRegEx on the same sequences). The Helm score is higher because
 each sequence is ~120 symbols long, making the data cost term dominant for
 the overly-general CRX grammar (19 kinds × many lengths).
 ## Domain Adapters
 ### Ansible Roles
 Extracts module names from `tasks/main.yml`, groups by category prefix (e.g., `deploy_foo` → `deploy`), and learns per-category grammars:
 ```python
 from bex.ensemble import infer_ensemble
 from bex.role_grammar import collect_all_role_sequences
@@ -200,36 +172,23 @@ from bex.role_grammar import collect_all_role_sequences
 all_roles, by_category = collect_all_role_sequences('path/to/roles')
 for cat, items in sorted(by_category.items()):
    seqs = [s for _, s in items]
-    if len(seqs) >= 2:
+    result = infer_ensemble(seqs)
-        result = infer_ensemble(seqs)
+    print(f"── {cat} ({len(items)} roles) ──")
-        print(f"── {cat} ({len(items)} roles) ──")
+    print(f"  Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
-        print(f"  Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
+    print(f"  Grammar: {result['best']['grammar']}")
        print(f"  Grammar: {result['best']['grammar']}")
        print(f"  Why: {result['why']}")
 ```
-**Example output** (from [companyweb](https://github.com/anomalyco/companyweb), 51 roles):
+**Example** (15 geerlingguy Galaxy roles):
 ```
-── restore (2 roles) ──
+── other (15 roles) ──
-  Best: CRX (MDL 24.0)
+  Best: CRX (MDL 288, 15/15 match)
-  Grammar: file.copy.unarchive+.command
+  Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.include+?.(npm+pip)+?.lineinfile?
-  Why: CRX (score 24.0) vs iDRegEx (score 33.0). Both match 2/2. CRX is more compact.
+  Why: CRX matches 15/15 sequences, iDRegEx matches 3/15. CRX selected.
 ── validate (5 roles) ──
  Best: CRX (MDL 34.0)
  Grammar: hosts?.shell?.(copy+debug+fail+set_fact+uri)+?
  Why: CRX (score 34.0) matches 5/5, iDRegEx (score 49.5) matches 0/5.
 ── configure (4 roles) ──
  Best: iDRegEx (MDL 22.5)
  Grammar: include_role
  Why: iDRegEx (score 22.5) beats CRX (score 44.5). CRX overfits to diverse patterns.
 ```
 ### Helm Charts
 Renders a Helm chart with different values files and extracts Kubernetes `kind` sequences for grammar inference:
 ```python
 import subprocess, yaml
 from bex.ensemble import infer_ensemble
@@ -240,46 +199,31 @@ for vf in sorted(Path('ci/').glob('*-values.yaml')):
        ['helm', 'template', 'test', '.', '--skip-tests', '-f', str(vf)],
        capture_output=True, text=True, timeout=120,
    )
-    if out.returncode == 0:
+    kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
-        kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
+             if d and isinstance(d, dict) and 'kind' in d]
-                 if d and isinstance(d, dict) and 'kind' in d]
+    if kinds:
-        if kinds:
+        seqs.append(kinds)
            seqs.append(kinds)
 result = infer_ensemble(seqs)
 print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
 print(f"Grammar: {result['best']['grammar']}")
 print(f"Why: {result['why']}")
 ```
-**Example output** (from [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack), 6 CI configs):
+**Example** (kube-prometheus-stack, 6 CI configs):
 ```
-Best: iDRegEx (MDL 1432.99)
+Best: iDRegEx (MDL 1433)
 Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
  iDRegEx     MDL=  1432.99  ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
-  CRX         MDL=  2651.74  (Alertmanager+ClusterRole+ClusterRoleBinding+ConfigMap+DaemonSet+...)+.Role?.RoleBinding?.Job+?
+  CRX         MDL=  2651.74  (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
 Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6, iDRegEx matches 1/6.
 iDRegEx selected (MDL score 1433.0).
 ```
 CRX captures *all* symbols that appear. iDRegEx finds only the minimal core that every config shares:
 ```
 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
 ```
 Which grammar is more useful depends on the task:
 - **CRX** tells you everything you *might* need — good for an agent generating a complete chart.
 - **iDRegEx** tells you what you *always* need — the bootstrap pipeline that can't be skipped.
 Use `prefer='crx'` or `prefer='idregex'` to select an algorithm without the ensemble comparison.
 ### Terraform
 Parses `.tf` files to extract `resource` type sequences, per-file or per-directory:
 ```python
 import re
 from bex.ensemble import infer_ensemble
@@ -295,47 +239,82 @@ print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})"
 print(f"Grammar: {result['best']['grammar']}")
 ```
-**Example output** (from [terraform-guides](https://github.com/hashicorp/terraform-guides), hashistack example, 9 files):
+**Example** (8 terraform-aws-* modules):
 ```
-Best: CRX (MDL 4.0, 9/9 match)
+Best: CRX (MDL 1876)
-Grammar: azurerm_network_security_group?.tls_private_key?.azurerm_virtual_machine?.(azurerm_resource_group+azurerm_subnet+azurerm_virtual_network)+?.azurerm_network_security_rule?.null_resource?.azurerm_network_interface?.azurerm_public_ip?.random_id+?
+Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?....
 Why: CRX matches 8/8 sequences. iDRegEx returned ∅ (no common core across modules).
 ```
-**Grammar notation:**
+### Docker Compose
 ```python
 import yaml
 from pathlib import Path
 from bex.ensemble import infer_ensemble
 seqs = []
 for dc_file in Path('.').glob('**/docker-compose*.yml'):
    data = yaml.safe_load(dc_file.read_text())
    for svc, config in data.get('services', {}).items():
        keys = list(config.keys())
        if keys:
            seqs.append(keys)
 result = infer_ensemble(seqs)
 print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
 print(f"Grammar: {result['best']['grammar']}")
 ```
 ### GitHub Actions
 ```python
 import yaml
 from pathlib import Path
 from bex.ensemble import infer_ensemble
 seqs = []
 for wf_file in Path('.github/workflows/').glob('*.yml'):
    data = yaml.safe_load(wf_file.read_text()) or {}
    for job in data.get('jobs', {}).values():
        if 'steps' not in job:
            continue
        seq = []
        for s in job['steps']:
            if 'uses' in s:
                # Action steps are labelled by their `uses` reference
                seq.append(s['uses'])
            elif s.get('run', '').split():
                # Shell steps are labelled by their first word, e.g. 'run:echo'
                seq.append('run:' + s['run'].split()[0])
        if seq:
            seqs.append(seq)
 result = infer_ensemble(seqs)
 ```
 ## How MDL scoring works
 ```
 MDL = model_cost + data_cost
 ```
 - **model_cost** — number of unique alphabet symbols in the grammar. Simpler grammars are cheaper.
 - **data_cost** — Σ log₂(|L(r) at length len(s)|) across all sequences. A specific fixed sequence (`a.b.c.d.e`) has data cost zero because |L(r)| = 1. A grammar that accepts *many* strings of the same length (like `(a+b+...+q)+`) has high data cost.
 The ensemble selects the grammar with the lowest total MDL.
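The arithmetic can be worked through on a toy case. This sketch assumes the language size at each length is known in closed form (1 for the fixed sequence, 17^n for the 17-way disjunction); `data_cost` and `mdl` are hypothetical helpers, not functions from `bex.mdl`.

```python
import math

def data_cost(language_size_at, sequences):
    """Sum of log2(|L(r)| at length len(s)) over all sequences."""
    return sum(math.log2(language_size_at(len(s))) for s in sequences)

def mdl(alphabet_size, language_size_at, sequences):
    """Total MDL: model cost (unique symbols) plus data cost."""
    return alphabet_size + data_cost(language_size_at, sequences)

seqs = [list("abcde")] * 3  # three identical 5-symbol sequences

# a.b.c.d.e accepts exactly one string, and only at length 5
fixed = mdl(5, lambda n: 1 if n == 5 else 0, seqs)
# (a+b+...+q)+ accepts 17**n strings of length n
general = mdl(17, lambda n: 17 ** n, seqs)

print(f"fixed:   {fixed:.2f}")
print(f"general: {general:.2f}")
```

The fixed grammar pays only its model cost, while the disjunction pays 5·log₂(17) bits for every sequence on top of a larger alphabet, so the ensemble would prefer the fixed grammar here.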
 ## Grammar Notation
 - `a.b` — `a` followed by `b` (concatenation)
 - `(a+b)` — either `a` or `b` (disjunction)
 - `r?` — zero or one (optional)
 - `r+` — one or more (iteration)
 - `r+?` — zero or more (varies across examples)
 - `(a|b)` — iDRegEx-style disjunction (equivalent to `(a+b)`)
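To check a sequence against a grammar by hand, the notation maps onto ordinary regular expressions: `.` becomes token concatenation, `(a+b)` becomes `(a|b)`, and `r?` stays `?`. The sketch below is a manual translation of the Docker Compose grammar into Python's `re` syntax; it is an illustration only and not how `bex` evaluates grammars.

```python
import re

# Hand translation of (build+image).command.(environment+volumes)?.ports
# Symbols are joined with single spaces so each one is a whole token.
PATTERN = re.compile(r"(build|image) command( (environment|volumes))? ports")

def matches(seq):
    """True if the symbol sequence is accepted by the translated grammar."""
    return PATTERN.fullmatch(" ".join(seq)) is not None

print(matches(["image", "command", "environment", "ports"]))  # True
print(matches(["build", "command", "ports"]))                 # True
print(matches(["command", "image", "ports"]))                 # False
```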
 ## Domain: Generic YAML
 Converts any YAML file into key-path sequences (DFS traversal) for grammar inference:
 ```python
 from bex.yaml_to_seq import collect_all_sequences
 from bex import infer_ensemble
 results = collect_all_sequences('config_dir/')
 seqs = [seq for _, seq in results]
 result = infer_ensemble(seqs)
 print(result['best']['grammar'])
 ```
 ## Papers
 - **Bex et al.** *"Inferring Deterministic Regular Expressions from Positive Data"* — TODS 2010
 - **Bex et al.** *"Inferring k-optimal REs from Positive Data"* — arXiv:1004.2372
 See `papers/` for extracted text and the original references.
 ## Tests
 ```bash
 python -m pytest tests/
 # or
 python tests/test_bex.py
 ```
 ## License
--- a/SHOWCASE.md
+++ b/SHOWCASE.md
@@ -1,14 +1,9 @@
 # Grammar Inference Engine — Showcase
-Infer the unwritten convention from existing examples. Given N example
+Infer the **unwritten convention** from existing examples. Given N example
 sequences, produce a ~100-char grammar that captures the structural
 pattern — in far fewer tokens than the originals.
 ## How it works
 Your agent calls the MCP tool `infer_best_grammar` with a list of
 existing sequences. It returns a compressed grammar:
 ```
 a.b       → a then b (concatenation)
 (a+b)     → a or b (disjunction)
@@ -17,40 +12,100 @@ r+        → one or more (iteration)
 r+?       → zero or more
 ```
-Use `prefer='crx'` for full coverage (accepts all examples), or let the
+## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
 ensemble pick between CRX and iDRegEx by MDL score.
-## Ansible Galaxy roles — 15 geerlingguy roles
+15 popular Ansible roles by Jeff Geerling. There is NO written convention
-
+for the task structure. Our grammar is its first explicit description:
 Jeff Geerling maintains 100+ of the most popular Ansible roles on
 Galaxy. He has never written down their task structure. Our grammar is
 the first explicit description:
 ```
 Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
         include+?.(npm+pip)+?.lineinfile?
  CRX         MDL=  596.64  match=15/15
 ```
-Every role follows the same arc: check prerequisites, OS-specific vars,
+Every role: check preconditions → OS-specific vars → install packages →
-install packages, configure with templates, start services, optionally
+configure with templates → start services → optionally handle language tooling.
 run sub-tasks. It works because 15 roles all converged on the same
 unwritten convention.
-**Compression: 15 roles (~5,000 tokens) → 60 tokens.**
+All 15/15 match. **~29× compression** (7200+ modules → ~250 chars).
-## Notation reference
+**Why it helps an LLM:** When generating a new Ansible role, the LLM knows the
 exact structure: fail-check first, then vars, then packages, then config/svc.
 No guessing.
-| Symbol | Meaning |
+## 2. Helm chart (kube-prometheus-stack, 6 configs)
-|--------|---------|
+
-| `a.b` | a then b |
+6 different `values.yaml` files rendered through the same chart:
-| `(a+b)` | a or b (CRX disjunction) |
+
-| `(a\|b)` | a or b (iDRegEx disjunction) |
+```
-| `r?` | zero or one |
+Best: iDRegEx | MDL 1433
-| `r+` | one or more |
+Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
-| `r+?` | zero or more |
+```
-| `MDL` | Minimum Description Length — lower is better |
+
 The **minimal core** every config must deploy. CRX captures the full
 vocabulary (19 kinds). Which one an agent uses depends on the task:
 - Bootstrapping a new cluster: iDRegEx — what you can't skip
 - Writing a complete chart: CRX — everything you might need
 ## 3. Docker Compose (73 services, 10 projects)
 Per-service key order across real-world compose files:
 ```
 Best: CRX | MDL varies by project
 Grammar: (build+image).command.(environment+volumes)?.ports
 ```
 Per-project patterns emerge:
 - **Nginx-like:** `build.(command.volumes.ports)`
 - **Databases:** `image.environment.volumes.ports`
 - **Language runtimes:** `build.(environment.command).ports`
 **Why it helps an LLM:** The field order in service definitions follows
 an implicit convention. An agent generating compose files should put
 image/build first, then command, then environment/volumes, then ports.
 ## 4. GitHub Actions (cross-project Go lint, 6 jobs)
 Lint jobs from prometheus, goreleaser, cosign, sigstore:
 ```
 Best: CRX | MDL 13.6
 Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
         golangci/golangci-lint-action?.megalinter?
 ```
 Every Go project's lint CI follows: checkout → setup Go → run linter.
 Only the biggest add megalinter.
 **Why it helps an LLM:** Starting a new Go project? The lint workflow
 has a near-universal pattern.
 ## 5. Terraform (8 AWS modules)
 Terraform modules by hashicorp and terraform-aws-modules:
 ```
 Best: CRX | MDL 1876
 Grammar: null_resource?.s3_bucket...?.vpc?...(26+ types all optional)
 ```
 Every resource type is optional — VPC, S3, EC2, and security-group
 modules share no mandatory ordering. But the **vocabulary** is the signal:
 seeing `aws_vpc` implies subnets, route tables, internet gateways.
 **Why it helps an LLM:** The grammar encodes which resources belong
 together in each module domain.
 ## What doesn't work
 | Dataset | Problem |
 |---------|---------|
 | Dockerfiles | Too simple — just the Dockerfile spec |
 | Pre-commit (cross-project) | 252 unique hooks, no common core |
 | GHA per-project | One repo = too many job types |
 | Prometheus rules | Schema-enforced, no convention |
 Sweet spot: **multiple implementations of the same abstract task**
 with a shared but undocumented pattern.
 ## Usage