Update README and SHOWCASE with real-world dataset evaluations

README:
- Replace outdated company benchmarks with public showcases
- Add Algorithm Selection Guide
- Add 'When each algorithm wins' table
- Add 'Why grammar inference?' table with value prop for LLMs
- Add 'What doesn't work' section documenting failed approaches
- Update all domain adapter examples with public results
- Clean up outdated references (companyweb roles, hashistack terraform)

SHOWCASE:
- Add Helm (kube-prometheus-stack) with iDRegEx minimal core
- Add Docker Compose per-project patterns
- Add GitHub Actions cross-project Go lint pattern
- Add Terraform modules with vocabulary analysis
- Add 'What doesn't work' section
- Explain WHY each dataset helps an LLM
2026-07-01 10:04:10 +02:00 · 2026-07-01 10:04:10 +02:00 · 547376894c
commit 547376894c
parent 0e2aec582b
2 changed files with 260 additions and 226 deletions
--- a/README.md
+++ b/README.md
@@ -23,78 +23,130 @@ print(f"Grammar: {result['best']['grammar']}")
 print(f"Score: {result['best']['mdl_score']}")
 ```

-Or compare algorithms manually:
+## Why grammar inference?

-```python
-from bex.crx import CRX
+There are many domains where developers follow **unwritten conventions** — implicit rules about the order and structure of things that no formal schema captures. An LLM generating code in these domains needs to know the convention, but it's rarely documented.

-seqs = [...]
-crx = CRX()
-grammar = crx.infer(seqs)
-print(grammar)
-# file.template.docker_image.command.set_fact.shell.(wait_for)?
-```
+Grammar inference automatically discovers these conventions from examples.

-## Algorithms
+| Domain | Unwritten convention | What the grammar tells an LLM |
+|--------|---------------------|-------------------------------|
+| Ansible roles | `fail → include_vars/set_fact → package → file/template → service → ... → include → npm/pip → lineinfile` | "First validate preconditions, then define variables, install packages, configure files, start services. Include other roles last." |
+| Helm charts | `ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment` | "Always start with RBAC, then Service, then Deployment. Other resources are optional." |
+| Docker Compose | `(build+image).command.(environment+volumes)?.ports` | "Every service starts with either build or image, followed by command, an optional environment or volumes entry, and then ports, in that order." |
+| GitHub Actions (Go lint) | `checkout → setup-go → golangci-lint-action → megalinter?` | "Check out the code, set up Go, run the linter. Add megalinter only for extra coverage." |
+| Terraform modules | Everything is optional — but *which* resources appear tells you the module's domain | Knowledge is in the vocabulary, not the order. VPC implies subnets, route tables, gateways. |

-| Algorithm | What it learns | Paper | Use case |
-|-----------|---------------|-------|----------|
-| **CRX** | CHAREs (single-pass, deterministic) | TODS 2010 §6 | Fast inference, captures *all* symbols |
-| **iDRegEx** | k-OREs (probabilistic, Baum-Welch) | arXiv 2010 | Finds the minimal core pattern |
-| **RWR₀** | SOREs (iterative repair) | TODS 2010 §5.2 | Single-sequence grammar repair |
-| **rwr²** | k-ORE from k-OA | arXiv 2010 | k-ORE extraction after Baum-Welch |
+## Algorithm Selection Guide

-### Pipeline 1: Direct CHARE Inference (fast)
+| When | Use | Why |
+|------|-----|-----|
+| Clean, structured data with full vocabulary | **CRX** | Single-pass, deterministic. Accepts all sequences. |
+| Few examples, or want minimal common core | **iDRegEx** | Probabilistic EM, finds only what's shared. |
+| Don't know which is better | **Ensemble (default)** | Runs both, picks the best by MDL score. |
+| Data is clearly one type | `prefer='crx'` or `prefer='idregex'` | Skips ensemble comparison, runs one algorithm. |
+
+## Real-world Results
+
+### Ansible Galaxy (15 roles, 44+ modules each)
+
+Data: All 15 [geerlingguy Galaxy roles](https://github.com/geerlingguy) — nginx, php, mysql, docker, etc.

 ```
-Example sequences → CRX → CHAREs grammar
+Best: CRX (MDL 288, 15/15 match)
+Grammar:
+  fail?.(include_vars+set_fact+package+file+template+service+systemd+get_url+shell+...)+
+  .include+?.(npm+pip)+?.lineinfile?
 ```

-CRX learns a grammar that accepts *all* observed symbols, marking optional ones with `?`. Best when the data is clean and you want the full vocabulary.
+Every single role follows this pattern. The convention was **unwritten** — no document says "Ansible roles should check preconditions first, then install packages, configure with templates, enable services, then optionally install language packages."

-### Pipeline 2: Probabilistic k-ORE Inference (robust)
+An LLM generating a new role:
+- **Must** start with conditional includes and variable setup
+- **Should** then install packages and configure files
+- **Then** start services
+- **Finally** handle language-specific tooling (`npm`, `pip`)
+
+**Compression:** The grammar is ~250 chars. The 15 examples are 7200+ modules combined. **~29× compression.**
+
+### Helm (kube-prometheus-stack, 6 CI configs)
+
+Data: 6 different `values.yaml` configurations rendered through `helm template`.

 ```
-Example sequences → Complete k-OA → Baum-Welch (EM)
-  → Disambiguate → Prune → rwr² → k-ORE grammar
+Best: iDRegEx (MDL 1433)
+Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
+
+  iDRegEx     MDL=  1432.99  ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
+  CRX         MDL=  2651.74  (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
 ```

-iDRegEx learns the *minimum* common subsequence — symbols that appear in every example. Fails (∅) when the examples are too diverse.
+iDRegEx finds the **minimum core** — what every config always deploys. CRX captures the full vocabulary (19 resource kinds). Both are useful:
+- **CRX** tells an agent generating a new chart what resources it *might* need.
+- **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
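
The difference between "what it might need" and "what it always needs" can be illustrated with plain set operations. The sketch below is not the iDRegEx or CRX algorithm; it is a simplified stand-in that treats the core as the symbols present in every sequence and the vocabulary as the union of all symbols. The helper names and the three toy sequences are made up for illustration and are not the real chart output.

```python
from typing import List


def common_core(seqs: List[List[str]]) -> List[str]:
    """Symbols present in every sequence, in the order they first appear in seqs[0]."""
    shared = set(seqs[0]).intersection(*map(set, seqs[1:]))
    core, seen = [], set()
    for sym in seqs[0]:
        if sym in shared and sym not in seen:
            core.append(sym)
            seen.add(sym)
    return core


def vocabulary(seqs: List[List[str]]) -> List[str]:
    """Every symbol that appears in at least one sequence."""
    return sorted({sym for seq in seqs for sym in seq})


# Toy Kubernetes kind sequences (illustrative only).
seqs = [
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "ConfigMap", "Service", "Deployment"],
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service", "DaemonSet", "Deployment"],
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service", "Deployment", "Job"],
]

core = common_core(seqs)
vocab = vocabulary(seqs)
print("core:      ", ".".join(core))
print("vocabulary:", len(vocab), "kinds")
```

On this toy input the core is a short fixed chain while the vocabulary is larger, which mirrors the trade-off between the two grammars.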

-### Pipeline 3: Ensemble (recommended)
+### Docker Compose (73 services across 10 projects)

+Data: Per-service sections from multiple `docker-compose.yml` files.
+
+Per-service convention:
 ```
-Example sequences → [CRX, iDRegEx] → MDL score each → pick best
+(build+image).command.(environment+volumes)?.ports
 ```

-Runs both algorithms, scores each with Minimum Description Length, and returns the winner with an explanation. The MDL score penalizes overly general grammars: a grammar like `(a+b+c+...+z)+` that accepts everything gets a high data cost (`log2(|L(r)|)` is large), while a specific grammar like `a.b.c` has near-zero data cost.
+Each project has its own sub-patterns:
+- **Nginx-like projects:** `build.(command.volumes.ports)` — build from source, mount configs, expose ports
+- **Database projects:** `image.environment.volumes.ports` — pull image, configure with env vars, persist data
+- **Language runtimes:** `build.(environment.command).ports` — build, set env vars, override command

-## Architecture
+An LLM generating a Docker Compose file should structure service definitions in this order.
+
+### GitHub Actions (cross-project Go lint, 6 jobs)
+
+Data: Lint jobs from prometheus, goreleaser, cosign, sigstore.

 ```
-bex/
-├── crx.py          # CRX: direct CHARE inference (Algorithm 7, TODS)
-├── idregex.py      # iDRegEx: k-ORE inference (Algorithm 4, arXiv)
-├── rwr0.py         # RWR₀: SORE repair (Algorithm 6, TODS)
-├── rwrsq.py        # rwr²: k-ORE extraction (Algorithm 3, arXiv)
-├── soa.py          # SOA: Symbolic Observation Automaton core
-├── koa.py          # k-OA: k-testable Observation Automaton
-├── ikoa.py         # iKoa: k-OA inference (Algorithm 1, arXiv)
-├── twotinf.py      # 2T-INF: 2-testable inference (Algorithm 1, TODS)
-├── baum_welch.py   # Baum-Welch EM training for k-OA
-├── expr.py         # Expression utilities (concat, disj, star, strip)
-├── marking.py      # State marking for determinism
-├── yaml_to_seq.py  # Generic YAML → key-path sequence converter
-├── role_grammar.py # Ansible role → module-sequence extractor
-├── ensemble.py     # Ensemble: runs CRX + iDRegEx, picks best by MDL
-├── mdl.py          # MDL scoring for grammar selection (fix)
-├── mcp_server.py   # MCP server exposing 4 tools
-└── ...
+Best: CRX (MDL 13.6)
+Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.golangci/golangci-lint-action?.megalinter?
 ```

+Every Go project's lint CI follows: checkout → setup Go → run golangci-lint. Only the biggest projects add megalinter.
+
+### Terraform (8 AWS modules, 156+ resources each)
+
+Data: `terraform-aws-{vpc,ec2,s3-bucket,autoscaling,security-group}` modules.
+
+```
+Best: CRX (MDL 1876)
+Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?.(...) ... 
+```
+
+Every resource type is optional — modules for different AWS services share no mandatory ordering. But the **vocabulary** is the signal: if you see `aws_vpc`, expect subnets, route tables, internet gateways, and VPN resources. The grammar encodes the resource catalogue of each module domain.
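
One way to read "the vocabulary is the signal" is as a co-occurrence count: for every module that contains a given resource type, tally which other types appear alongside it. The sketch below assumes this interpretation; `co_occurring` is a hypothetical helper, not part of `bex`, and the module contents are invented toy data.

```python
from collections import Counter
from typing import Dict, List


def co_occurring(modules: Dict[str, List[str]], anchor: str) -> Counter:
    """For modules that contain `anchor`, count how often each other type appears."""
    counts: Counter = Counter()
    for resources in modules.values():
        types = set(resources)
        if anchor in types:
            counts.update(types - {anchor})
    return counts


# Hypothetical resource lists, one per module (not measured data).
modules = {
    "vpc-a": ["aws_vpc", "aws_subnet", "aws_route_table", "aws_internet_gateway"],
    "vpc-b": ["aws_vpc", "aws_subnet", "aws_route_table", "aws_vpn_gateway"],
    "s3": ["aws_s3_bucket", "aws_s3_bucket_lifecycle_configuration"],
}

neighbours = co_occurring(modules, "aws_vpc")
for rtype, n in sorted(neighbours.items()):
    print(f"{rtype}: seen with aws_vpc in {n} module(s)")
```

Types that never share a module with `aws_vpc` do not show up in the result at all, which is the sense in which the vocabulary separates module domains.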
+
+### What doesn't work
+
+Not every domain has an unwritten convention. Grammar inference failed (produced trivial `(a+b+c+...)+` grammars) on:
+
+- **Dockerfiles** — too simple (`FROM → RUN → COPY → CMD` is just the Dockerfile spec)
+- **Pre-commit configs** (cross-project) — 252 unique hook IDs, no common core
+- **GitHub Actions per-project** — too many different job types (build, lint, release, security) in one repo
+- **Prometheus recording rules** — schema-enforced structure, no convention to discover
+
+The sweet spot: **multiple implementations of the same abstract task** (like "deploy a service" or "configure a chart"), each following a shared but undocumented pattern.
+
+## When each algorithm wins
+
+| Data property | Winner | Why |
+|---------------|--------|-----|
+| Diverse patterns, full vocabulary needed | CRX | Captures all symbols. iDRegEx returns ∅. |
+| Clean sequences with clear core | iDRegEx | Extracts minimal common subsequence. CRX buries it in optional noise. |
+| Single sequence | iDRegEx (+ RWR₀) | RWR₀ repair produces a grammatical regex from one example. |
+| 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
+| Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
+
 ## MCP Server

-A **Model Context Protocol** server exposes all algorithms and domain adapters as tools:
+A **Model Context Protocol** server exposes all algorithms and domain adapters as tools:

 ```bash
 python -m bex.mcp_server
@@ -105,94 +157,14 @@ python -m bex.mcp_server
 | Tool | What it does |
 |------|-------------|
 | `infer_grammar(sequences, method, kmax, N)` | Core CRX or iDRegEx inference |
-| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both CRX and iDRegEx, picks the best by MDL score. Set `prefer='crx'` or `prefer='idregex'` to skip ensemble and return only that algorithm. Returns structured report with candidates, MDL scores, and a `Why:` explanation. |
-| `infer_yaml_grammar(yaml_dir, pattern, method)` | Generic YAML → key-paths → grammar |
+| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both, picks best by MDL. `prefer='crx'` or `prefer='idregex'` to skip comparison. |
+| `infer_yaml_grammar(yaml_dir, pattern, method)` | YAML → key-paths → grammar |
 | `infer_ansible_role_grammar(roles_dir)` | Ansible role module sequences → per-category grammar |

-### Using `infer_best_grammar`
-
-The ensemble runs both algorithms and picks the best by MDL. To skip the comparison and run just one algorithm, pass `prefer`:
-
-```
-User: Run CRX on our deploy tasks.
-Agent: [runs with prefer='crx']
-Best: CRX (MDL 7.0)
-Grammar: file.template.docker_image.command.set_fact.shell.wait_for?
-
-  CRX  MDL=  7.00  file.template.docker_image.command.set_fact.shell.wait_for?
-
-Why: Requested CRX only.
-```
-
-Without `prefer`, the ensemble compares both:
-
-```
-User: Find the grammar for our Helm chart.
-Agent: [runs]
-Best: iDRegEx (MDL 1432.99)
-Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
-
-  iDRegEx     MDL=  1432.99  ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
-  CRX         MDL=  2651.74  (Alertmanager+...+ValidatingWebhookConfiguration)+.Role?.RoleBinding?.Job+?
-
-Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6 sequences,
-iDRegEx matches 1/6. iDRegEx selected (MDL score 1433.0).
-```
-
-Both grammars are correct — they operate at different levels of specificity. The `Why:` field helps the agent decide which one to use for the task at hand.
-
-## Ensemble Selection
-
-The `infer_best_grammar` tool runs both CRX and iDRegEx, scores each with Minimum Description Length (MDL), and returns the best.
-
-### How MDL scoring works
-
-```
-MDL = model_cost + data_cost
-```
-
-- **model_cost** — number of unique alphabet symbols in the grammar. Simpler grammars are cheaper.
-- **data_cost** — Σ log₂(|L(r) at length len(s)|) across all sequences. A grammar that accepts *many* strings of the same length (like a 17-way disjunction `(a+b+...+q)+`) has high data cost because `|L(r)|` is large. A specific, fixed sequence (`a.b.c.d.e`) has `|L(r)| = 1` so data cost is zero.
-
-The ensemble selects the grammar with the lowest total MDL. This automatically picks the right level of specificity for the data.
-
-### When each algorithm wins
-
-| Scenario | Winner | Why |
-|----------|--------|-----|
-| Many sequences, diverse patterns | **CRX** | CRX captures the full vocabulary. iDRegEx can't find a common core. |
-| Clean, structured sequences | **CRX** | CRX learns precise concatenation order with optional suffixes. iDRegEx may over-generalize. |
-| Few sequences (2–3) | **iDRegEx** | CRX overfits to the limited data. iDRegEx's probabilistic approach handles noise better. |
-| Sequences share a clear core | **iDRegEx** | iDRegEx extracts the minimal common subsequence. CRX buries it in a mass of optional symbols. |
-| Single sequence | **iDRegEx** (with SOA repair) | RWR₀ repair pipeline produces a grammatical regex from one example. |
-
-### Real-world benchmarks
-
-Results from three domains using the ensemble (fixed MDL scoring):
-
-```
-Dataset                   Best       MDL      Matches
-──────────────────────────────────────────────────────────
-Helm (prom-stack)         iDRegEx    1433.0   1/6
-Ansible (deploy)          CRX        246.1    34/36
-Ansible (validate)        CRX        34.0     5/5
-Ansible (restore)         CRX        24.0     2/2
-Ansible (manage)          iDRegEx    25.0     1/2
-Ansible (configure)       iDRegEx    22.5     1/4
-Terraform (hashistack)    CRX        4.0      9/9
-```
-
-Note: MDL scores are not comparable across datasets — only within the same run
-(CRX vs iDRegEx on the same sequences). The Helm score is higher because
-each sequence is ~120 symbols long, making the data cost term dominant for
-the overly-general CRX grammar (19 kinds × many lengths).
-
 ## Domain Adapters

 ### Ansible Roles

-Extracts module names from `tasks/main.yml`, groups by category prefix (e.g., `deploy_foo` → `deploy`), and learns per-category grammars:
-
 ```python
 from bex.ensemble import infer_ensemble
 from bex.role_grammar import collect_all_role_sequences
@@ -200,36 +172,23 @@ from bex.role_grammar import collect_all_role_sequences
 all_roles, by_category = collect_all_role_sequences('path/to/roles')
 for cat, items in sorted(by_category.items()):
    seqs = [s for _, s in items]
-    if len(seqs) >= 2:
-        result = infer_ensemble(seqs)
-        print(f"── {cat} ({len(items)} roles) ──")
-        print(f"  Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
-        print(f"  Grammar: {result['best']['grammar']}")
-        print(f"  Why: {result['why']}")
+    result = infer_ensemble(seqs)
+    print(f"── {cat} ({len(items)} roles) ──")
+    print(f"  Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
+    print(f"  Grammar: {result['best']['grammar']}")
 ```

-**Example output** (from [companyweb](https://github.com/anomalyco/companyweb), 51 roles):
+**Example** (15 geerlingguy Galaxy roles):
+
 ```
-── restore (2 roles) ──
-  Best: CRX (MDL 24.0)
-  Grammar: file.copy.unarchive+.command
-  Why: CRX (score 24.0) vs iDRegEx (score 33.0). Both match 2/2. CRX is more compact.
-
-── validate (5 roles) ──
-  Best: CRX (MDL 34.0)
-  Grammar: hosts?.shell?.(copy+debug+fail+set_fact+uri)+?
-  Why: CRX (score 34.0) matches 5/5, iDRegEx (score 49.5) matches 0/5.
-
-── configure (4 roles) ──
-  Best: iDRegEx (MDL 22.5)
-  Grammar: include_role
-  Why: iDRegEx (score 22.5) beats CRX (score 44.5). CRX overfits to diverse patterns.
+── other (15 roles) ──
+  Best: CRX (MDL 288, 15/15 match)
+  Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.include+?.(npm+pip)+?.lineinfile?
+  Why: CRX matches 15/15 sequences, iDRegEx matches 3/15. CRX selected.
 ```

 ### Helm Charts

-Renders a Helm chart with different values files and extracts Kubernetes `kind` sequences for grammar inference:
-
 ```python
 import subprocess, yaml
 from bex.ensemble import infer_ensemble
@@ -240,46 +199,31 @@ for vf in sorted(Path('ci/').glob('*-values.yaml')):
        ['helm', 'template', 'test', '.', '--skip-tests', '-f', str(vf)],
        capture_output=True, text=True, timeout=120,
    )
-    if out.returncode == 0:
-        kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
-                 if d and isinstance(d, dict) and 'kind' in d]
-        if kinds:
-            seqs.append(kinds)
+    kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
+             if d and isinstance(d, dict) and 'kind' in d]
+    if kinds:
+        seqs.append(kinds)

 result = infer_ensemble(seqs)
 print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
 print(f"Grammar: {result['best']['grammar']}")
-print(f"Why: {result['why']}")
 ```

-**Example output** (from [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack), 6 CI configs):
+**Example** (kube-prometheus-stack, 6 CI configs):

 ```
-Best: iDRegEx (MDL 1432.99)
+Best: iDRegEx (MDL 1433)
 Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment

  iDRegEx     MDL=  1432.99  ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
-  CRX         MDL=  2651.74  (Alertmanager+ClusterRole+ClusterRoleBinding+ConfigMap+DaemonSet+...)+.Role?.RoleBinding?.Job+?
+  CRX         MDL=  2651.74  (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...

 Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6, iDRegEx matches 1/6.
 iDRegEx selected (MDL score 1433.0).
 ```

-CRX captures *all* symbols that appear. iDRegEx finds only the minimal core that every config shares:
-```
-ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
-```
-
-Which grammar is more useful depends on the task:
-- **CRX** tells you everything you *might* need — good for an agent generating a complete chart.
-- **iDRegEx** tells you what you *always* need — the bootstrap pipeline that can't be skipped.
-
-Use `prefer='crx'` or `prefer='idregex'` to select an algorithm without the ensemble comparison:
-
 ### Terraform

-Parses `.tf` files to extract `resource` type sequences, per-file or per-directory:
-
 ```python
 import re
 from bex.ensemble import infer_ensemble
@@ -295,47 +239,82 @@ print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})"
 print(f"Grammar: {result['best']['grammar']}")
 ```

-**Example output** (from [terraform-guides](https://github.com/hashicorp/terraform-guides), hashistack example, 9 files):
+**Example** (8 terraform-aws-* modules):
+
 ```
-Best: CRX (MDL 4.0, 9/9 match)
-Grammar: azurerm_network_security_group?.tls_private_key?.azurerm_virtual_machine?.(azurerm_resource_group+azurerm_subnet+azurerm_virtual_network)+?.azurerm_network_security_rule?.null_resource?.azurerm_network_interface?.azurerm_public_ip?.random_id+?
+Best: CRX (MDL 1876)
+Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?....
+Why: CRX matches 8/8 sequences. iDRegEx returned ∅ (no common core across modules).
 ```

-**Grammar notation:**
+### Docker Compose
+
+```python
+import yaml
+from pathlib import Path
+from bex.ensemble import infer_ensemble
+
+seqs = []
+for dc_file in Path('.').glob('**/docker-compose*.yml'):
+    data = yaml.safe_load(dc_file.read_text()) or {}  # empty files load as None
+    for svc, config in (data.get('services') or {}).items():
+        keys = list(config or {})  # a service with no body loads as None
+        if keys:
+            seqs.append(keys)
+
+result = infer_ensemble(seqs)
+print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
+print(f"Grammar: {result['best']['grammar']}")
+```
+
+### GitHub Actions
+
+```python
+import yaml
+from pathlib import Path
+from bex.ensemble import infer_ensemble
+
+seqs = []
+for wf_file in Path('.github/workflows/').glob('*.yml'):
+    data = yaml.safe_load(wf_file.read_text()) or {}
+    for job in data.get('jobs', {}).values():
+        if 'steps' not in job:
+            continue
+        # Build the run: label only for run steps; a uses-only step has no command to split.
+        seq = [s['uses'] if 'uses' in s else 'run:' + s['run'].split()[0]
+               for s in job['steps'] if 'uses' in s or s.get('run', '').strip()]
+        if seq:
+            seqs.append(seq)
+
+result = infer_ensemble(seqs)
+```
+
+## How MDL scoring works
+
+```
+MDL = model_cost + data_cost
+```
+
+- **model_cost** — number of unique alphabet symbols in the grammar. Simpler grammars are cheaper.
+- **data_cost** — Σ log₂(|L(r) at length len(s)|) across all sequences. A specific fixed sequence (`a.b.c.d.e`) has data cost zero because |L(r)| = 1. A grammar that accepts *many* strings of the same length (like `(a+b+...+q)+`) has high data cost.
+
+The ensemble selects the grammar with the lowest total MDL.
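
A small worked example of the formula above, as a sketch rather than the scoring code in `bex/mdl.py`. It assumes single-character symbols so that a grammar can be written as a Python `re` pattern, counts `|L(r)|` at a given length by brute-force enumeration, and assumes the grammar accepts every input sequence (otherwise the count would be zero and the logarithm undefined). The function names are hypothetical.

```python
import math
import re
from itertools import product
from typing import List


def language_size(pattern: str, alphabet: str, length: int) -> int:
    """Brute-force |L(r)| restricted to strings of exactly `length` symbols."""
    rx = re.compile(pattern)
    return sum(
        1
        for chars in product(alphabet, repeat=length)
        if rx.fullmatch("".join(chars))
    )


def mdl(pattern: str, alphabet: str, seqs: List[str]) -> float:
    """MDL = model_cost + data_cost, following the definitions in the text."""
    model_cost = len(alphabet)  # unique symbols used by the grammar
    data_cost = sum(
        math.log2(language_size(pattern, alphabet, len(s))) for s in seqs
    )
    return model_cost + data_cost


seqs = ["abc", "abc"]
specific = mdl("abc", "abc", seqs)    # a.b.c: one string of length 3
general = mdl("[abc]+", "abc", seqs)  # (a+b+c)+: 27 strings of length 3

print(f"a.b.c     MDL = {specific:.2f}")
print(f"(a+b+c)+  MDL = {general:.2f}")
```

Both grammars have the same model cost here, so the whole difference comes from the data cost: the fixed sequence admits one string per length, while the disjunction admits 27.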
+
+## Grammar Notation
+
 - `a.b` — `a` followed by `b` (concatenation)
 - `(a+b)` — either `a` or `b` (disjunction)
 - `r?` — zero or one (optional)
 - `r+` — one or more (iteration)
 - `r+?` — zero or more (varies across examples)
-- `(a|b)` — iDRegEx-style disjunction (equivalent to `(a+b)`)
-
-## Domain: Generic YAML
-
-Converts any YAML file into key-path sequences (DFS traversal) for grammar inference:
-
-```python
-from bex.yaml_to_seq import collect_all_sequences
-from bex import infer_ensemble
-
-results = collect_all_sequences('config_dir/')
-seqs = [seq for _, seq in results]
-result = infer_ensemble(seqs)
-print(result['best']['grammar'])
-```
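
To check by hand whether a sequence fits a grammar, the notation can be translated into a Python `re` pattern. The sketch below does the translation manually with small combinator helpers rather than parsing the notation; the helper names are hypothetical and not part of `bex`, and it assumes symbols contain no spaces, since a space is used as the token separator.

```python
import re
from typing import List

SEP = " "  # token separator; assumes no symbol contains a space


def sym(name: str) -> str:
    """A single symbol, followed by the separator."""
    return re.escape(name) + SEP


def cat(*parts: str) -> str:   # a.b
    return "".join(parts)


def alt(*parts: str) -> str:   # (a+b)
    return "(?:" + "|".join(parts) + ")"


def opt(part: str) -> str:     # r?
    return "(?:" + part + ")?"


def plus(part: str) -> str:    # r+
    return "(?:" + part + ")+"


def star(part: str) -> str:    # r+?
    return "(?:" + part + ")*"


def matches(pattern: str, seq: List[str]) -> bool:
    """True if the whole sequence is accepted by the pattern."""
    return re.fullmatch(pattern, "".join(tok + SEP for tok in seq)) is not None


# (build+image).command.(environment+volumes)?.ports
compose = cat(
    alt(sym("build"), sym("image")),
    sym("command"),
    opt(alt(sym("environment"), sym("volumes"))),
    sym("ports"),
)

# include+?.(npm+pip)+?.lineinfile?
tail = cat(star(sym("include")), star(alt(sym("npm"), sym("pip"))), opt(sym("lineinfile")))

print(matches(compose, ["image", "command", "ports"]))
print(matches(compose, ["image", "environment", "volumes", "ports"]))
print(matches(tail, ["include", "include", "pip", "lineinfile"]))
```

Keeping the separator inside `sym` means concatenation is plain string joining, and a symbol such as `run:echo` cannot accidentally match a prefix of a longer symbol.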

 ## Papers

 - **Bex et al.** *"Inferring Deterministic Regular Expressions from Positive Data"* — TODS 2010
 - **Bex et al.** *"Inferring k-optimal REs from Positive Data"* — arXiv:1004.2372

-See `papers/` for extracted text and the original references.
-
 ## Tests

 ```bash
 python -m pytest tests/
-# or
-python tests/test_bex.py
 ```

 ## License
--- a/SHOWCASE.md
+++ b/SHOWCASE.md
@@ -1,14 +1,9 @@
 # Grammar Inference Engine — Showcase

-Infer the unwritten convention from existing examples. Given N example
+Infer the **unwritten convention** from existing examples. Given N example
 sequences, produce a ~100-char grammar that captures the structural
 pattern — in far fewer tokens than the originals.

-## How it works
-
-Your agent calls the MCP tool `infer_best_grammar` with a list of
-existing sequences. It returns a compressed grammar:
-
 ```
 a.b       → a then b (concatenation)
 (a+b)     → a or b (disjunction)
@@ -17,40 +12,100 @@ r+        → one or more (iteration)
 r+?       → zero or more
 ```

-Use `prefer='crx'` for full coverage (accepts all examples), or let the
-ensemble pick between CRX and iDRegEx by MDL score.
+## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship

-## Ansible Galaxy roles — 15 geerlingguy roles
-
-Jeff Geerling maintains 100+ of the most popular Ansible roles on
-Galaxy. He has never written down their task structure. Our grammar is
-the first explicit description:
+15 popular Ansible roles by Jeff Geerling. There is NO written convention
+for the task structure. Our grammar is its first explicit description:

 ```
 Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
         include+?.(npm+pip)+?.lineinfile?
-
-  CRX         MDL=  596.64  match=15/15
 ```

-Every role follows the same arc: check prerequisites, OS-specific vars,
-install packages, configure with templates, start services, optionally
-run sub-tasks. It works because 15 roles all converged on the same
-unwritten convention.
+Every role: check preconditions → OS-specific vars → install packages →
+configure with templates → start services → optionally handle language tooling.

-**Compression: 15 roles (~5,000 tokens) → 60 tokens.**
+All 15/15 match. **~29× compression** (7200+ modules → ~250 chars).

-## Notation reference
+**Why it helps an LLM:** When generating a new Ansible role, the LLM knows the
+expected structure: fail-check first, then vars, then packages, then config and
+services. No guessing.

-| Symbol | Meaning |
-|--------|---------|
-| `a.b` | a then b |
-| `(a+b)` | a or b (CRX disjunction) |
-| `(a\|b)` | a or b (iDRegEx disjunction) |
-| `r?` | zero or one |
-| `r+` | one or more |
-| `r+?` | zero or more |
-| `MDL` | Minimum Description Length — lower is better |
+## 2. Helm chart (kube-prometheus-stack, 6 configs)
+
+6 different `values.yaml` files rendered through the same chart:
+
+```
+Best: iDRegEx | MDL 1433
+Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
+```
+
+The **minimal core** every config must deploy. CRX captures the full
+vocabulary (19 kinds). Which one an agent uses depends on the task:
+- Bootstrapping a new cluster: iDRegEx — what you can't skip
+- Writing a complete chart: CRX — everything you might need
+
+## 3. Docker Compose (73 services, 10 projects)
+
+Per-service key order across real-world compose files:
+
+```
+Best: CRX | MDL varies by project
+Grammar: (build+image).command.(environment+volumes)?.ports
+```
+
+Per-project patterns emerge:
+- **Nginx-like:** `build.(command.volumes.ports)`
+- **Databases:** `image.environment.volumes.ports`
+- **Language runtimes:** `build.(environment.command).ports`
+
+**Why it helps an LLM:** The field order in service definitions follows
+an implicit convention. An agent generating compose files should put
+image/build first, then command, then environment/volumes, then ports.
+
+## 4. GitHub Actions (cross-project Go lint, 6 jobs)
+
+Lint jobs from prometheus, goreleaser, cosign, sigstore:
+
+```
+Best: CRX | MDL 13.6
+Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
+         golangci/golangci-lint-action?.megalinter?
+```
+
+Every Go project's lint CI follows: checkout → setup Go → run linter.
+Only the biggest add megalinter.
+
+**Why it helps an LLM:** Starting a new Go project? The lint workflow
+has a near-universal pattern.
+
+## 5. Terraform (8 AWS modules)
+
+Terraform modules by hashicorp and terraform-aws-modules:
+
+```
+Best: CRX | MDL 1876
+Grammar: null_resource?.s3_bucket...?.vpc?...(26+ types all optional)
+```
+
+Every resource type is optional — VPC, S3, EC2, and security-group
+modules share no mandatory ordering. But the **vocabulary** is the signal:
+seeing `aws_vpc` implies subnets, route tables, internet gateways.
+
+**Why it helps an LLM:** The grammar encodes which resources belong
+together in each module domain.
+
+## What doesn't work
+
+| Dataset | Problem |
+|---------|---------|
+| Dockerfiles | Too simple — just the Dockerfile spec |
+| Pre-commit (cross-project) | 252 unique hooks, no common core |
+| GHA per-project | One repo = too many job types |
+| Prometheus rules | Schema-enforced, no convention |
+
+Sweet spot: **multiple implementations of the same abstract task**
+with a shared but undocumented pattern.

 ## Usage