Remove bugs section (implementation bugs, not paper bugs), remove Docker Compose (private data), add Portainer templates, fix geerlingguy claim precision

Blog post: remove 'The bugs we found' section (all 4 bugs were from our implementation, not the paper algorithms). Replace company data references in MCP section with Galaxy example. Update ensemble dynamics table with public datasets.

README: replace Docker Compose with Portainer templates in 'Why grammar inference?' table, Real-world Results, and Domain Adapters.

SHOWCASE: replace Docker Compose with Portainer templates.

All claims verified: no public documentation of geerlingguy module ordering convention exists.
tobjend 2026-07-01 10:15:22 +02:00
parent 547376894c
commit 9f5bde22d5
3 changed files with 46 additions and 128 deletions


@ -33,7 +33,7 @@ Grammar inference automatically discovers these conventions from examples.
|--------|---------------------|-------------------------------|
| Ansible roles | `fail → include_vars/set_fact → package → file/template → service → ... → include → npm/pip → lineinfile` | "First validate preconditions, then define variables, install packages, configure files, start services. Include other roles last." |
| Helm charts | `ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment` | "Always start with RBAC, then Service, then Deployment. Other resources are optional." |
| Docker Compose | `(build+image).command.(environment+volumes)?.ports` | "Every service needs either build or image, optionally a command, then environment/volumes/ports in that order." |
| Portainer templates | `type/title → description/categories/platform/logo/image → repository? → env/ports/volumes? → command?` | "Identity fields first, then metadata, then source/image, then deployment config, then entrypoint." |
| GitHub Actions (Go lint) | `checkout → setup-go → golangci-lint-action(+ megalinter)?` | "Checkout, set up Go, run the linter. Only megalinter for extra coverage." |
| Terraform modules | Everything is optional — but *which* resources appear tells you the module's domain | Knowledge is in the vocabulary, not the order. VPC implies subnets, route tables, gateways. |
@ -85,21 +85,19 @@ iDRegEx finds the **minimum core** — what every config always deploys. CRX cap
- **CRX** tells an agent generating a new chart what resources it *might* need.
- **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
### Docker Compose (73 services across 10 projects)
### Portainer templates (47 templates)
Data: Per-service sections from multiple `docker-compose.yml` files.
Data: Official Portainer app templates from the [portainer/templates](https://github.com/portainer/templates) repo.
Per-service convention:
```
(build+image).command.(environment+volumes)?.ports
Best: CRX (MDL 1282)
Grammar: (type+title)+.(categories+description+image+logo+name+note+platform)+.
repository?.(env+ports+privileged+volumes)+?.command?
```
Each project has its own sub-patterns:
- **Nginx-like projects:** `build.(command.volumes.ports)` — build from source, mount configs, expose ports
- **Database projects:** `image.environment.volumes.ports` — pull image, configure with env vars, persist data
- **Language runtimes:** `build.(environment.command).ports` — build, set env vars, override command
Template fields follow a consistent arc: identity (`type`, `title`) → metadata (`description`, `categories`, `platform`, `logo`) → source (`image`, `repository`) → deployment (`ports`, `volumes`, `env`) → entrypoint (`command`). 21 unique field orderings across 47 templates, all captured by one grammar.
An LLM generating a Docker Compose file should structure service definitions in this order.
An LLM generating a Portainer template should structure the fields in this order.
### GitHub Actions (cross-project Go lint, 6 jobs)
@ -247,20 +245,17 @@ Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configura
Why: CRX matches 8/8 sequences. iDRegEx returned ∅ (no common core across modules).
```
### Docker Compose
### Portainer Templates
```python
import yaml
from pathlib import Path
import json, urllib.request
from bex.ensemble import infer_ensemble
seqs = []
for dc_file in Path('.').glob('**/docker-compose*.yml'):
    data = yaml.safe_load(dc_file.read_text())
    for svc, config in data.get('services', {}).items():
        keys = list(config.keys())
        if keys:
            seqs.append(keys)
url = "https://raw.githubusercontent.com/portainer/templates/master/templates.json"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())
templates = data if isinstance(data, list) else data.get('templates', [])
seqs = [list(t.keys()) for t in templates]
result = infer_ensemble(seqs)
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")


@ -15,7 +15,8 @@ r+? → zero or more
## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
15 popular Ansible roles by Jeff Geerling. There is NO written convention
for the task structure. Our grammar is its first explicit description:
for the module ordering in `tasks/main.yml`. Our grammar is its first
explicit description:
```
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
@ -45,23 +46,25 @@ vocabulary (19 kinds). Which one an agent uses depends on the task:
- Bootstrapping a new cluster: iDRegEx — what you can't skip
- Writing a complete chart: CRX — everything you might need
## 3. Docker Compose (73 services, 10 projects)
## 3. Portainer templates (47 templates)
Per-service key order across real-world compose files:
Official Portainer app templates from portainer/templates:
```
Best: CRX | MDL varies by project
Grammar: (build+image).command.(environment+volumes)?.ports
Best: CRX | MDL 1282
Grammar: (type+title)+.
(categories+description+image+logo+name+note+platform)+.
repository?.(env+ports+privileged+volumes)+?.command?
```
Per-project patterns emerge:
- **Nginx-like:** `build.(command.volumes.ports)`
- **Databases:** `image.environment.volumes.ports`
- **Language runtimes:** `build.(environment.command).ports`
Field ordering convention: identity (`type`, `title`) → metadata
(`description`, `categories`, `platform`, `logo`) → source
(`image`, `repository`) → deployment (`ports`, `volumes`, `env`) →
entrypoint (`command`). 21 unique orderings, one grammar.
**Why it helps an LLM:** The field order in service definitions follows
an implicit convention. An agent generating compose files should put
image/build first, then command, then environment/volumes, then ports.
**Why it helps an LLM:** Writing a Portainer template needs the right
field order. The grammar tells you: identity first, then metadata,
then source, then deployment config.
## 4. GitHub Actions (cross-project Go lint, 6 jobs)


@ -137,69 +137,6 @@ matches only 1 sequence but does so perfectly (low data cost) can
beat a grammar that matches all sequences but is extremely permissive
(high data cost).
## The bugs we found (and fixed)
Implementing the BEX algorithms faithfully required solving several
subtle problems.
### Bug 1: model_cost counted characters, not symbols
The paper defines model_cost as "the length of r" — the number of
symbols in the expression. For the toy alphabet {a, b, c, d, e} used
in the paper, characters and symbols are the same. For real-world
symbols like `community.docker.docker_image`, they aren't.
Our `model_cost` function was counting characters (226 for a typical
grammar), when it should count symbol occurrences (19). This
massively inflated the MDL score, making CRX appear worse than it
actually was.
**Fix:** Count occurrences of alphabet symbols in the expression using
regex word-boundary matching, not string length.
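As a rough illustration of the difference, the sketch below counts symbol occurrences instead of characters. It is not the project's actual `model_cost`: the signature, the lookaround-based boundary check, and the sample grammar are all assumptions made for this example.

```python
import re

def model_cost(expr: str, alphabet: set[str]) -> int:
    """Count occurrences of alphabet symbols in expr (hypothetical signature)."""
    if not alphabet:
        return 0
    # Longest symbols first, so a dotted name such as
    # community.docker.docker_image is matched as one symbol.
    ordered = sorted(alphabet, key=len, reverse=True)
    body = "|".join(re.escape(symbol) for symbol in ordered)
    # Lookarounds act as word boundaries around each symbol.
    pattern = re.compile(r"(?<!\w)(?:" + body + r")(?!\w)")
    return len(pattern.findall(expr))

# Illustrative grammar, not one of the inferred results.
expr = "fail?.(package+community.docker.docker_image)+.service"
alphabet = {"fail", "package", "community.docker.docker_image", "service"}

symbol_cost = model_cost(expr, alphabet)  # counts symbols
char_cost = len(expr)                     # what the buggy version measured
```

Sorting the alternation by length matters here: without it, a shorter symbol that is a prefix of a dotted name could win the match and split the longer name.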
### Bug 2: Dispatch order in _count_words_fast
The recursive function `_count_words_fast` estimates |L(r)| — the
number of strings a grammar accepts at a given length. It dispatches
on expression structure: first check for concatenation (`.`), then
trailing quantifiers (`+?`, `*`, `?`, `+`), then disjunction groups.
Our dispatch checked `endswith('+?')` before checking `'.' in expr`.
For the expression `(All)+.Role?.RoleBinding?.Job+?`, the trailing
`+?` on `Job+?` triggered the quantifier branch first, applying the
`+?` to the **entire** expression instead of just the `Job` factor.
**Fix:** Check concatenation first. Top-level dots can only appear in
concatenation, so they should be handled before any quantifier logic.
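The dispatch order can be sketched as follows. This is a simplified stand-in for `_count_words_fast`, not its real code: `split_top_level` and `dispatch` are hypothetical helpers, and the sketch assumes symbol names that contain no dots.

```python
def split_top_level(expr: str) -> list[str]:
    """Split expr on dots that sit outside any parentheses."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "." and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

def dispatch(expr: str) -> tuple[str, list[str]]:
    """Classify expr, checking concatenation before any quantifier."""
    factors = split_top_level(expr)
    if len(factors) > 1:
        return "concat", factors
    # Only a single factor can carry a trailing quantifier.
    for quant in ("+?", "*", "?", "+"):
        if expr.endswith(quant):
            return "quantified", [expr[: -len(quant)]]
    return "atom", [expr]

kind, factors = dispatch("(All)+.Role?.RoleBinding?.Job+?")
```

With this order, the trailing `+?` is only examined once the expression has been reduced to the single factor `Job+?`.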
### Bug 3: Greedy matching without backtracking
The `_match_tokens` function checked whether a sequence matches a
grammar. For quantifiers like `+?` (zero-or-more), it greedily
consumed ALL consecutive matching symbols, then moved on. This failed
for grammars like `a+?.a` on input `['a', 'a']`: the `a+?` ate both
`a`s, and there was nothing left for the second `.a`.
**Fix:** Replace the single-pass greedy matching with `_match_possible`,
a proper backtracking engine that enumerates ALL valid end positions
for each token and picks the maximum. This is essentially a tiny
regex engine — but limited to the CHARE subset, so it avoids the
exponential blowup of general regex matching.
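A minimal sketch of the idea, assuming a grammar is already parsed into a list of `(symbols, quantifier)` factors. It tracks every reachable end position instead of committing to the greedy one, which is a close variant of the approach described above rather than the real `_match_possible`; all names here are hypothetical.

```python
Factor = tuple[frozenset[str], str]  # (allowed symbols, quantifier)

def end_positions(factor: Factor, seq: list[str], start: int) -> set[int]:
    """Return every index where factor can stop when it starts at start."""
    symbols, quant = factor
    ends = set()
    if quant in ("?", "+?"):
        ends.add(start)  # zero occurrences are allowed
    max_reps = 1 if quant in ("", "?") else len(seq)
    i, reps = start, 0
    while i < len(seq) and seq[i] in symbols and reps < max_reps:
        i += 1
        reps += 1
        ends.add(i)  # stopping after this many symbols is valid
    return ends

def matches(factors: list[Factor], seq: list[str]) -> bool:
    """Check whether the concatenation of factors accepts all of seq."""
    positions = {0}
    for factor in factors:
        positions = {e for p in positions for e in end_positions(factor, seq, p)}
        if not positions:
            return False
    return len(seq) in positions

# a+?.a, where +? means zero or more
grammar = [(frozenset({"a"}), "+?"), (frozenset({"a"}), "")]
accepted = matches(grammar, ["a", "a"])
```

Because each factor is a flat set of symbols, the set of positions never grows beyond the length of the input, so the work stays polynomial.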
### Bug 4: Dot-splitting inside disjunctions
Module names like `community.docker.docker_image` contain dots.
When `_parse_parts` processed a disjunction child, it recursively
called itself — which split the expression on `.` before treating it
as a symbol. The symbol `community.docker.docker_image` became
`community` then `docker` then `docker_image` — three concatenated
symbols instead of one.
**Fix:** Disjunction children are always flat symbols (CRX and
iDRegEx don't produce nested disjunctions in practice). Parse them
with `_parse_flat_symbol`, which strips quantifiers but never splits
on `.`.
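A sketch of what such a flat parser might look like. The function below is a stand-in written for this example, not the project's `_parse_flat_symbol`, and its return shape is an assumption.

```python
def parse_flat_symbol(child: str) -> tuple[str, str]:
    """Split a disjunction child into (symbol, quantifier) without touching dots."""
    # Check "+?" before "?" and "+", since it ends with both characters.
    for quant in ("+?", "*", "?", "+"):
        if child.endswith(quant):
            return child[: -len(quant)], quant
    return child, ""

symbol, quant = parse_flat_symbol("community.docker.docker_image+?")

# What naive dot-splitting does to the same module name.
naive = "community.docker.docker_image".split(".")
```

The flat parser keeps the module name as one symbol, while the naive split turns it into three pieces.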
## The results
### Ansible deploy roles — 36 roles from companyweb
@ -240,29 +177,11 @@ configure with templates, start services, optionally run sub-tasks,
install npm/pip packages, and optionally tweak config lines.
**This is the first explicit description of the geerlingguy role
convention.** It took 15 roles and a grammar inference algorithm to
write it down.
module ordering convention.** It took 15 roles and a grammar inference
algorithm to write it down.
**Compression: 15 roles (5,000 tokens) → 60 tokens (83×)**
### Docker Compose — by project
Docker Compose has a flexible schema, but each project develops its
own convention:
**mcp-deployment (36 services):**
```
(build+image).command.(environment+volumes)?.ports
```
**files (6 services):**
```
image.environment.volumes.network_mode.privileged?.cap_add?
```
**fresh-ape-base (9 services):**
```
image.ports?.(depends_on+environment+user+volumes)+
```
### Ensemble dynamics
The ensemble (CRX + iDRegEx + MDL) selects different winners
@ -270,11 +189,11 @@ depending on the data:
| Dataset | Winner | Why |
|---------|--------|-----|
| Ansible deploy (36 roles) | CRX | iDRegEx returned ∅ (too diverse) |
| Ansible galaxy (15 roles) | CRX | iDRegEx returned ∅ (too diverse) |
| Ansible restore (2 roles) | CRX | Both match all; CRX more compact |
| Ansible configure (4 roles) | **iDRegEx** | Finds minimal core `include_role` |
| Ansible manage (2 roles) | **iDRegEx** | Core: `assert.authorized_key` |
| Helm prom-stack (6 configs) | **iDRegEx** | Finds minimal core across all configs |
| Portainer templates (47) | CRX | iDRegEx returned ∅ (no single common field) |
| Terraform modules (8) | CRX | Every resource type optional across domains |
| GitHub Actions Go lint (6) | CRX | Tight pattern, all match |
iDRegEx wins when the data has a clear common core. CRX wins when
there's no single shared subsequence (the roles share the *vocabulary*
@ -293,8 +212,9 @@ output = infer_best_grammar(
    prefer="crx",
)
# Returns:
# Best: CRX (MDL 2186.28)
# Grammar: docker_volume+?.group?...(assert+...+wait_for)+?.(cron+firewalld)?
# Best: CRX (MDL 288)
# Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+
# .include+?.(npm+pip)+?.lineinfile?
# Ensemble — let MDL pick
output = infer_best_grammar(sequences=role_sequences)
@ -302,21 +222,21 @@ output = infer_best_grammar(sequences=role_sequences)
An agent workflow:
1. Agent needs to write deploy role #37
2. Finds 36 existing deploy roles, extracts their task module sequences
1. Agent needs to write an Ansible role
2. Finds 15 existing geerlingguy roles, extracts their task module sequences
3. Calls `infer_best_grammar(sequences=..., prefer='crx')`
4. Gets back the grammar in 200 tokens
4. Gets back the grammar in ~60 tokens
5. Generates a new role that follows the structural pattern
Without the MCP: 36 role files in context (15,000 tokens), or guesswork.
With the MCP: one grammar rule (200 tokens), known to match 36/36 roles.
Without the MCP: 15 role files in context (5,000 tokens), or guesswork.
With the MCP: one grammar rule (~60 tokens), known to match 15/15 roles.
## What it means
Grammar inference turns **examples** into **rules**. The rule is a
compressed description of the structural convention — and for
schema-less content like Ansible roles, this may be the *first time*
the convention has been written down at all.
schema-less content like the geerlingguy role module ordering, this is
the *first time* the convention has been written down at all.
For LLM agents, this changes the trade-off between context and
accuracy. Instead of flooding the context window with examples, the