Remove bugs section (implementation bugs, not paper bugs), remove Docker Compose (private data), add Portainer templates, fix geerlingguy claim precision
Blog post: remove 'The bugs we found' section (all 4 bugs were from our implementation, not the paper algorithms). Replace company data references in MCP section with Galaxy example. Update ensemble dynamics table with public datasets. README: replace Docker Compose with Portainer templates in 'Why grammar inference?' table, Real-world Results, and Domain Adapters. SHOWCASE: replace Docker Compose with Portainer templates. All claims verified: no public documentation of geerlingguy module ordering convention exists.
parent 547376894c
commit 9f5bde22d5
3 changed files with 46 additions and 128 deletions
README.md (35 changed lines)
@@ -33,7 +33,7 @@ Grammar inference automatically discovers these conventions from examples.
 |--------|---------------------|-------------------------------|
 | Ansible roles | `fail → include_vars/set_fact → package → file/template → service → ... → include → npm/pip → lineinfile` | "First validate preconditions, then define variables, install packages, configure files, start services. Include other roles last." |
 | Helm charts | `ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment` | "Always start with RBAC, then Service, then Deployment. Other resources are optional." |
-| Docker Compose | `(build+image).command.(environment+volumes)?.ports` | "Every service needs either build or image, optionally a command, then environment/volumes/ports in that order." |
+| Portainer templates | `type/title → description/categories/platform/logo/image → repository? → env/ports/volumes? → command?` | "Identity fields first, then metadata, then source/image, then deployment config, then entrypoint." |
 | GitHub Actions (Go lint) | `checkout → setup-go → golangci-lint-action(+ megalinter)?` | "Checkout, set up Go, run the linter. Only megalinter for extra coverage." |
 | Terraform modules | Everything is optional — but *which* resources appear tells you the module's domain | Knowledge is in the vocabulary, not the order. VPC implies subnets, route tables, gateways. |
 
@@ -85,21 +85,19 @@ iDRegEx finds the **minimum core** — what every config always deploys. CRX cap
 - **CRX** tells an agent generating a new chart what resources it *might* need.
 - **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
 
-### Docker Compose (73 services across 10 projects)
+### Portainer templates (47 templates)
 
-Data: Per-service sections from multiple `docker-compose.yml` files.
+Data: Official Portainer app templates from the [portainer/templates](https://github.com/portainer/templates) repo.
 
-Per-service convention:
 ```
-(build+image).command.(environment+volumes)?.ports
+Best: CRX (MDL 1282)
+Grammar: (type+title)+.(categories+description+image+logo+name+note+platform)+.
+repository?.(env+ports+privileged+volumes)+?.command?
 ```
 
-Each project has its own sub-patterns:
-- **Nginx-like projects:** `build.(command.volumes.ports)` — build from source, mount configs, expose ports
-- **Database projects:** `image.environment.volumes.ports` — pull image, configure with env vars, persist data
-- **Language runtimes:** `build.(environment.command).ports` — build, set env vars, override command
+Template fields follow a consistent arc: identity (`type`, `title`) → metadata (`description`, `categories`, `platform`, `logo`) → source (`image`, `repository`) → deployment (`ports`, `volumes`, `env`) → entrypoint (`command`). 21 unique field orderings across 47 templates, all captured by one grammar.
 
-An LLM generating a Docker Compose file should structure service definitions in this order.
+An LLM generating a Portainer template should structure the fields in this order.
 
 ### GitHub Actions (cross-project Go lint, 6 jobs)
 
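Review note: the grammar added in the hunk above can be turned into a quick field-order check. The sketch below is illustrative only and is not part of the commit. The stage groups are copied from the inferred grammar, while the names `STAGES`, `STAGE_OF`, and `follows_field_order` are hypothetical. It assumes each key appears at most once (as in a JSON object) and rejects fields outside the grammar's vocabulary.

```python
from typing import Sequence

# Stage groups copied from the inferred grammar in the hunk above:
# (type+title)+.(categories+...+platform)+.repository?.(env+...+volumes)+?.command?
STAGES = [
    {"type", "title"},
    {"categories", "description", "image", "logo", "name", "note", "platform"},
    {"repository"},
    {"env", "ports", "privileged", "volumes"},
    {"command"},
]
# Map every known field to the index of its stage.
STAGE_OF = {field: i for i, group in enumerate(STAGES) for field in group}


def follows_field_order(keys: Sequence[str]) -> bool:
    """Return True if the keys never step back to an earlier stage.

    Assumptions (not from the commit): keys are unique, unknown fields
    are rejected, and the two leading groups must each appear at least once.
    """
    if any(k not in STAGE_OF for k in keys):
        return False
    stages = [STAGE_OF[k] for k in keys]
    in_order = all(a <= b for a, b in zip(stages, stages[1:]))
    return in_order and 0 in stages and 1 in stages


print(follows_field_order(["type", "title", "description", "image", "ports"]))  # True
print(follows_field_order(["image", "type", "title"]))  # False
```

A check like this could sit in front of a template generator as a cheap lint step, with the full grammar matcher reserved for cases where duplicates or unknown fields matter.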
@@ -247,20 +245,17 @@ Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configura
 Why: CRX matches 8/8 sequences. iDRegEx returned ∅ (no common core across modules).
 ```
 
-### Docker Compose
+### Portainer Templates
 
 ```python
-import yaml
-from pathlib import Path
+import json, urllib.request
 from bex.ensemble import infer_ensemble
 
-seqs = []
-for dc_file in Path('.').glob('**/docker-compose*.yml'):
-    data = yaml.safe_load(dc_file.read_text())
-    for svc, config in data.get('services', {}).items():
-        keys = list(config.keys())
-        if keys:
-            seqs.append(keys)
+url = "https://raw.githubusercontent.com/portainer/templates/master/templates.json"
+with urllib.request.urlopen(url) as resp:
+    data = json.loads(resp.read())
+templates = data if isinstance(data, list) else data.get('templates', [])
+seqs = [list(t.keys()) for t in templates]
 
 result = infer_ensemble(seqs)
 print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
SHOWCASE.md (27 changed lines)
@@ -15,7 +15,8 @@ r+? → zero or more
 ## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
 
 15 popular Ansible roles by Jeff Geerling. There is NO written convention
-for the task structure. Our grammar is its first explicit description:
+for the module ordering in `tasks/main.yml`. Our grammar is its first
+explicit description:
 
 ```
 Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
@@ -45,23 +46,25 @@ vocabulary (19 kinds). Which one an agent uses depends on the task:
 - Bootstrapping a new cluster: iDRegEx — what you can't skip
 - Writing a complete chart: CRX — everything you might need
 
-## 3. Docker Compose (73 services, 10 projects)
+## 3. Portainer templates (47 templates)
 
-Per-service key order across real-world compose files:
+Official Portainer app templates from portainer/templates:
 
 ```
-Best: CRX | MDL varies by project
-Grammar: (build+image).command.(environment+volumes)?.ports
+Best: CRX | MDL 1282
+Grammar: (type+title)+.
+(categories+description+image+logo+name+note+platform)+.
+repository?.(env+ports+privileged+volumes)+?.command?
 ```
 
-Per-project patterns emerge:
-- **Nginx-like:** `build.(command.volumes.ports)`
-- **Databases:** `image.environment.volumes.ports`
-- **Language runtimes:** `build.(environment.command).ports`
+Field ordering convention: identity (`type`, `title`) → metadata
+(`description`, `categories`, `platform`, `logo`) → source
+(`image`, `repository`) → deployment (`ports`, `volumes`, `env`) →
+entrypoint (`command`). 21 unique orderings, one grammar.
 
-**Why it helps an LLM:** The field order in service definitions follows
-an implicit convention. An agent generating compose files should put
-image/build first, then command, then environment/volumes, then ports.
+**Why it helps an LLM:** Writing a Portainer template needs the right
+field order. The grammar tells you: identity first, then metadata,
+then source, then deployment config.
 
 ## 4. GitHub Actions (cross-project Go lint, 6 jobs)
 
blog_post.md (112 changed lines)
@@ -137,69 +137,6 @@ matches only 1 sequence but does so perfectly (low data cost) can
 beat a grammar that matches all sequences but is extremely permissive
 (high data cost).
 
-## The bugs we found (and fixed)
-
-Implementing the BEX algorithms faithfully required solving several
-subtle problems.
-
-### Bug 1: model_cost counted characters, not symbols
-
-The paper defines model_cost as "the length of r" — the number of
-symbols in the expression. For the toy alphabet {a, b, c, d, e} used
-in the paper, characters and symbols are the same. For real-world
-symbols like `community.docker.docker_image`, they aren't.
-
-Our `model_cost` function was counting characters (226 for a typical
-grammar), when it should count symbol occurrences (19). This
-massively inflated the MDL score, making CRX appear worse than it
-actually was.
-
-**Fix:** Count occurrences of alphabet symbols in the expression using
-regex word-boundary matching, not string length.
-
-### Bug 2: Dispatch order in _count_words_fast
-
-The recursive function `_count_words_fast` estimates |L(r)| — the
-number of strings a grammar accepts at a given length. It dispatches
-on expression structure: first check for concatenation (`.`), then
-trailing quantifiers (`+?`, `*`, `?`, `+`), then disjunction groups.
-
-Our dispatch checked `endswith('+?')` before checking `'.' in expr`.
-For the expression `(All)+.Role?.RoleBinding?.Job+?`, the trailing
-`+?` on `Job+?` triggered the quantifier branch first, applying the
-`+?` to the **entire** expression instead of just the `Job` factor.
-
-**Fix:** Check concatenation first. Top-level dots can only appear in
-concatenation, so they should be handled before any quantifier logic.
-
-### Bug 3: Greedy matching without backtracking
-
-The `_match_tokens` function checked whether a sequence matches a
-grammar. For quantifiers like `+?` (zero-or-more), it greedily
-consumed ALL consecutive matching symbols, then moved on. This failed
-for grammars like `a+?.a` on input `['a', 'a']`: the `a+?` ate both
-`a`s, and there was nothing left for the second `.a`.
-
-**Fix:** Replace the single-pass greedy matching with `_match_possible`,
-a proper backtracking engine that enumerates ALL valid end positions
-for each token and picks the maximum. This is essentially a tiny
-regex engine — but limited to the CHARE subset, so it avoids the
-exponential blowup of general regex matching.
-
-### Bug 4: Dot-splitting inside disjunctions
-
-Module names like `community.docker.docker_image` contain dots.
-When `_parse_parts` processed a disjunction child, it recursively
-called itself — which split the expression on `.` before treating it
-as a symbol. The symbol `community.docker.docker_image` became
-`community` then `docker` then `docker_image` — three concatenated
-symbols instead of one.
-
-**Fix:** Disjunction children are always flat symbols (CRX and
-iDRegEx don't produce nested disjunctions in practice). Parse them
-with `_parse_flat_symbol`, which strips quantifiers but never splits
-on `.`.
-
 ## The results
 
 ### Ansible deploy roles — 36 roles from companyweb
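Review note: for readers who no longer see the removed section in the post, two of the fixes it described (Bug 1 and Bug 3) can be sketched roughly as follows. This is not the repository's code. `model_cost` here is a stand-in with an assumed signature, and `Factor`, `end_positions`, and `matches` are hypothetical names. The factor representation (a set of symbols plus a quantifier string, with `+?` meaning zero or more) is also an assumption. Instead of literal backtracking, the matcher tracks the set of reachable positions after each factor, which has the same effect for this restricted grammar shape.

```python
import re
from typing import FrozenSet, Iterable, Sequence, Set, Tuple

# Hypothetical factor shape: (allowed symbols, quantifier in "", "?", "+", "+?").
Factor = Tuple[FrozenSet[str], str]


def model_cost(expr: str, alphabet: Iterable[str]) -> int:
    """Count symbol occurrences in expr instead of characters (Bug 1)."""
    # Longest symbols first, so a dotted name is matched as one unit.
    ordered = sorted(alphabet, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(s) for s in ordered) + r")\b"
    return len(re.findall(pattern, expr))


def end_positions(seq: Sequence[str], start: int, factor: Factor) -> Set[int]:
    """All positions where `factor` may stop when it starts at `start`."""
    symbols, quant = factor
    run = 0  # length of the run of allowed symbols beginning at `start`
    while start + run < len(seq) and seq[start + run] in symbols:
        run += 1
    lo = 0 if quant in ("?", "+?") else 1          # may the factor match nothing?
    hi = run if quant in ("+", "+?") else min(run, 1)  # may it repeat?
    return {start + k for k in range(lo, hi + 1)}


def matches(seq: Sequence[str], factors: Sequence[Factor]) -> bool:
    """Keep every reachable position, not just the greedy one (Bug 3)."""
    positions = {0}
    for factor in factors:
        positions = {e for p in positions for e in end_positions(seq, p, factor)}
        if not positions:
            return False
    return len(seq) in positions


a = frozenset({"a"})
print(matches(["a", "a"], [(a, "+?"), (a, "")]))  # True: a+?.a accepts ['a', 'a']
```

Because each factor is a flat set of symbols, the set of reachable positions never exceeds the sequence length plus one, so the work stays polynomial in the input size.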
@@ -240,29 +177,11 @@ configure with templates, start services, optionally run sub-tasks,
 install npm/pip packages, and optionally tweak config lines.
 
 **This is the first explicit description of the geerlingguy role
-convention.** It took 15 roles and a grammar inference algorithm to
-write it down.
+module ordering convention.** It took 15 roles and a grammar inference
+algorithm to write it down.
 
 **Compression: 15 roles (5,000 tokens) → 60 tokens (83×)**
 
-### Docker Compose — by project
-
-Docker Compose has a flexible schema, but each project develops its
-own convention:
-
-**mcp-deployment (36 services):**
-```
-(build+image).command.(environment+volumes)?.ports
-```
-**files (6 services):**
-```
-image.environment.volumes.network_mode.privileged?.cap_add?
-```
-**fresh-ape-base (9 services):**
-```
-image.ports?.(depends_on+environment+user+volumes)+
-```
-
 ### Ensemble dynamics
 
 The ensemble (CRX + iDRegEx + MDL) selects different winners
@@ -270,11 +189,11 @@ depending on the data:
 
 | Dataset | Winner | Why |
 |---------|--------|-----|
-| Ansible deploy (36 roles) | CRX | iDRegEx returned ∅ (too diverse) |
+| Ansible galaxy (15 roles) | CRX | iDRegEx returned ∅ (too diverse) |
-| Ansible restore (2 roles) | CRX | Both match all; CRX more compact |
-| Ansible configure (4 roles) | **iDRegEx** | Finds minimal core `include_role` |
-| Ansible manage (2 roles) | **iDRegEx** | Core: `assert.authorized_key` |
 | Helm prom-stack (6 configs) | **iDRegEx** | Finds minimal core across all configs |
+| Portainer templates (47) | CRX | iDRegEx returned ∅ (no single common field) |
+| Terraform modules (8) | CRX | Every resource type optional across domains |
+| GitHub Actions Go lint (6) | CRX | Tight pattern, all match |
 
 iDRegEx wins when the data has a clear common core. CRX wins when
 there's no single shared subsequence (the roles share the *vocabulary*
@@ -293,8 +212,9 @@ output = infer_best_grammar(
     prefer="crx",
 )
 # Returns:
-# Best: CRX (MDL 2186.28)
-# Grammar: docker_volume+?.group?...(assert+...+wait_for)+?.(cron+firewalld)?
+# Best: CRX (MDL 288)
+# Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+
+# .include+?.(npm+pip)+?.lineinfile?
 
 # Ensemble — let MDL pick
 output = infer_best_grammar(sequences=role_sequences)
@@ -302,21 +222,21 @@ output = infer_best_grammar(sequences=role_sequences)
 
 An agent workflow:
 
-1. Agent needs to write deploy role #37
-2. Finds 36 existing deploy roles, extracts their task module sequences
+1. Agent needs to write an Ansible role
+2. Finds 15 existing geerlingguy roles, extracts their task module sequences
 3. Calls `infer_best_grammar(sequences=..., prefer='crx')`
-4. Gets back the grammar in 200 tokens
+4. Gets back the grammar in ~60 tokens
 5. Generates a new role that follows the structural pattern
 
-Without the MCP: 36 role files in context (15,000 tokens), or guesswork.
-With the MCP: one grammar rule (200 tokens), known to match 36/36 roles.
+Without the MCP: 15 role files in context (5,000 tokens), or guesswork.
+With the MCP: one grammar rule (~60 tokens), known to match 15/15 roles.
 
 ## What it means
 
 Grammar inference turns **examples** into **rules**. The rule is a
 compressed description of the structural convention — and for
-schema-less content like Ansible roles, this may be the *first time*
-the convention has been written down at all.
+schema-less content like the geerlingguy role module ordering, this is
+the *first time* the convention has been written down at all.
 
 For LLM agents, this changes the trade-off between context and
 accuracy. Instead of flooding the context window with examples, the