Grammar Inference Engine — Showcase
Infer the unwritten convention from existing examples. Given N example sequences, produce a ~100-char grammar that captures the structural pattern — in far fewer tokens than the originals.
a.b → a then b (concatenation)
(a+b) → a or b (disjunction)
r? → optional (zero or one)
r+ → one or more (iteration)
r+? → zero or more
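To make the notation concrete, the sketch below translates a grammar string into a Python regular expression and checks a sequence against it. It is an illustration, not the engine's own matcher: the helper names `grammar_to_regex` and `matches` are hypothetical, and it assumes that symbols consist of letters, digits, `_`, `/`, `:` and `-`, and that disjunctions are always parenthesised, as in the examples in this document.

```python
import re

# Assumption: a symbol is any run of these characters; operators are ( ) . + ?
SYMBOL = r"[A-Za-z0-9_/:-]+"
TOKEN = re.compile(SYMBOL + r"|[().+?]")


def grammar_to_regex(grammar: str) -> re.Pattern:
    """Translate the showcase notation into a compiled Python regex."""
    tokens = TOKEN.findall(grammar)
    parts = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == ".":
            pass  # concatenation is implicit in a regex
        elif tok == "+":
            if nxt == "(" or re.fullmatch(SYMBOL, nxt):
                parts.append("|")  # infix: an operand follows, so disjunction
            elif nxt == "?":
                parts.append("*")  # r+? means zero or more
                i += 1  # consume the '?'
            else:
                parts.append("+")  # postfix: one or more
        elif tok == "?":
            parts.append("?")
        elif tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        else:
            # Each sequence item is encoded as "<symbol><space>".
            parts.append("(?:" + re.escape(tok) + " )")
        i += 1
    return re.compile("".join(parts))


def matches(grammar: str, sequence: list[str]) -> bool:
    """Return True if the whole sequence is accepted by the grammar."""
    text = "".join(item + " " for item in sequence)
    return grammar_to_regex(grammar).fullmatch(text) is not None


# The Go lint grammar from section 4 of this document.
lint = ("actions/checkout.(actions/setup-go+run:echo+run:sudo)+."
        "golangci/golangci-lint-action?.megalinter?")
print(matches(lint, ["actions/checkout", "actions/setup-go",
                     "golangci/golangci-lint-action"]))
print(matches(lint, ["actions/setup-go", "actions/checkout"]))
```

The only subtle case is `+`, which the notation uses for both disjunction and iteration; the sketch treats it as disjunction when an operand follows and as iteration otherwise. Grammars shown with elided parts (`...`) cannot be translated faithfully this way.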
1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
15 popular Ansible roles by Jeff Geerling. We found no written convention
for the module ordering in tasks/main.yml, so as far as we can tell this
grammar is its first explicit description:
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
include+?.(npm+pip)+?.lineinfile?
Every role: check preconditions → OS-specific vars → install packages → configure with templates → start services → optionally handle language tooling.
All 15 of 15 roles match. ~29× compression (7200+ modules → ~250 chars).
Why it helps an LLM: When generating a new Ansible role, the LLM knows the structure up front: fail-check first, then vars, then packages, then configuration and services. No guessing.
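The input to the inference step is one module sequence per role. The sketch below shows one plausible way to derive such a sequence from an already-parsed tasks/main.yml (a list of task dictionaries). The function name `module_sequence`, the keyword list, and the sample tasks are assumptions for illustration; the source does not describe how the project actually extracts sequences.

```python
# Assumption: task-level keywords that are not module names (not exhaustive).
TASK_KEYWORDS = {
    "name", "when", "become", "tags", "register", "notify", "loop",
    "with_items", "vars", "changed_when", "failed_when", "ignore_errors",
}


def module_sequence(tasks: list[dict]) -> list[str]:
    """Return the module used by each task, in file order."""
    sequence = []
    for task in tasks:
        for key in task:
            if key not in TASK_KEYWORDS:
                # Reduce fully qualified names such as
                # "ansible.builtin.package" to the short form "package".
                sequence.append(key.rsplit(".", 1)[-1])
                break
    return sequence


# Hypothetical tasks, shaped like the output of a YAML parser.
tasks = [
    {"name": "Check OS support", "fail": {"msg": "Unsupported OS"}, "when": "false"},
    {"name": "Load OS-specific vars", "include_vars": "Debian.yml"},
    {"name": "Install packages", "package": {"name": "example", "state": "present"}},
    {"name": "Write config", "template": {"src": "a.j2", "dest": "/etc/a"}},
    {"name": "Start service", "ansible.builtin.service": {"name": "example", "state": "started"}},
]
print(module_sequence(tasks))
```

The resulting list follows the shape the grammar describes: precondition check, variables, packages, configuration, services.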
2. Helm chart (kube-prometheus-stack, 6 configs)
6 different values.yaml files rendered through the same chart:
Best: iDRegEx | MDL 1433
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
This is the minimal core that every config deploys. CRX instead captures the full vocabulary (19 kinds). Which one an agent uses depends on the task:
- Bootstrapping a new cluster: iDRegEx — what you can't skip
- Writing a complete chart: CRX — everything you might need
3. Portainer templates (47 templates)
Official Portainer app templates from portainer/templates:
Best: CRX | MDL 1282
Grammar: (type+title)+.
(categories+description+image+logo+name+note+platform)+.
repository?.(env+ports+privileged+volumes)+?.command?
Field ordering convention: identity (type, title) → metadata
(description, categories, platform, logo) → source
(image, repository) → deployment (ports, volumes, env) →
entrypoint (command). 21 unique orderings, one grammar.
Why it helps an LLM: Writing a Portainer template needs the right field order. The grammar tells you: identity first, then metadata, then source, then deployment config.
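The grammar above can also be used as a check on a generated template. The sketch below hand-encodes its five groups as stages and walks a template's field names through them in order. The stage table mirrors the grammar as printed; the function name `follows_convention` and the sample template values are hypothetical.

```python
# Each stage: (allowed fields, minimum count, maximum count or None for unbounded).
STAGES = [
    ({"type", "title"}, 1, None),                        # (type+title)+
    ({"categories", "description", "image", "logo",
      "name", "note", "platform"}, 1, None),             # (...)+
    ({"repository"}, 0, 1),                              # repository?
    ({"env", "ports", "privileged", "volumes"}, 0, None),  # (...)+?
    ({"command"}, 0, 1),                                 # command?
]


def follows_convention(fields: list[str]) -> bool:
    """Return True if the field order is accepted by the grammar."""
    pos = 0
    for allowed, min_count, max_count in STAGES:
        count = 0
        # Greedy consumption is safe because the stage sets are disjoint.
        while (pos < len(fields) and fields[pos] in allowed
               and (max_count is None or count < max_count)):
            pos += 1
            count += 1
        if count < min_count:
            return False
    return pos == len(fields)  # nothing may be left over


# Hypothetical template; dicts preserve insertion order, so the keys give the field order.
template = {
    "type": 1,
    "title": "Example",
    "description": "An example app",
    "categories": ["example"],
    "platform": "linux",
    "logo": "https://example.com/logo.png",
    "image": "example/image:latest",
    "ports": ["80/tcp"],
    "volumes": [{"container": "/data"}],
}
print(follows_convention(list(template)))
print(follows_convention(["title", "ports", "description"]))
```

A failed check points at an ordering problem rather than a schema problem: the fields may all be valid yet appear in an order none of the 47 templates uses.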
4. GitHub Actions (cross-project Go lint, 6 jobs)
Lint jobs from prometheus, goreleaser, cosign, sigstore:
Best: CRX | MDL 13.6
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
golangci/golangci-lint-action?.megalinter?
Every lint job in the sample follows the same shape: checkout → set up Go → run the linter. Only the largest projects add megalinter.
Why it helps an LLM: Starting a new Go project? The lint workflow has a near-universal pattern.
5. Terraform (8 AWS modules)
Terraform modules by hashicorp and terraform-aws-modules:
Best: CRX | MDL 1876
Grammar: null_resource?.s3_bucket...?.vpc?...(26+ types all optional)
Every resource type is optional — VPC, S3, EC2, and security-group
modules share no mandatory ordering. But the vocabulary is the signal:
seeing aws_vpc implies subnets, route tables, internet gateways.
Why it helps an LLM: The grammar encodes which resources belong together in each module domain.
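One way to read "the vocabulary is the signal" is as co-occurrence: for a given resource type, list the types that appear in every module containing it. The sketch below does this over hypothetical module contents. The module names and resource lists are made up for illustration and are not the eight modules measured above, and `implied_by` is a hypothetical helper, not part of the engine.

```python
def implied_by(resource: str, modules: dict[str, set[str]]) -> set[str]:
    """Resource types present in every module that contains `resource`."""
    containing = [types for types in modules.values() if resource in types]
    if not containing:
        return set()
    return set.intersection(*containing) - {resource}


# Hypothetical module contents (resource types per module).
modules = {
    "network-a": {"aws_vpc", "aws_subnet", "aws_route_table",
                  "aws_internet_gateway"},
    "network-b": {"aws_vpc", "aws_subnet", "aws_route_table",
                  "aws_internet_gateway", "aws_nat_gateway"},
    "storage": {"aws_s3_bucket", "aws_s3_bucket_policy"},
}
print(sorted(implied_by("aws_vpc", modules)))
```

With only a handful of modules such implications are suggestive rather than reliable, which fits the observation that no mandatory ordering emerges from this dataset.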
What doesn't work
| Dataset | Problem |
|---|---|
| Dockerfiles | Too simple — just the Dockerfile spec |
| Pre-commit (cross-project) | 252 unique hooks, no common core |
| GHA per-project | One repo = too many job types |
| Prometheus rules | Schema-enforced, no convention |
Sweet spot: multiple implementations of the same abstract task with a shared but undocumented pattern.
Usage
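from bex.mcp_server import infer_best_grammar
# Assumed input shape (illustrative): one list of module names per role.
role_sequences = [
    ["fail", "include_vars", "package", "template", "service"],
    ["include_vars", "package", "file", "template", "service"],
]
output = infer_best_grammar(
    sequences=role_sequences,
    prefer="crx",
)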