tobjend 547376894c Update README and SHOWCASE with real-world dataset evaluations

README:
- Replace outdated company benchmarks with public showcases
- Add Algorithm Selection Guide
- Add 'When each algorithm wins' table
- Add 'Why grammar inference?' table with value prop for LLMs
- Add 'What doesn't work' section documenting failed approaches
- Update all domain adapter examples with public results
- Clean up outdated references (companyweb roles, hashistack terraform)

SHOWCASE:
- Add Helm (kube-prometheus-stack) with iDRegEx minimal core
- Add Docker Compose per-project patterns
- Add GitHub Actions cross-project Go lint pattern
- Add Terraform modules with vocabulary analysis
- Add 'What doesn't work' section
- Explain WHY each dataset helps an LLM

2026-07-01 10:04:10 +02:00

3.7 KiB


Grammar Inference Engine — Showcase

Infer the unwritten convention from existing examples. Given N example sequences, produce a ~100-char grammar that captures the structural pattern — in far fewer tokens than the originals.

a.b       → a then b (concatenation)
(a+b)     → a or b (disjunction)
r?        → optional (zero or one)
r+        → one or more (iteration)
r+?       → zero or more
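The notation above maps closely onto ordinary regular expressions, so a sequence can be checked against a grammar with Python's `re` module. The following is a minimal sketch, not the engine's own matcher: `grammar_to_regex` and `matches` are hypothetical helper names, and two things are assumed. First, a `+` is read as iteration when it is followed by `.`, `)`, `?` or the end of the grammar, and as disjunction otherwise. Second, disjunctions are always parenthesized, and symbols contain no whitespace.

```python
import re

# Operators are single characters; anything else (e.g. "actions/checkout") is a symbol.
TOKEN = re.compile(r"[().?+]|[^().?+\s]+")


def grammar_to_regex(grammar: str) -> re.Pattern:
    """Translate the showcase notation into a compiled Python regex."""
    tokens = TOKEN.findall(grammar)
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        prev = tokens[i - 1] if i > 0 else None
        if tok == "(":
            out.append("(?:")
        elif tok == ")":
            out.append(")")
        elif tok == ".":
            continue  # concatenation is implicit in a regex
        elif tok == "?":
            if prev == "+" and out and out[-1] == "+":
                out[-1] = "*"  # "r+?" means zero or more
            else:
                out.append("?")
        elif tok == "+":
            # Assumption: postfix iteration unless an operand follows.
            out.append("+" if nxt in (None, ".", ")", "?") else "|")
        else:
            # Each symbol is matched together with a trailing space separator.
            out.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(out))


def matches(pattern: re.Pattern, sequence) -> bool:
    """True if the whole sequence of symbols is accepted by the pattern."""
    encoded = "".join(symbol + " " for symbol in sequence)
    return pattern.fullmatch(encoded) is not None


compose = grammar_to_regex("(build+image).command.(environment+volumes)?.ports")
print(matches(compose, ["image", "command", "ports"]))  # True
print(matches(compose, ["command", "ports"]))           # False
```

Encoding each symbol with a trailing space keeps symbols such as `run` and `run:echo` from matching as prefixes of one another.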

1. Ansible Galaxy roles (15 geerlingguy roles) — flagship

15 popular Ansible roles by Jeff Geerling. There is NO written convention for the task structure. Our grammar is its first explicit description:

Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
         include+?.(npm+pip)+?.lineinfile?

Every role: check preconditions → OS-specific vars → install packages → configure with templates → start services → optionally handle language tooling.

All 15 roles match the grammar. Compression is roughly 29× (7200+ modules → ~250 chars).

Why it helps an LLM: When generating a new Ansible role, the LLM knows the expected structure: fail-check first, then vars, then packages, then configuration and services. No guessing.

2. Helm chart (kube-prometheus-stack, 6 configs)

6 different values.yaml files rendered through the same chart:

Best: iDRegEx | MDL 1433
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment

The minimal core every config must deploy. CRX captures the full vocabulary (19 kinds). Which one an agent uses depends on the task:

Bootstrapping a new cluster: iDRegEx — what you can't skip
Writing a complete chart: CRX — everything you might need
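The difference between a minimal core and a full vocabulary can be illustrated with plain set operations. This sketch is a deliberate simplification and is neither iDRegEx nor CRX: it only computes which kinds appear in every config and which appear in any. The function names and the `configs` data are hypothetical.

```python
from typing import List


def minimal_core(sequences: List[List[str]]) -> List[str]:
    """Kinds present in every sequence, in the order they appear in the first one."""
    if not sequences:
        return []
    common = set(sequences[0]).intersection(*sequences[1:])
    core, seen = [], set()
    for kind in sequences[0]:
        if kind in common and kind not in seen:
            seen.add(kind)
            core.append(kind)
    return core


def vocabulary(sequences: List[List[str]]) -> List[str]:
    """Every kind that appears in at least one sequence, sorted by name."""
    return sorted({kind for seq in sequences for kind in seq})


# Hypothetical rendered configs (made-up data, not the showcase dataset).
configs = [
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service", "Deployment"],
    ["ServiceAccount", "ConfigMap", "ClusterRole", "ClusterRoleBinding",
     "Service", "Deployment", "ServiceMonitor"],
    ["ServiceAccount", "Secret", "ClusterRole", "ClusterRoleBinding",
     "Service", "Deployment"],
]

print(".".join(minimal_core(configs)))
print(len(vocabulary(configs)), "kinds in the vocabulary")
```

An agent bootstrapping a cluster would read the first result; one writing a complete chart would read the second.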

3. Docker Compose (73 services, 10 projects)

Per-service key order across real-world compose files:

Best: CRX | MDL varies by project
Grammar: (build+image).command.(environment+volumes)?.ports

Per-project patterns emerge:

Nginx-like: build.(command.volumes.ports)
Databases: image.environment.volumes.ports
Language runtimes: build.(environment.command).ports

Why it helps an LLM: The field order in service definitions follows an implicit convention. An agent generating compose files should put image/build first, then command, then environment/volumes, then ports.
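As a rough sketch of how an agent might apply this ordering, the function below reorders the keys of one service definition to follow the cross-project grammar. `CONVENTION` and `order_service_keys` are hypothetical names. Placing `environment` before `volumes` is an assumption taken from the database pattern, and individual projects may order keys differently.

```python
# Key order suggested by the cross-project grammar (assumed tie-break:
# environment before volumes, as in the database pattern).
CONVENTION = ["build", "image", "command", "environment", "volumes", "ports"]


def order_service_keys(service: dict) -> dict:
    """Return a copy of the service with conventional keys first, in order."""
    rank = {key: i for i, key in enumerate(CONVENTION)}
    # sorted() is stable, so keys outside the convention keep their
    # original relative order after the known ones.
    ordered = sorted(service, key=lambda k: rank.get(k, len(CONVENTION)))
    return {key: service[key] for key in ordered}


# Hypothetical service definition with keys in arbitrary order.
db = {
    "ports": ["5432:5432"],
    "restart": "always",
    "volumes": ["pgdata:/var/lib/postgresql/data"],
    "image": "postgres:16",
    "environment": {"POSTGRES_PASSWORD": "example"},
}

print(list(order_service_keys(db)))
# ['image', 'environment', 'volumes', 'ports', 'restart']
```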

4. GitHub Actions (cross-project Go lint, 6 jobs)

Lint jobs from prometheus, goreleaser, cosign, sigstore:

Best: CRX | MDL 13.6
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
         golangci/golangci-lint-action?.megalinter?

Across the sampled projects, lint CI follows the same order: checkout → setup Go → run linter. Only the largest projects add megalinter.

Why it helps an LLM: Starting a new Go project? The lint workflow has a near-universal pattern.
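The grammar's symbols (`actions/checkout`, `run:echo`, `run:sudo`) suggest that each workflow step is reduced to a short label before inference. The showcase does not say how that reduction is done. The sketch below is one plausible guess, with a hypothetical `step_symbol` helper: `uses` steps keep the action name without its version, and `run` steps keep only the first word of the command.

```python
def step_symbol(step: dict) -> str:
    """Reduce one workflow step to a short symbol (assumed scheme)."""
    if "uses" in step:
        # "actions/checkout@v4" -> "actions/checkout"
        return step["uses"].split("@", 1)[0]
    if "run" in step:
        # "sudo apt-get update" -> "run:sudo"
        words = step["run"].split()
        return "run:" + words[0] if words else "run:"
    return "unknown"


# Hypothetical lint job, already parsed from YAML into Python dicts.
job_steps = [
    {"uses": "actions/checkout@v4"},
    {"uses": "actions/setup-go@v5", "with": {"go-version": "stable"}},
    {"run": "sudo apt-get update"},
    {"uses": "golangci/golangci-lint-action@v6"},
]

symbols = [step_symbol(step) for step in job_steps]
print(symbols)
# ['actions/checkout', 'actions/setup-go', 'run:sudo', 'golangci/golangci-lint-action']
```

Dropping versions and command arguments is what lets jobs from different repositories line up as the same sequence.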

5. Terraform (8 AWS modules)

Terraform modules by hashicorp and terraform-aws-modules:

Best: CRX | MDL 1876
Grammar: null_resource?.s3_bucket...?.vpc?...(26+ types all optional)

Every resource type is optional — VPC, S3, EC2, and security-group modules share no mandatory ordering. But the vocabulary is the signal: seeing aws_vpc implies subnets, route tables, internet gateways.

Why it helps an LLM: The grammar encodes which resources belong together in each module domain.
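One simple way to read the vocabulary as a signal is to ask which resource types always accompany a given type. The sketch below does this with set intersection over made-up module contents. It is an illustration of the idea, not the engine's vocabulary analysis, and `always_with` is a hypothetical name.

```python
from typing import Iterable, List, Set


def always_with(modules: List[Iterable[str]], anchor: str) -> Set[str]:
    """Resource types present in every module that contains `anchor`."""
    containing = [set(m) for m in modules if anchor in m]
    if not containing:
        return set()
    return set.intersection(*containing) - {anchor}


# Hypothetical module contents (not the showcase dataset).
modules = [
    {"aws_vpc", "aws_subnet", "aws_route_table", "aws_internet_gateway"},
    {"aws_vpc", "aws_subnet", "aws_route_table", "aws_internet_gateway",
     "aws_nat_gateway"},
    {"aws_s3_bucket", "aws_s3_bucket_policy"},
]

print(sorted(always_with(modules, "aws_vpc")))
# ['aws_internet_gateway', 'aws_route_table', 'aws_subnet']
```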

What doesn't work

Dataset	Problem
Dockerfiles	Too simple — just the Dockerfile spec
Pre-commit (cross-project)	252 unique hooks, no common core
GHA per-project	One repo = too many job types
Prometheus rules	Schema-enforced, no convention

Sweet spot: multiple implementations of the same abstract task with a shared but undocumented pattern.

Usage

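from bex.mcp_server import infer_best_grammar

# Assumed input shape: one list of step names per example (e.g. per Ansible role).
role_sequences = [
    ["fail", "include_vars", "package", "template", "service"],
    ["include_vars", "package", "template", "service"],
]

output = infer_best_grammar(
    sequences=role_sequences,
    prefer="crx",
)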
