# Dervish — Showcase
<p align="left"><img src="dervish-logo.png" alt="Dervish" width="180"></p>
Infer the **unwritten convention** from existing examples. Given N example
sequences, produce a ~100-char grammar that captures the structural
pattern — in far fewer tokens than the originals.
```
a.b    → a then b (concatenation)
(a+b)  → a or b (disjunction)
r?     → optional (zero or one)
r+     → one or more (iteration)
r+?    → zero or more
```
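
To make the notation concrete, here is a minimal sketch (not Dervish's own code) that translates a grammar written this way into a Python regular expression and checks a sequence against it. The helper names `to_regex` and `matches` are hypothetical. The sketch assumes that symbols contain none of the characters `.()+?`, and that a `+` is infix only when the next character starts a symbol or a parenthesised group.

```python
import re

SPECIAL = set(".()+?")


def to_regex(grammar: str) -> str:
    """Translate the notation into a regex over space-terminated symbols."""
    g = "".join(grammar.split())  # grammars may be wrapped across lines
    pos = 0

    def peek(offset: int = 0) -> str:
        i = pos + offset
        return g[i] if i < len(g) else ""

    def starts_atom(ch: str) -> bool:
        return ch == "(" or (ch != "" and ch not in SPECIAL)

    def parse_expr() -> str:  # disjunction: term (+ term)*
        nonlocal pos
        parts = [parse_term()]
        while peek() == "+":
            pos += 1
            parts.append(parse_term())
        return "(?:" + "|".join(parts) + ")"

    def parse_term() -> str:  # concatenation: factor (. factor)*
        nonlocal pos
        parts = [parse_factor()]
        while peek() == ".":
            pos += 1
            parts.append(parse_factor())
        return "".join(parts)

    def parse_factor() -> str:  # symbol or group, then postfix ? and +
        nonlocal pos
        if peek() == "(":
            pos += 1
            atom = parse_expr()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
        else:
            start = pos
            while peek() and peek() not in SPECIAL:
                pos += 1
            if pos == start:
                raise ValueError(f"expected a symbol at offset {pos}")
            atom = "(?:" + re.escape(g[start:pos]) + " )"
        # A '+' is postfix unless the next character starts another atom.
        while peek() == "?" or (peek() == "+" and not starts_atom(peek(1))):
            atom = "(?:" + atom + ")" + peek()
            pos += 1
        return atom

    pattern = parse_expr()
    if pos != len(g):
        raise ValueError(f"unexpected character at offset {pos}")
    return pattern


def matches(grammar: str, sequence: list[str]) -> bool:
    """Return True if the whole sequence is accepted by the grammar."""
    text = "".join(name + " " for name in sequence)
    return re.fullmatch(to_regex(grammar), text) is not None


print(matches("a.(b+c)+.d?", ["a", "c", "b"]))  # True
print(matches("a.(b+c)+.d?", ["a", "d"]))       # False: the group is required
```

Because `r+?` is parsed as "one or more, then optional", it comes out as zero or more without a special case.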
## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
15 popular Ansible roles by Jeff Geerling. There is NO written convention
for the module ordering in `tasks/main.yml`. Our grammar is its first
explicit description:
```
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
include+?.(npm+pip)+?.lineinfile?
```
Every role: check preconditions → OS-specific vars → install packages →
configure with templates → start services → optionally handle language tooling.
All 15/15 match. **~29× compression** (7200+ modules → ~250 chars).
**Why it helps an LLM:** When generating a new Ansible role, the LLM already
knows the expected structure: fail-check first, then vars, then packages, then
config and services. No guessing.
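
The input to the inference is one module sequence per role. How Dervish extracts these is not shown here, so the following is only a sketch under stated assumptions: the tasks are already parsed from `tasks/main.yml` into a list of dicts, the module is the first key that is not a task keyword, and collection prefixes such as `ansible.builtin.` are stripped. The name `module_sequence`, the keyword list, and the sample tasks are all hypothetical.

```python
# Keys that describe a task rather than name its module.
# This is a partial, assumed list, not Ansible's full keyword set.
TASK_KEYWORDS = {
    "name", "when", "become", "become_user", "tags", "register", "notify",
    "loop", "vars", "args", "changed_when", "failed_when", "ignore_errors",
    "delegate_to", "environment", "until", "retries", "delay", "no_log",
}


def module_sequence(tasks: list[dict]) -> list[str]:
    """Return the module name of each task, in file order."""
    sequence = []
    for task in tasks:
        modules = [
            key for key in task
            if key not in TASK_KEYWORDS and not key.startswith("with_")
        ]
        if modules:
            # "ansible.builtin.package" -> "package"
            sequence.append(modules[0].rsplit(".", 1)[-1])
    return sequence


# Illustrative tasks only; not taken from any of the 15 roles.
tasks = [
    {"name": "Check OS", "fail": {"msg": "unsupported"}, "when": "not supported"},
    {"name": "Load vars", "include_vars": "{{ ansible_os_family }}.yml"},
    {"name": "Install", "ansible.builtin.package": {"name": "nginx"}},
    {"name": "Configure", "template": {"src": "nginx.conf.j2"}, "notify": "restart"},
    {"name": "Start", "service": {"name": "nginx", "state": "started"}},
]
print(module_sequence(tasks))
# ['fail', 'include_vars', 'package', 'template', 'service']
```

A real extractor would also need to decide how to treat `block:` sections and included task files, which this sketch ignores.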
## 2. Helm chart (kube-prometheus-stack, 6 configs)
6 different `values.yaml` files rendered through the same chart:
```
Best: iDRegEx | MDL 1433
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
```
This is the **minimal core** every config must deploy. CRX, by contrast,
captures the full vocabulary (19 kinds). Which grammar an agent uses depends on the task:
- Bootstrapping a new cluster: iDRegEx — what you can't skip
- Writing a complete chart: CRX — everything you might need
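
The difference between a minimal core and a full vocabulary can be illustrated with plain set operations. This is not the iDRegEx or CRX algorithm, both of which also model ordering and repetition; it only shows which kinds appear in every config versus in any config. The function name and the sample data are hypothetical.

```python
def core_and_vocabulary(sequences: list[list[str]]) -> tuple[list[str], list[str]]:
    """Return (kinds present in every sequence, kinds present in any sequence).

    Both lists keep first-seen order. Expects at least one sequence.
    """
    sets = [set(seq) for seq in sequences]
    core = set.intersection(*sets)
    # dict.fromkeys de-duplicates while preserving first-seen order.
    vocabulary = list(dict.fromkeys(kind for seq in sequences for kind in seq))
    return [kind for kind in vocabulary if kind in core], vocabulary


# Made-up rendered outputs for three configs; not the real chart data.
configs = [
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service", "Deployment"],
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "ConfigMap",
     "Service", "Deployment"],
    ["ServiceAccount", "Secret", "ClusterRole", "ClusterRoleBinding",
     "Service", "Deployment", "ServiceMonitor"],
]
core, vocabulary = core_and_vocabulary(configs)
print(core)             # the five kinds shared by all three configs
print(len(vocabulary))  # 8
```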
## 3. GitHub Actions (cross-project Go lint, 6 jobs)
Lint jobs from prometheus, goreleaser, cosign, sigstore:
```
Best: CRX | MDL 13.6
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
golangci/golangci-lint-action?.megalinter?
```
Four independently maintained Go projects converged on the same sequence: checkout → set up Go → run golangci-lint. Only the biggest add megalinter.
**Why it helps an LLM:** Setting up CI for a Go project on GitHub Actions? The grammar encodes an emergent cross-project convention — four teams wrote the same pipeline without coordinating.
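
Symbols such as `actions/checkout` and `run:echo` suggest that each workflow step is reduced to a short token before inference. The exact rule is not documented here, so the sketch below is an assumption: `uses:` steps drop their version suffix, and `run:` steps keep only the first word of the command. The name `step_token` and the sample job are hypothetical, and the bare `megalinter` symbol in the grammar implies the real tokenizer normalizes names further than this.

```python
def step_token(step: dict) -> str:
    """Reduce one parsed workflow step to a short symbol."""
    if "uses" in step:
        # "actions/checkout@v4" -> "actions/checkout"
        return step["uses"].split("@", 1)[0]
    if "run" in step:
        words = step["run"].split()
        # "echo hello" -> "run:echo"; an empty command becomes "run:"
        return "run:" + (words[0] if words else "")
    raise ValueError("step has neither 'uses' nor 'run'")


# Illustrative lint job, already parsed from YAML into dicts.
steps = [
    {"uses": "actions/checkout@v4"},
    {"uses": "actions/setup-go@v5", "with": {"go-version": "stable"}},
    {"name": "Show Go version", "run": "echo \"$(go version)\""},
    {"uses": "golangci/golangci-lint-action@v6"},
]
print([step_token(step) for step in steps])
# ['actions/checkout', 'actions/setup-go', 'run:echo', 'golangci/golangci-lint-action']
```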
## What doesn't work
| Dataset | Problem |
|---------|---------|
| Dockerfiles | Too simple — just the Dockerfile spec |
| Pre-commit (cross-project) | 252 unique hooks, no common core |
| GHA per-project | One repo = too many job types |
| Prometheus rules | Schema-enforced, no convention |
Sweet spot: **multiple implementations of the same abstract task**
with a shared but undocumented pattern.
## Usage
```python
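from bex.mcp_server import infer_best_grammar

# Assumed input shape: one list of module names per role (illustrative data).
role_sequences = [["fail", "package", "service"], ["package", "template", "service"]]

output = infer_best_grammar(
    sequences=role_sequences,
    prefer="crx",  # request the CRX variant
)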
```