# Dervish — Showcase
<p align="left"><img src="dervish-logo.png" alt="Dervish" width="120"></p>
Infer the **unwritten convention** from existing examples. Given N example
sequences, produce a ~100-char grammar that captures their shared structural
pattern in far fewer tokens than the originals. Grammars use this notation:
```
a.b → a then b (concatenation)
(a+b) → a or b (disjunction)
r? → optional (zero or one)
r+ → one or more (iteration)
r+? → zero or more
```
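For readers who think in regular expressions, the notation maps fairly directly onto Python's `re` syntax. The sketch below is not part of Dervish: `to_regex` and `matches` are hypothetical helpers, and the rule for telling infix `+` (disjunction) from postfix `+` (iteration) is an assumption, namely that a `+` followed by a symbol or `(` is a disjunction. Note that `r+?` has to become `*`, because `+?` in `re` means lazy one-or-more.

```python
import re

# Symbols may contain letters, digits, "_", ":", "/" and "-" (assumed);
# everything else is one of the operators ( ) . + ?
TOKEN = re.compile(r"[A-Za-z0-9_:/-]+|[().+?]")
OPERATORS = {"(", ")", ".", "+", "?"}

def to_regex(grammar: str) -> str:
    """Translate the showcase notation into a Python regex (hypothetical helper)."""
    tokens = TOKEN.findall(grammar)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "(":
            out.append("(?:")
        elif tok == ")":
            out.append(")")
        elif tok == ".":
            pass  # concatenation is implicit in a regex
        elif tok == "?":
            out.append("?")
        elif tok == "+":
            if nxt == "?":
                out.append("*")  # r+? means zero or more
                i += 1           # consume the "?" as well
            elif nxt == "(" or (nxt and nxt not in OPERATORS):
                out.append("|")  # infix: disjunction
            else:
                out.append("+")  # postfix: one or more
        else:
            # Each item is encoded as its name plus a trailing space.
            out.append("(?:" + re.escape(tok) + " )")
        i += 1
    return "".join(out)

def matches(grammar: str, sequence: list[str]) -> bool:
    """Check whether a sequence of item names is accepted by the grammar."""
    encoded = "".join(item + " " for item in sequence)
    return re.fullmatch(to_regex(grammar), encoded) is not None

print(matches("a.(b+c)+.d?", ["a", "c", "b"]))  # True
print(matches("a.(b+c)+.d?", ["a", "d"]))       # False: needs at least one b or c
print(matches("a.b+?", ["a"]))                  # True: b may occur zero times
```

Encoding each item with a trailing space keeps one item name from matching as a prefix of another; item names containing spaces would need a different separator.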
## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship
15 popular Ansible roles by Jeff Geerling. There is NO written convention
for the module ordering in `tasks/main.yml`. As far as we know, our grammar
is its first explicit description:
```
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
include+?.(npm+pip)+?.lineinfile?
```
Every role: check preconditions → OS-specific vars → install packages →
configure with templates → start services → optionally handle language tooling.
All 15 of 15 roles match. **~29× compression** (7200+ modules → ~250 chars).

**Why it helps an LLM:** When generating a new Ansible role, the LLM knows the
overall structure up front: fail-check first, then vars, then packages, then
config and services. No guessing.
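The input behind this grammar is one ordered sequence of module names per role. Here is a minimal sketch of how such a sequence might be extracted from already-parsed tasks. `module_sequence`, `module_of`, and the keyword list are illustrative assumptions, not Dervish's actual preprocessing, and `block:` sections are not unpacked.

```python
# Task-level keywords that are not modules (a partial, assumed list).
TASK_KEYWORDS = {
    "name", "when", "become", "become_user", "tags", "register", "notify",
    "loop", "vars", "args", "environment", "changed_when", "failed_when",
    "ignore_errors", "delegate_to", "until", "retries", "delay", "no_log",
}

def module_of(task: dict) -> str:
    """Return the module a task calls: its first key that is not a task keyword."""
    for key in task:
        if key not in TASK_KEYWORDS and not key.startswith("with_"):
            # Reduce fully qualified names such as ansible.builtin.package to "package".
            return key.rsplit(".", 1)[-1]
    raise ValueError(f"no module found in task: {task!r}")

def module_sequence(tasks: list[dict]) -> list[str]:
    """Map a parsed tasks/main.yml (a list of task dicts) to its module sequence."""
    return [module_of(task) for task in tasks]

# Illustrative tasks, shaped like the output of a YAML parser.
tasks = [
    {"name": "Check OS support", "fail": {"msg": "unsupported"},
     "when": "ansible_os_family == 'Windows'"},
    {"name": "Load OS vars", "include_vars": "{{ ansible_os_family }}.yml"},
    {"name": "Install", "ansible.builtin.package": {"name": "nginx", "state": "present"}},
    {"name": "Start", "service": {"name": "nginx", "state": "started"}},
]
print(module_sequence(tasks))  # ['fail', 'include_vars', 'package', 'service']
```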
## 2. Helm chart (kube-prometheus-stack, 6 configs)
6 different `values.yaml` files rendered through the same chart:
```
Best: iDRegEx | MDL 1433
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
```
This is the **minimal core** every config must deploy. CRX instead captures
the full vocabulary (19 kinds). Which grammar an agent uses depends on the task:
- Bootstrapping a new cluster: iDRegEx — what you can't skip
- Writing a complete chart: CRX — everything you might need
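One rough way to picture the difference between the two outputs is set intersection versus set union over the resource kinds of each rendered config. This is only an intuition, not how iDRegEx or CRX compute their grammars (both also model order and repetition). `core_and_vocabulary` is a hypothetical helper, and the kinds in the sample data are illustrative rather than actual chart output.

```python
def core_and_vocabulary(sequences: list[list[str]]) -> tuple[set[str], set[str]]:
    """Return (kinds present in every sequence, kinds present in any sequence)."""
    kind_sets = [set(seq) for seq in sequences]
    if not kind_sets:
        return set(), set()
    return set.intersection(*kind_sets), set.union(*kind_sets)

# Illustrative data: three rendered configs, each as a list of resource kinds.
configs = [
    ["ServiceAccount", "ClusterRole", "Service", "Deployment"],
    ["ServiceAccount", "ClusterRole", "ConfigMap", "Service", "Deployment"],
    ["ServiceAccount", "ClusterRole", "Secret", "Service", "Deployment", "Ingress"],
]
core, vocabulary = core_and_vocabulary(configs)
print(sorted(core))     # ['ClusterRole', 'Deployment', 'Service', 'ServiceAccount']
print(len(vocabulary))  # 7
```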
## 3. GitHub Actions (cross-project Go lint, 6 jobs)
Lint jobs from prometheus, goreleaser, cosign, sigstore:
```
Best: CRX | MDL 13.6
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
golangci/golangci-lint-action?.megalinter?
```
Four independently maintained Go projects converged on: checkout → setup Go → run golangci-lint. Only the largest projects add megalinter.

**Why it helps an LLM:** When setting up CI for a Go project on GitHub Actions, the LLM can follow an emergent cross-project convention: four teams wrote the same pipeline without coordinating.
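An agent could also use the grammar as a check on its own output. The sketch below hand-translates the grammar above into a Python regex and tests candidate step lists. `follows_convention` is a hypothetical helper, and the step labels are assumed to follow the grammar's style (action name without version, `run:` plus the first command word).

```python
import re

# Hand translation of the grammar: each step label is followed by one space.
GO_LINT = re.compile(
    r"actions/checkout "
    r"(?:actions/setup-go |run:echo |run:sudo )+"
    r"(?:golangci/golangci-lint-action )?"
    r"(?:megalinter )?"
)

def follows_convention(steps: list[str]) -> bool:
    """Return True if the ordered step labels are accepted by the grammar."""
    encoded = "".join(step + " " for step in steps)
    return GO_LINT.fullmatch(encoded) is not None

print(follows_convention(
    ["actions/checkout", "actions/setup-go", "golangci/golangci-lint-action"]
))  # True
print(follows_convention(["actions/setup-go", "actions/checkout"]))  # False
```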
## What doesn't work
| Dataset | Problem |
|---------|---------|
| Dockerfiles | Too simple — just the Dockerfile spec |
| Pre-commit (cross-project) | 252 unique hooks, no common core |
| GHA per-project | One repo = too many job types |
| Prometheus rules | Schema-enforced, no convention |
Sweet spot: **multiple implementations of the same abstract task**
with a shared but undocumented pattern.
## Usage
```python
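from bex.mcp_server import infer_best_grammar

# role_sequences holds the example sequences, e.g. the ordered module
# names from each role's tasks/main.yml (assumed: one list per role).
output = infer_best_grammar(
    sequences=role_sequences,
    prefer="crx",
)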
```