# Dervish — Showcase

<p align="left"><img src="dervish-logo.png" alt="Dervish" width="180"></p>

Dervish infers the **unwritten convention** behind existing examples. Given N example
sequences, it produces a ~100-char grammar that captures their shared structural
pattern in far fewer tokens than the originals.

```text
a.b    → a then b      (concatenation)
(a+b)  → a or b        (disjunction)
r?     → optional      (zero or one)
r+     → one or more   (iteration)
r+?    → zero or more
```
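
To make the notation above concrete, here is a minimal sketch that translates a grammar string into a Python `re` pattern and checks whether a sequence matches it. This is not Dervish's own matcher: the helper names `to_regex` and `matches` are hypothetical, and the sketch assumes symbols contain no whitespace and none of the characters `( ) . ? +`. It treats `+` as disjunction when a symbol or `(` follows it and as iteration otherwise, and it gives `+` (disjunction) lower precedence than `.` outside parentheses, which is the usual regex convention and an assumption here.

```python
import re

# Operators are single characters; everything else is a symbol name.
TOKEN = re.compile(r"[().?+]|[^().?+\s]+")
# Tokens that can follow a postfix "+" (iteration) rather than a binary "+".
AFTER_POSTFIX = {".", ")", "?", "+"}


def to_regex(grammar: str) -> re.Pattern:
    """Translate the dot/plus notation into a Python regex over 'sym sym ' text."""
    tokens = TOKEN.findall(grammar)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == ".":
            pass  # concatenation is implicit in Python regexes
        elif tok == "(":
            out.append("(?:")
        elif tok == ")":
            out.append(")")
        elif tok == "?":
            out.append("?")
        elif tok == "+":
            if nxt is not None and nxt not in AFTER_POSTFIX:
                out.append("|")  # binary use: a or b
            elif nxt == "?":
                out.append("*")  # r+? means zero or more
                i += 1           # consume the "?" as well
            else:
                out.append("+")  # postfix use: one or more
        else:
            # Each symbol is matched together with a trailing space separator.
            out.append("(?:" + re.escape(tok) + " )")
        i += 1
    return re.compile("".join(out))


def matches(grammar: str, sequence) -> bool:
    """Return True if the whole sequence is accepted by the grammar."""
    text = "".join(item + " " for item in sequence)
    return to_regex(grammar).fullmatch(text) is not None


grammar = "fail?.(package+file+template+service)+.(npm+pip)+?.lineinfile?"
print(matches(grammar, ["package", "template", "service"]))  # True
print(matches(grammar, ["service", "fail"]))                 # False
```

The example grammar is a shortened, illustrative variant of the flagship grammar below; the elided `...` alternatives in the real grammar are left out rather than guessed.
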
## 1. Ansible Galaxy roles (15 geerlingguy roles) — flagship

The dataset is 15 popular Ansible roles by Jeff Geerling. No written convention
exists for the order of modules in `tasks/main.yml`; the inferred grammar is its
first explicit description:

```text
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.
         include+?.(npm+pip)+?.lineinfile?
```

Every role follows the same flow: check preconditions → load OS-specific vars →
install packages → configure with templates → start services → optionally handle
language tooling.

All 15 roles match the grammar. **~29× compression** (7200+ modules → ~250 chars).

**Why it helps an LLM:** When generating a new Ansible role, the LLM knows the
expected structure up front: fail-check first, then vars, then packages, then
configuration and services. No guessing.

### Bonus: core+outlier analysis

Set `min_coverage=0.8` to infer a tight grammar for the majority of roles (here
80% of them) while flagging the remaining roles, whose module usage is unusual,
as outliers:

```text
Core CRX (80% coverage, 3 outliers):
  fail?.(include_vars+set_fact+package+file+template+service+...)+

Outlier sequences:
  1. phpmyadmin: include_vars → set_fact → include → include → lineinfile
  2. composer:   fail → set_fact → stat → uri → get_url → command
  3. pip:        package → file → pip
```

phpmyadmin edits configuration with raw `lineinfile` instead of templates;
composer needs a `stat` check and a `uri` download; pip ends in a bare `pip`
call with no template or service step. All three deviate from the mainstream
install → configure → enable pattern.
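
The core/outlier split can be illustrated with a small sketch. It shows one plausible reading of `min_coverage`, not Dervish's actual algorithm: a candidate grammar is accepted as the core if it matches at least that fraction of the sequences, and the unmatched sequences are reported as outliers. The function `split_core_outliers`, the hand-written `CORE` regex, and the twelve `role_NN` sequences are all hypothetical; only the three outlier sequences are taken from the output above.

```python
import re

# Hand-written stand-in for the core grammar (the "..." alternatives are omitted).
CORE = re.compile(
    r"(?:fail )?(?:(?:include_vars|set_fact|package|file|template|service) )+"
)


def accepts_core(sequence) -> bool:
    """True if the whole sequence is matched by the stand-in core grammar."""
    text = "".join(item + " " for item in sequence)
    return CORE.fullmatch(text) is not None


def split_core_outliers(named_sequences, accepts, min_coverage=0.8):
    """Return (meets_threshold, coverage, outlier_names) for one candidate grammar."""
    outliers = [name for name, seq in named_sequences.items() if not accepts(seq)]
    coverage = 1 - len(outliers) / len(named_sequences)
    return coverage >= min_coverage - 1e-9, coverage, outliers


# Twelve made-up mainstream roles plus the three outliers listed above.
roles = {
    f"role_{i:02d}": ["include_vars", "package", "template", "service"]
    for i in range(12)
}
roles.update({
    "phpmyadmin": ["include_vars", "set_fact", "include", "include", "lineinfile"],
    "composer": ["fail", "set_fact", "stat", "uri", "get_url", "command"],
    "pip": ["package", "file", "pip"],
})

ok, coverage, outliers = split_core_outliers(roles, accepts_core)
print(ok, round(coverage, 2), outliers)
```

With 12 of 15 sequences matched, the coverage is 0.8, so the candidate passes the threshold and the three unmatched roles are listed as outliers.
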
## 2. Helm chart (kube-prometheus-stack, 6 configs)

Six different `values.yaml` files rendered through the same chart:

```text
Best: iDRegEx | MDL 1433
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
```

This is the **minimal core** that every config must deploy. CRX, by contrast,
captures the full vocabulary (19 kinds). Which grammar an agent uses depends on
the task:
- Bootstrapping a new cluster: iDRegEx, what you can't skip
- Writing a complete chart: CRX, everything you might need

## 3. GitHub Actions (cross-project Go lint, 6 jobs)

Lint jobs from four projects (prometheus, goreleaser, cosign, sigstore):

```text
Best: CRX | MDL 13.6
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.
         golangci/golangci-lint-action?.megalinter?
```

Four independently maintained Go projects converged on the same pipeline: checkout → set up Go → run golangci-lint. Only the biggest add megalinter.

**Why it helps an LLM:** Setting up CI for a Go project on GitHub Actions? The grammar encodes an emergent cross-project convention: four teams wrote the same pipeline without coordinating.
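
For readers wondering how a workflow job becomes a sequence of symbols like the ones in the grammar above, here is a rough sketch. The tokenization rule is an assumption read off the grammar (`uses:` steps become the action name without its version, `run:` steps become `run:` plus the first word of the command); Dervish's real preprocessing may differ, for example the `megalinter` symbol above is clearly shortened in some other way. The helpers `step_token` and `job_tokens` are hypothetical, and the steps are given as plain dicts in the shape a YAML parser would produce.

```python
def step_token(step: dict) -> str:
    """Map one workflow step to a symbol (assumed rule, see the text above)."""
    if "uses" in step:
        # "actions/checkout@v4" -> "actions/checkout"
        return step["uses"].split("@", 1)[0]
    if "run" in step:
        # 'echo "hello"' -> "run:echo"
        words = step["run"].split()
        return "run:" + (words[0] if words else "")
    return "unknown"


def job_tokens(steps) -> list:
    """Turn a job's list of steps into the symbol sequence fed to inference."""
    return [step_token(step) for step in steps]


# Illustrative lint job; the action versions are example values.
steps = [
    {"uses": "actions/checkout@v4"},
    {"uses": "actions/setup-go@v5", "with": {"go-version": "stable"}},
    {"run": 'echo "linting"'},
    {"uses": "golangci/golangci-lint-action@v6"},
]

print(job_tokens(steps))
# ['actions/checkout', 'actions/setup-go', 'run:echo', 'golangci/golangci-lint-action']
```

Dropping the version suffix is what lets jobs from different projects share symbols even when they pin different releases of the same action.
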
## What doesn't work

| Dataset | Problem |
|---------|---------|
| Dockerfiles | Too simple: just the Dockerfile spec |
| Pre-commit (cross-project) | 252 unique hooks, no common core |
| GHA per-project | One repo = too many job types |
| Prometheus rules | Schema-enforced, no convention |

Sweet spot: **multiple implementations of the same abstract task**
with a shared but undocumented pattern.

## Usage

```python
from bex import infer_ensemble

# role_sequences: one sequence of module names per role, e.g.
# [["package", "file", "pip"], ...] (shape assumed from the examples above)

# Pick the best grammar across all 3 algorithms (CRX + iDRegEx + kOREInference)
result = infer_ensemble(role_sequences)
print(f"Best: {result['best']['algorithm']}")
print(f"Grammar: {result['best']['grammar']}")

# Or: find the tight core + flag outliers
result = infer_ensemble(role_sequences, min_coverage=0.8)
print(f"Core: {result['core']['grammar']}")
print(f"Outliers ({result['core']['outlier_count']}):")
for i, o in enumerate(result['core']['outliers'], 1):
    # Show at most the first 8 items of each outlier sequence
    print(f"  {i}. {' → '.join(str(x) for x in o[:8])}{'...' if len(o) > 8 else ''}")
```