deduplicate: replace detailed Real-world Results with summary table linking to SHOWCASE.md
parent b05c3ee116
commit 17b5c271ec
1 changed file with 7 additions and 77 deletions
84 README.md
@@ -108,85 +108,15 @@ Dervish discovers these conventions automatically from existing examples. The do

## Real-world Results

### Ansible Galaxy (15 roles, 44+ modules each)
Dervish has been tested against public datasets from Ansible Galaxy, Helm, and GitHub Actions — all cases where multiple projects independently converged on an undocumented pattern. [**Full details → SHOWCASE.md**](SHOWCASE.md)

Data: All 15 [geerlingguy Galaxy roles](https://github.com/geerlingguy) — nginx, php, mysql, docker, etc.
| Dataset | Best grammar | Compression |
|---------|-------------|-------------|
| Ansible Galaxy (15 roles) | `fail?.(include_vars+set_fact+package+file+template+service+...)+.include+?.(npm+pip)+?.lineinfile?` | 5,000 tokens → 60 tokens (83×) |
| Helm (6 configs) | `ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment` | ~3,000 tokens → 40 tokens (75×) |
| Go lint (6 jobs) | `actions/checkout.(actions/setup-go+run:echo+run:sudo)+.golangci/golangci-lint-action?.megalinter?` | ~900 tokens → 30 tokens (30×) |

Each role's `tasks/main.yml` is parsed into a sequence of module names. Here are the sequences from two roles:

```
docker: fail → include_vars → include_tasks → package → package → package → ...
nginx: fail → include_vars → set_fact → package → file → template → service → ...
```
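
The parsing step described above could look roughly like the following. This is a minimal sketch, not Dervish's actual extractor: the `TASK_KEYWORDS` set and the `module_sequence` helper are hypothetical names, the keyword list is deliberately incomplete, and the tasks are assumed to be already loaded from `tasks/main.yml` into a list of dictionaries by a YAML parser.

```python
# Hypothetical helper: keys that describe a task rather than name a module.
# The list is an assumption for illustration and is not exhaustive.
TASK_KEYWORDS = {
    "name", "when", "tags", "become", "register", "notify", "loop",
    "with_items", "vars", "args", "ignore_errors", "changed_when",
    "failed_when", "delegate_to", "environment", "no_log",
}


def module_sequence(tasks):
    """Return the module name used by each task, in file order."""
    sequence = []
    for task in tasks:
        for key in task:
            if key not in TASK_KEYWORDS:
                # Drop a collection prefix such as "ansible.builtin.".
                sequence.append(key.rsplit(".", 1)[-1])
                break
    return sequence


# Tasks as they might look after loading tasks/main.yml (made-up content).
example_tasks = [
    {"name": "Check OS support", "fail": {"msg": "unsupported"}, "when": "false"},
    {"name": "Load variables", "include_vars": "{{ ansible_os_family }}.yml"},
    {"name": "Install nginx", "ansible.builtin.package": {"name": "nginx"}},
]

print(" → ".join(module_sequence(example_tasks)))
```

The sketch ignores `block`/`rescue` sections and treats the first non-keyword key as the module, which is a simplification.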

The extracted symbols are Ansible module names like `fail`, `include_vars`, `set_fact`, `package`, `file`, `template`, `service`, `systemd`, `get_url`, `shell`, `npm`, `pip`, `lineinfile`, `copy`, `unarchive`, `yum`, `apt`, `command`, `user`, `group`, `git`, `mount`, `cron`, `debug`, `iptables`, `ufw`, `hostname`, `sysctl`, `timezone`, `selinux`, `firewalld`, `homebrew`, `supervisorctl`, `postgresql_db`, `mysql_db` — 50+ unique modules across the 15 roles.

```
Best: CRX (MDL 288, 15/15 match)
Grammar:
fail?.(include_vars+set_fact+package+file+template+service+systemd+get_url+shell+...)+
.include+?.(npm+pip)+?.lineinfile?
```

Every single role follows this pattern. The convention was **unwritten** — no document says "Ansible roles should check preconditions first, then install packages, configure with templates, enable services, then optionally install language packages."

This is the first explicit description of the geerlingguy role module ordering convention.

**Compression:** The grammar is ~250 chars. The 15 examples are 7200+ modules combined. **~29× compression.**

### Helm (kube-prometheus-stack, 6 CI configs)

Data: 6 different `values.yaml` configurations rendered through `helm template`. Each config produces a sequence of K8s `kind` values in rendered YAML order:

```
config-1: ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment → ServiceMonitor → PrometheusRule
config-2: ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment → ConfigMap → ServiceMonitor
config-3: ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment → Alertmanager → Prometheus
```
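
One way to obtain such a `kind` sequence from rendered output is sketched below. It assumes the `helm template` output is already available as a string and that each resource's `kind:` sits at the top level of its YAML document, unindented. The `kind_sequence` helper and the sample manifest are illustrative assumptions, not part of Dervish.

```python
import re

# Match only unindented "kind:" lines, so nested fields such as
# roleRef.kind are not counted as resources.
KIND_LINE = re.compile(r"^kind:[ \t]*(\S+)", re.MULTILINE)


def kind_sequence(rendered):
    """Return the top-level resource kinds in rendered order."""
    return KIND_LINE.findall(rendered)


# Made-up excerpt of rendered manifests, for illustration only.
rendered = """\
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: demo
roleRef:
  kind: ClusterRole
  name: demo
"""

print(" → ".join(kind_sequence(rendered)))
```

A line-based regex keeps the sketch dependency-free. A real extractor would more likely load each document with a YAML parser.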

Extracted symbols: `ServiceAccount`, `ClusterRole`, `ClusterRoleBinding`, `Service`, `Deployment`, `ConfigMap`, `Alertmanager`, `Prometheus`, `PrometheusRule`, `ServiceMonitor`, `Role`, `RoleBinding`, `Job`, `DaemonSet`, `Secret`, `ValidatingWebhookConfiguration` — 19 kinds total.

```
Best: iDRegEx (MDL 1433)
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment

iDRegEx MDL= 1432.99 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
CRX MDL= 2651.74 (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
```

iDRegEx finds the **minimum core** — what every config always deploys. CRX captures the full vocabulary (19 resource kinds). Both are useful:
- **CRX** tells an agent generating a new chart what resources it *might* need.
- **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
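
To make the "minimum core" idea concrete, the sketch below computes the longest common prefix of the three sample sequences shown earlier. This is not the iDRegEx algorithm, only a simple stand-in that happens to reproduce the reported grammar on these three samples. The `common_prefix` helper is a hypothetical name.

```python
def common_prefix(sequences):
    """Return the longest prefix shared by every sequence."""
    prefix = []
    # zip stops at the shortest sequence, so no index checks are needed.
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


# The three sample sequences quoted in the Helm section.
configs = [
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service",
     "Deployment", "ServiceMonitor", "PrometheusRule"],
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service",
     "Deployment", "ConfigMap", "ServiceMonitor"],
    ["ServiceAccount", "ClusterRole", "ClusterRoleBinding", "Service",
     "Deployment", "Alertmanager", "Prometheus"],
]

print(".".join(common_prefix(configs)))
```

On other datasets a shared core need not be a prefix, so this shortcut should not be read as a general substitute for grammar inference.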

### GitHub Actions (cross-project Go lint, 6 jobs)

Data: Lint jobs from prometheus, goreleaser, cosign, sigstore. Each job's steps are extracted as `uses:` or `run:` values:

```
prometheus lint: actions/checkout → actions/setup-go → run:sudo → run:echo → golangci/golangci-lint-action → golangci/golangci-lint-action → ...
goreleaser lint: actions/checkout → actions/setup-go → gitleaks/gitleaks-action → golangci/golangci-lint-action
cosign lint: actions/checkout → ossf/scorecard-action → actions/upload-artifact → github/codeql-action/upload-sarif
```
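
A step-to-symbol mapping consistent with the sequences above might look like this. It is a sketch under assumptions: the steps are taken to be already parsed into dictionaries, the version suffix after `@` is dropped, and a `run:` step is reduced to the first word of its command (inferred from symbols such as `run:sudo` and `run:echo`). The `step_symbol` helper and the example steps, including their version tags, are hypothetical.

```python
def step_symbol(step):
    """Map one workflow step to a symbol such as 'actions/checkout' or 'run:sudo'."""
    if "uses" in step:
        # Drop the version or commit reference after "@".
        return step["uses"].split("@", 1)[0]
    if "run" in step:
        words = step["run"].split()
        return "run:" + (words[0] if words else "")
    return "unknown"


# Made-up steps of a lint job, for illustration only.
example_steps = [
    {"uses": "actions/checkout@v4"},
    {"uses": "actions/setup-go@v5", "with": {"go-version": "stable"}},
    {"run": "sudo apt-get update"},
    {"run": "echo done"},
    {"uses": "golangci/golangci-lint-action@v6"},
]

print(" → ".join(step_symbol(s) for s in example_steps))
```

Collapsing each `run:` command to its first word keeps the symbol vocabulary small, at the cost of treating all `sudo ...` commands as the same symbol.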

Extracted symbols: `actions/checkout`, `actions/setup-go`, `golangci/golangci-lint-action`, `megalinter/megalinter`, `gitleaks/gitleaks-action`, `ossf/scorecard-action`, `github/codeql-action/*`, and `run:*` commands.

```
Best: CRX (MDL 13.6)
Grammar: actions/checkout.(actions/setup-go+run:echo+run:sudo)+.golangci/golangci-lint-action?.megalinter?
```

Every Go project's lint CI follows: checkout → setup Go → run golangci-lint. Only the biggest projects add megalinter.

### What doesn't work

Not every domain has an unwritten convention. Grammar inference failed (produced trivial `(a+b+c+...)+` grammars) on:

- **Dockerfiles** — too simple (`FROM → RUN → COPY → CMD` is just the Dockerfile spec)
- **Pre-commit configs** (cross-project) — 252 unique hook IDs, no common core
- **GitHub Actions per-project** — too many different job types (build, lint, release, security) in one repo
- **Prometheus recording rules** — schema-enforced structure, no convention to discover

The sweet spot: **multiple implementations of the same abstract task** (like "deploy a service" or "configure a chart"), each following a shared but undocumented pattern.
The sweet spot: **multiple implementations of the same abstract task** with a shared but undocumented pattern. Not everything works — Dockerfiles, pre-commit configs, and schema-enforced formats are too rigid or too diverse to yield a convention.

## Algorithm Selection Guide
