purge Portainer references, format-specific tools, and Domain Adapters section; make showcases concrete with extracted types
parent
097dfc9954
commit
25d844d1f9
3 changed files with 83 additions and 193 deletions
250
README.md
|
|
@ -1,17 +1,17 @@
|
||||||
# Dervish
|
# Dervish
|
||||||
|
|
||||||

|
<p align="center"><img src="dervish.gif" alt="Dervish"></p>
|
||||||
|
|
||||||
**Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that describes the general pattern.
|
**Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that describes the general pattern.
|
||||||
|
|
||||||
## MCP Server
|
## MCP Server
|
||||||
|
|
||||||
The primary interface is a **Model Context Protocol (MCP)** server. Connect any MCP-compatible client (Claude, opencode, etc.) and get grammar inference as a tool:
|
The primary interface is a **Model Context Protocol (MCP)** server. Connect any MCP-compatible client (pi.dev, opencode, vibe, etc.) and get grammar inference as a tool:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"mcpServers": {
|
"mcpServers": {
|
||||||
"grammar-inference": {
|
"dervish": {
|
||||||
"command": "python3",
|
"command": "python3",
|
||||||
"args": ["/path/to/bex/mcp_server.py"]
|
"args": ["/path/to/bex/mcp_server.py"]
|
||||||
}
|
}
|
||||||
|
|
@ -21,46 +21,45 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any
|
||||||
|
|
||||||
### Tools
|
### Tools
|
||||||
|
|
||||||
| Tool | What it does |
|
| Tool | Parameters | What it does |
|
||||||
|------|-------------|
|
|------|-----------|-------------|
|
||||||
| `infer_grammar(sequences, method, kmax, N)` | Core CRX or iDRegEx inference |
|
| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **Recommended.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` or `prefer='idregex'` to run one algorithm. |
|
||||||
| `infer_best_grammar(sequences, prefer, kmax, N)` | **Ensemble:** runs both CRX and iDRegEx, picks the best by MDL score. `prefer='crx'` or `prefer='idregex'` to skip the comparison and return only that algorithm. |
|
| `infer_grammar` | `sequences`, `method`, `kmax`, `N` | Core single-algorithm inference. `method='crx'` (fast, deterministic) or `method='idregex'` (probabilistic EM). |
|
||||||
| `infer_yaml_grammar(yaml_dir, pattern, method)` | YAML → key-paths → grammar |
|
|
||||||
| `infer_ansible_role_grammar(roles_dir)` | Ansible role module sequences → per-category grammar |
|
**Parameters explained:**
|
||||||
|
- **`kmax`** (1–5): Context window for iDRegEx's k-testable automaton. Higher values capture longer-range dependencies but need more data and are slower. Default 2 works for most cases.
|
||||||
|
- **`N`** (1–10): Baum-Welch EM iterations for iDRegEx training. More iterations = better convergence but slower. Default 3 is a good balance.
|
||||||
|
- **`prefer`**: Skip the CRX-vs-iDRegEx comparison. Use when you know which algorithm fits your data.
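The ranges and defaults above can be checked on the client side before a tool call is made. The following is a minimal sketch, not the server's own validation: `build_arguments` is a hypothetical helper, and only the parameter names, ranges, and defaults are taken from the notes above.

```python
# Bounds and defaults as quoted in the parameter notes; the helper itself is hypothetical.
KMAX_RANGE = (1, 5)    # context window for iDRegEx
N_RANGE = (1, 10)      # Baum-Welch EM iterations
ALGORITHMS = ("crx", "idregex")


def build_arguments(sequences, prefer=None, kmax=2, N=3):
    """Assemble the arguments dict for an infer_best_grammar call."""
    if not sequences:
        raise ValueError("sequences must contain at least one example")
    if not KMAX_RANGE[0] <= kmax <= KMAX_RANGE[1]:
        raise ValueError(f"kmax must be in {KMAX_RANGE}, got {kmax}")
    if not N_RANGE[0] <= N <= N_RANGE[1]:
        raise ValueError(f"N must be in {N_RANGE}, got {N}")
    if prefer is not None and prefer not in ALGORITHMS:
        raise ValueError(f"prefer must be one of {ALGORITHMS}, got {prefer!r}")

    arguments = {"sequences": [list(seq) for seq in sequences], "kmax": kmax, "N": N}
    if prefer is not None:
        # Only send prefer when the caller wants to skip the comparison.
        arguments["prefer"] = prefer
    return arguments
```

Leaving `prefer` out of the dict keeps the default ensemble behaviour, where both algorithms run and the better MDL score wins.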
|
||||||
|
|
||||||
### Agent workflow
|
### Agent workflow
|
||||||
|
|
||||||
An LLM agent uses the MCP to discover an unwritten convention from existing examples:
|
An LLM agent uses the MCP to discover an unwritten convention from existing examples — compressing hundreds of files into a single ~60-token rule:
|
||||||
|
|
||||||
```
|
```
|
||||||
User: Generate a new Ansible role for installing PostgreSQL.
|
User: Generate a new Ansible role for installing PostgreSQL.
|
||||||
Agent: I'll first check 15 existing geerlingguy roles to find the structural pattern.
|
|
||||||
[calls infer_best_grammar with 15 role sequences, prefer='crx']
|
|
||||||
|
|
||||||
Best: CRX (MDL 288)
|
Agent: Let me check what pattern the existing community roles follow.
|
||||||
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+
|
I'll look at 15 popular geerlingguy roles.
|
||||||
.include+?.(npm+pip)+?.lineinfile?
|
|
||||||
|
|
||||||
Convention: check preconditions → OS-specific vars → install packages →
|
[finds role directories, extracts task module sequences,
|
||||||
configure templates → start services → handle language tooling.
|
calls infer_best_grammar(sequences=..., prefer='crx')]
|
||||||
|
|
||||||
|
Dervish returns:
|
||||||
|
Best: CRX (MDL 288)
|
||||||
|
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+
|
||||||
|
.include+?.(npm+pip)+?.lineinfile?
|
||||||
|
|
||||||
|
This tells me: every role starts with a fail check for preconditions,
|
||||||
|
then OS-specific variables, installs packages, configures with templates,
|
||||||
|
starts services, and optionally handles language tooling (npm/pip).
|
||||||
|
The role should end with a lineinfile tweak.
|
||||||
|
|
||||||
|
I'll generate the new role following this structure.
|
||||||
```
|
```
|
||||||
|
|
||||||
Without the MCP: 15 role files in context (5,000+ tokens) or guesswork.
|
**Without Dervish:** the agent stuffs 15 role files into context (5,000+ tokens per role, roughly 75,000+ tokens in total), or guesses the pattern from 1–2 examples and often gets it wrong.
|
||||||
With the MCP: one grammar rule (~60 tokens), known to match 15/15 existing roles.
|
|
||||||
|
|
||||||
## Why grammar inference?
|
**With Dervish:** one MCP call returns a ~60-token grammar known to match 15/15 existing roles. The agent follows it reliably.
|
||||||
|
|
||||||
There are many domains where developers follow **unwritten conventions** — implicit rules about the order and structure of things that no formal schema captures. An LLM generating code in these domains needs to know the convention, but it's rarely documented.
|
|
||||||
|
|
||||||
Grammar inference automatically discovers these conventions from examples:
|
|
||||||
|
|
||||||
| Domain | Unwritten convention | What the grammar tells an LLM |
|
|
||||||
|--------|---------------------|-------------------------------|
|
|
||||||
| Ansible roles | `fail → include_vars/set_fact → package → file/template → service → ... → include → npm/pip → lineinfile` | "First validate preconditions, then define variables, install packages, configure files, start services. Include other roles last." |
|
|
||||||
| Helm charts | `ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment` | "Always start with RBAC, then Service, then Deployment. Other resources are optional." |
|
|
||||||
| Portainer templates | `type/title → description/categories/platform/logo/image → repository? → env/ports/volumes? → command?` | "Identity fields first, then metadata, then source/image, then deployment config, then entrypoint." |
|
|
||||||
| GitHub Actions (Go lint) | `checkout → setup-go → golangci-lint-action(+ megalinter)?` | "Checkout, set up Go, run the linter. Only megalinter for extra coverage." |
|
|
||||||
| Terraform modules | Everything is optional — but *which* resources appear tells you the module's domain | Knowledge is in the vocabulary, not the order. VPC implies subnets, route tables, gateways. |
|
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
|
|
@ -83,12 +82,34 @@ print(f"Grammar: {result['best']['grammar']}")
|
||||||
print(f"Score: {result['best']['mdl_score']}")
|
print(f"Score: {result['best']['mdl_score']}")
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Why not just use a schema?
|
||||||
|
|
||||||
|
Many of the things developers build every day **have no formal schema**. They're free-form scripts, config files, or YAML blobs where the structure is emergent convention, not enforced specification. An LLM generating new content in these domains needs to know the convention — but it's never been written down.
|
||||||
|
|
||||||
|
Dervish discovers these conventions automatically from existing examples. The domains below are **just examples** of what it can do — the same approach works for any sequential data with an unwritten pattern.
|
||||||
|
|
||||||
|
| Domain | What gets extracted | Example extracted symbols | What Dervish discovers | Why it helps an LLM |
|
||||||
|
|--------|-------------------|--------------------------|----------------------|---------------------|
|
||||||
|
| Ansible roles | Module names from `tasks/main.yml` in order | `fail`, `include_vars`, `set_fact`, `package`, `file`, `template`, `service`, `npm`, `pip`, `lineinfile` | `fail?.(include_vars+set_fact+package+file+template+service+...)+.include+?.(npm+pip)+?.lineinfile?` | "Validate preconditions first, then set vars, install packages, configure with templates, start services. Include sub-roles last." |
|
||||||
|
| Helm charts | K8s resource kinds from `helm template` output in rendered order | `ServiceAccount`, `ClusterRole`, `ClusterRoleBinding`, `Service`, `Deployment`, `ConfigMap`, `Alertmanager` | `ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment` (iDRegEx minimal core) | "Every Prometheus stack needs this bootstrap pipeline. Everything else is optional." |
|
||||||
|
| GitHub Actions (Go lint) | Step `uses:` or `run:` values from workflow YAML in job order | `actions/checkout`, `actions/setup-go`, `golangci/golangci-lint-action`, `megalinter/megalinter` | `actions/checkout.(actions/setup-go+run:echo+run:sudo)+.golangci/golangci-lint-action?.megalinter?` | "Starting a new Go project? The lint workflow has a near-universal pattern." |
|
||||||
|
| Terraform modules | Resource type strings from `.tf` files in declaration order | `aws_vpc`, `aws_subnet`, `aws_route_table`, `aws_internet_gateway`, `aws_security_group`, `aws_instance`, `aws_s3_bucket` | Everything optional (domains too different), but certain types always cluster together | "If you see `aws_vpc`, expect subnets, route tables, gateways to follow. The grammar encodes each domain's resource catalogue." |
|
||||||
|
|
||||||
## Real-world Results
|
## Real-world Results
|
||||||
|
|
||||||
### Ansible Galaxy (15 roles, 44+ modules each)
|
### Ansible Galaxy (15 roles, 44+ modules each)
|
||||||
|
|
||||||
Data: All 15 [geerlingguy Galaxy roles](https://github.com/geerlingguy) — nginx, php, mysql, docker, etc.
|
Data: All 15 [geerlingguy Galaxy roles](https://github.com/geerlingguy) — nginx, php, mysql, docker, etc.
|
||||||
|
|
||||||
|
Each role's `tasks/main.yml` is parsed into a sequence of module names. Here are the sequences from two roles:
|
||||||
|
|
||||||
|
```
|
||||||
|
docker: fail → include_vars → include_tasks → package → package → package → ...
|
||||||
|
nginx: fail → include_vars → set_fact → package → file → template → service → ...
|
||||||
|
```
|
||||||
|
|
||||||
|
The extracted symbols are Ansible module names like `fail`, `include_vars`, `set_fact`, `package`, `file`, `template`, `service`, `systemd`, `get_url`, `shell`, `npm`, `pip`, `lineinfile`, `copy`, `unarchive`, `yum`, `apt`, `command`, `user`, `group`, `git`, `mount`, `cron`, `debug`, `iptables`, `ufw`, `hostname`, `sysctl`, `timezone`, `selinux`, `firewalld`, `homebrew`, `supervisorctl`, `postgresql_db`, `mysql_db` — 50+ unique modules across the 15 roles.
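The extraction step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's extractor: it takes tasks that have already been parsed from `tasks/main.yml` into dicts, treats the first key that is not a known task keyword as the module, and drops any collection prefix such as `ansible.builtin.`. `TASK_KEYWORDS` is a partial, hand-picked list, and `block:` sections are not unpacked.

```python
# Task-level keywords that are not modules. Partial list, chosen for illustration.
TASK_KEYWORDS = {
    "name", "when", "tags", "register", "become", "become_user", "vars",
    "loop", "loop_control", "notify", "changed_when", "failed_when",
    "ignore_errors", "args", "environment", "delegate_to", "until",
    "retries", "delay", "no_log", "check_mode", "run_once",
}


def module_sequence(tasks):
    """Return the module names used by a list of parsed task dicts, in order."""
    sequence = []
    for task in tasks:
        for key in task:
            if key in TASK_KEYWORDS or key.startswith("with_"):
                continue
            # "ansible.builtin.package" -> "package" (assumed normalisation)
            sequence.append(key.rsplit(".", 1)[-1])
            break  # one module per task
    return sequence
```

Tasks that carry only keywords (for example a bare `when:`) contribute nothing to the sequence.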
|
||||||
|
|
||||||
```
|
```
|
||||||
Best: CRX (MDL 288, 15/15 match)
|
Best: CRX (MDL 288, 15/15 match)
|
||||||
Grammar:
|
Grammar:
|
||||||
|
|
@ -104,7 +125,15 @@ This is the first explicit description of the geerlingguy role module ordering c
|
||||||
|
|
||||||
### Helm (kube-prometheus-stack, 6 CI configs)
|
### Helm (kube-prometheus-stack, 6 CI configs)
|
||||||
|
|
||||||
Data: 6 different `values.yaml` configurations rendered through `helm template`.
|
Data: 6 different `values.yaml` configurations rendered through `helm template`. Each config produces a sequence of K8s `kind` values in rendered YAML order:
|
||||||
|
|
||||||
|
```
|
||||||
|
config-1: ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment → ServiceMonitor → PrometheusRule
|
||||||
|
config-2: ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment → ConfigMap → ServiceMonitor
|
||||||
|
config-3: ServiceAccount → ClusterRole → ClusterRoleBinding → Service → Deployment → Alertmanager → Prometheus
|
||||||
|
```
|
||||||
|
|
||||||
|
Extracted symbols: `ServiceAccount`, `ClusterRole`, `ClusterRoleBinding`, `Service`, `Deployment`, `ConfigMap`, `Alertmanager`, `Prometheus`, `PrometheusRule`, `ServiceMonitor`, `Role`, `RoleBinding`, `Job`, `DaemonSet`, `Secret`, `ValidatingWebhookConfiguration` — 19 kinds total.
|
||||||
|
|
||||||
```
|
```
|
||||||
Best: iDRegEx (MDL 1433)
|
Best: iDRegEx (MDL 1433)
|
||||||
|
|
@ -118,21 +147,17 @@ iDRegEx finds the **minimum core** — what every config always deploys. CRX cap
|
||||||
- **CRX** tells an agent generating a new chart what resources it *might* need.
|
- **CRX** tells an agent generating a new chart what resources it *might* need.
|
||||||
- **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
|
- **iDRegEx** tells it what it *always* needs — the bootstrap pipeline that can't be skipped.
|
||||||
|
|
||||||
### Portainer templates (47 templates)
|
|
||||||
|
|
||||||
Data: Official Portainer app templates from the [portainer/templates](https://github.com/portainer/templates) repo.
|
|
||||||
|
|
||||||
```
|
|
||||||
Best: CRX (MDL 1282)
|
|
||||||
Grammar: (type+title)+.(categories+description+image+logo+name+note+platform)+.
|
|
||||||
repository?.(env+ports+privileged+volumes)+?.command?
|
|
||||||
```
|
|
||||||
|
|
||||||
Template fields follow a consistent arc: identity (`type`, `title`) → metadata (`description`, `categories`, `platform`, `logo`) → source (`image`, `repository`) → deployment (`ports`, `volumes`, `env`) → entrypoint (`command`). 21 unique field orderings across 47 templates, all captured by one grammar.
|
|
||||||
|
|
||||||
### GitHub Actions (cross-project Go lint, 6 jobs)
|
### GitHub Actions (cross-project Go lint, 6 jobs)
|
||||||
|
|
||||||
Data: Lint jobs from prometheus, goreleaser, cosign, sigstore.
|
Data: Lint jobs from prometheus, goreleaser, cosign, sigstore. Each job's steps are extracted as `uses:` or `run:` values:
|
||||||
|
|
||||||
|
```
|
||||||
|
prometheus lint: actions/checkout → actions/setup-go → run:sudo → run:echo → golangci/golangci-lint-action → golangci/golangci-lint-action → ...
|
||||||
|
goreleaser lint: actions/checkout → actions/setup-go → gitleaks/gitleaks-action → golangci/golangci-lint-action
|
||||||
|
cosign lint: actions/checkout → ossf/scorecard-action → actions/upload-artifact → github/codeql-action/upload-sarif
|
||||||
|
```
|
||||||
|
|
||||||
|
Extracted symbols: `actions/checkout`, `actions/setup-go`, `golangci/golangci-lint-action`, `megalinter/megalinter`, `gitleaks/gitleaks-action`, `ossf/scorecard-action`, `github/codeql-action/*`, and `run:*` commands.
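The step-to-symbol mapping can be sketched like this, working on steps already parsed from the workflow YAML. Two details are assumptions made to match the symbols listed above: the `@ref` suffix of a `uses:` value is dropped, and a `run:` step is reduced to `run:` plus its first word. Steps with neither field, or with an empty `run:`, are skipped rather than raising.

```python
def step_symbol(step):
    """Map one workflow step dict to a symbol, or None if it has nothing usable."""
    if "uses" in step:
        # "actions/checkout@v4" -> "actions/checkout" (assumed normalisation)
        return step["uses"].split("@", 1)[0]
    words = step.get("run", "").split()
    return "run:" + words[0] if words else None


def job_sequence(steps):
    """Return the symbols for a job's steps, in job order."""
    symbols = (step_symbol(step) for step in steps)
    return [symbol for symbol in symbols if symbol is not None]
```

Checking for `uses` before touching `run` avoids indexing into an empty command string when a step has only a `uses:` entry.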
|
||||||
|
|
||||||
```
|
```
|
||||||
Best: CRX (MDL 13.6)
|
Best: CRX (MDL 13.6)
|
||||||
|
|
@ -143,7 +168,15 @@ Every Go project's lint CI follows: checkout → setup Go → run golangci-lint.
|
||||||
|
|
||||||
### Terraform (8 AWS modules, 156+ resources each)
|
### Terraform (8 AWS modules, 156+ resources each)
|
||||||
|
|
||||||
Data: `terraform-aws-{vpc,ec2,s3-bucket,autoscaling,security-group}` modules.
|
Data: `terraform-aws-{vpc,ec2,s3-bucket,autoscaling,security-group}` modules from hashicorp and terraform-aws-modules. Each `.tf` file is parsed for `resource` declarations in order:
|
||||||
|
|
||||||
|
```
|
||||||
|
vpc module: data:vpc_endpoint_service → vpc → vpc_endpoint → vpc_endpoint_route_table_association → egress_only_internet_gateway → route_table → route → subnet → ...
|
||||||
|
ec2 module: data:partition → data:ssm_parameter → instance → spot_instance_request → ec2_tag → ebs_volume → volume_attachment → data:iam_policy_document → iam_role → iam_role_policy_attachment → iam_instance_profile → ...
|
||||||
|
s3 module: iam_role → data:iam_policy_document → iam_policy → data:partition → s3_bucket → s3_bucket_versioning → s3_bucket_logging → s3_bucket_server_side_encryption → ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Extracted symbols: `aws_vpc`, `aws_subnet`, `aws_route_table`, `aws_internet_gateway`, `aws_nat_gateway`, `aws_vpn_gateway`, `aws_security_group`, `aws_security_group_rule`, `aws_instance`, `aws_eip`, `aws_ebs_volume`, `aws_s3_bucket`, `aws_s3_bucket_versioning`, `aws_s3_bucket_logging`, `aws_iam_role`, `aws_iam_policy`, `aws_autoscaling_group`, `aws_launch_configuration`, `random_pet`, `null_resource` — 30+ types across modules.
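A regex-based extraction in declaration order might look like the sketch below. It follows the notation of the sample sequences above (provider prefix dropped, data sources marked with `data:`), which is an assumption about how symbols are normalised. `resource_sequence` and its `strip_prefix` parameter are hypothetical, and a regex will miss unusual formatting that an HCL parser would handle.

```python
import re

# Matches: resource "aws_vpc" "this" {   or   data "aws_partition" "current" {
BLOCK = re.compile(r'^\s*(resource|data)\s+"(\w+)"\s+"[^"]+"\s*\{', re.MULTILINE)


def resource_sequence(tf_source, strip_prefix="aws_"):
    """Return resource and data-source types from Terraform source, in order."""
    sequence = []
    for block_type, type_name in BLOCK.findall(tf_source):
        if type_name.startswith(strip_prefix):
            type_name = type_name[len(strip_prefix):]
        # Data sources are tagged so they stay distinct from managed resources.
        sequence.append("data:" + type_name if block_type == "data" else type_name)
    return sequence
```

Types from other providers, such as `null_resource`, keep their full name because they do not carry the stripped prefix.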
|
||||||
|
|
||||||
```
|
```
|
||||||
Best: CRX (MDL 1876)
|
Best: CRX (MDL 1876)
|
||||||
|
|
@ -182,129 +215,6 @@ The sweet spot: **multiple implementations of the same abstract task** (like "de
|
||||||
| 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
|
| 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
|
||||||
| Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
|
| Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
|
||||||
|
|
||||||
## Domain Adapters
|
|
||||||
|
|
||||||
### Ansible Roles
|
|
||||||
|
|
||||||
```python
|
|
||||||
from bex.ensemble import infer_ensemble
|
|
||||||
from bex.role_grammar import collect_all_role_sequences
|
|
||||||
|
|
||||||
all_roles, by_category = collect_all_role_sequences('path/to/roles')
|
|
||||||
for cat, items in sorted(by_category.items()):
|
|
||||||
seqs = [s for _, s in items]
|
|
||||||
result = infer_ensemble(seqs)
|
|
||||||
print(f"── {cat} ({len(items)} roles) ──")
|
|
||||||
print(f" Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
|
||||||
print(f" Grammar: {result['best']['grammar']}")
|
|
||||||
```
|
|
||||||
|
|
||||||
**Example** (15 geerlingguy Galaxy roles):
|
|
||||||
|
|
||||||
```
|
|
||||||
── other (15 roles) ──
|
|
||||||
Best: CRX (MDL 288, 15/15 match)
|
|
||||||
Grammar: fail?.(include_vars+set_fact+package+file+template+service+...)+.include+?.(npm+pip)+?.lineinfile?
|
|
||||||
Why: CRX matches 15/15 sequences, iDRegEx matches 3/15. CRX selected.
|
|
||||||
```
|
|
||||||
|
|
||||||
### Helm Charts
|
|
||||||
|
|
||||||
```python
|
|
||||||
import subprocess, yaml
|
|
||||||
from bex.ensemble import infer_ensemble
|
|
||||||
|
|
||||||
seqs = []
|
|
||||||
for vf in sorted(Path('ci/').glob('*-values.yaml')):
|
|
||||||
out = subprocess.run(
|
|
||||||
['helm', 'template', 'test', '.', '--skip-tests', '-f', str(vf)],
|
|
||||||
capture_output=True, text=True, timeout=120,
|
|
||||||
)
|
|
||||||
kinds = [d['kind'] for d in yaml.safe_load_all(out.stdout)
|
|
||||||
if d and isinstance(d, dict) and 'kind' in d]
|
|
||||||
if kinds:
|
|
||||||
seqs.append(kinds)
|
|
||||||
|
|
||||||
result = infer_ensemble(seqs)
|
|
||||||
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
|
||||||
print(f"Grammar: {result['best']['grammar']}")
|
|
||||||
```
|
|
||||||
|
|
||||||
**Example** (kube-prometheus-stack, 6 CI configs):
|
|
||||||
|
|
||||||
```
|
|
||||||
Best: iDRegEx (MDL 1433)
|
|
||||||
Grammar: ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
|
||||||
|
|
||||||
iDRegEx MDL= 1432.99 ServiceAccount.ClusterRole.ClusterRoleBinding.Service.Deployment
|
|
||||||
CRX MDL= 2651.74 (Alertmanager+ClusterRole+...+ValidatingWebhookConfiguration)+.Role+?...
|
|
||||||
|
|
||||||
Why: iDRegEx (score 1433.0) vs CRX (score 2651.7). CRX matches 6/6, iDRegEx matches 1/6.
|
|
||||||
iDRegEx selected (MDL score 1433.0).
|
|
||||||
```
|
|
||||||
|
|
||||||
### Terraform
|
|
||||||
|
|
||||||
```python
|
|
||||||
import re
|
|
||||||
from bex.ensemble import infer_ensemble
|
|
||||||
|
|
||||||
seqs = []
|
|
||||||
for tf in sorted(Path('.').rglob('*.tf')):
|
|
||||||
resources = re.findall(r'resource "(\w+)" "\w+" {', tf.read_text())
|
|
||||||
if resources:
|
|
||||||
seqs.append(resources)
|
|
||||||
|
|
||||||
result = infer_ensemble(seqs)
|
|
||||||
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
|
||||||
print(f"Grammar: {result['best']['grammar']}")
|
|
||||||
```
|
|
||||||
|
|
||||||
**Example** (8 terraform-aws-* modules):
|
|
||||||
|
|
||||||
```
|
|
||||||
Best: CRX (MDL 1876)
|
|
||||||
Grammar: null_resource?.s3_bucket_lifecycle_configuration?.vpc?.launch_configuration?....
|
|
||||||
Why: CRX matches 8/8 sequences. iDRegEx returned ∅ (no common core across modules).
|
|
||||||
```
|
|
||||||
|
|
||||||
### Portainer Templates
|
|
||||||
|
|
||||||
```python
|
|
||||||
import json, urllib.request
|
|
||||||
from bex.ensemble import infer_ensemble
|
|
||||||
|
|
||||||
url = "https://raw.githubusercontent.com/portainer/templates/master/templates.json"
|
|
||||||
with urllib.request.urlopen(url) as resp:
|
|
||||||
data = json.loads(resp.read())
|
|
||||||
templates = data if isinstance(data, list) else data.get('templates', [])
|
|
||||||
seqs = [list(t.keys()) for t in templates]
|
|
||||||
|
|
||||||
result = infer_ensemble(seqs)
|
|
||||||
print(f"Best: {result['best']['algorithm']} (MDL {result['best']['mdl_score']})")
|
|
||||||
print(f"Grammar: {result['best']['grammar']}")
|
|
||||||
```
|
|
||||||
|
|
||||||
### GitHub Actions
|
|
||||||
|
|
||||||
```python
|
|
||||||
import yaml
|
|
||||||
from bex.ensemble import infer_ensemble
|
|
||||||
|
|
||||||
seqs = []
|
|
||||||
for wf_file in Path('.github/workflows/').glob('*.yml'):
|
|
||||||
data = yaml.safe_load(wf_file.read_text())
|
|
||||||
for job in data.get('jobs', {}).values():
|
|
||||||
if 'steps' not in job:
|
|
||||||
continue
|
|
||||||
seq = [s.get('uses', 'run:' + s.get('run', '').split()[0])
|
|
||||||
for s in job['steps'] if 'uses' in s or 'run' in s]
|
|
||||||
if seq:
|
|
||||||
seqs.append(seq)
|
|
||||||
|
|
||||||
result = infer_ensemble(seqs)
|
|
||||||
```
|
|
||||||
|
|
||||||
## How MDL scoring works
|
## How MDL scoring works
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
|
||||||
24
SHOWCASE.md
|
|
@ -46,27 +46,7 @@ vocabulary (19 kinds). Which one an agent uses depends on the task:
|
||||||
- Bootstrapping a new cluster: iDRegEx — what you can't skip
|
- Bootstrapping a new cluster: iDRegEx — what you can't skip
|
||||||
- Writing a complete chart: CRX — everything you might need
|
- Writing a complete chart: CRX — everything you might need
|
||||||
|
|
||||||
## 3. Portainer templates (47 templates)
|
## 3. GitHub Actions (cross-project Go lint, 6 jobs)
|
||||||
|
|
||||||
Official Portainer app templates from portainer/templates:
|
|
||||||
|
|
||||||
```
|
|
||||||
Best: CRX | MDL 1282
|
|
||||||
Grammar: (type+title)+.
|
|
||||||
(categories+description+image+logo+name+note+platform)+.
|
|
||||||
repository?.(env+ports+privileged+volumes)+?.command?
|
|
||||||
```
|
|
||||||
|
|
||||||
Field ordering convention: identity (`type`, `title`) → metadata
|
|
||||||
(`description`, `categories`, `platform`, `logo`) → source
|
|
||||||
(`image`, `repository`) → deployment (`ports`, `volumes`, `env`) →
|
|
||||||
entrypoint (`command`). 21 unique orderings, one grammar.
|
|
||||||
|
|
||||||
**Why it helps an LLM:** Writing a Portainer template needs the right
|
|
||||||
field order. The grammar tells you: identity first, then metadata,
|
|
||||||
then source, then deployment config.
|
|
||||||
|
|
||||||
## 4. GitHub Actions (cross-project Go lint, 6 jobs)
|
|
||||||
|
|
||||||
Lint jobs from prometheus, goreleaser, cosign, sigstore:
|
Lint jobs from prometheus, goreleaser, cosign, sigstore:
|
||||||
|
|
||||||
|
|
@ -82,7 +62,7 @@ Only the biggest add megalinter.
|
||||||
**Why it helps an LLM:** Starting a new Go project? The lint workflow
|
**Why it helps an LLM:** Starting a new Go project? The lint workflow
|
||||||
has a near-universal pattern.
|
has a near-universal pattern.
|
||||||
|
|
||||||
## 5. Terraform (8 AWS modules)
|
## 4. Terraform (8 AWS modules)
|
||||||
|
|
||||||
Terraform modules by hashicorp and terraform-aws-modules:
|
Terraform modules by hashicorp and terraform-aws-modules:
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -191,7 +191,7 @@ depending on the data:
|
||||||
|---------|--------|-----|
|
|---------|--------|-----|
|
||||||
| Ansible galaxy (15 roles) | CRX | iDRegEx returned ∅ (too diverse) |
|
| Ansible galaxy (15 roles) | CRX | iDRegEx returned ∅ (too diverse) |
|
||||||
| Helm prom-stack (6 configs) | **iDRegEx** | Finds minimal core across all configs |
|
| Helm prom-stack (6 configs) | **iDRegEx** | Finds minimal core across all configs |
|
||||||
| Portainer templates (47) | CRX | iDRegEx returned ∅ (no single common field) |
|
| Terraform modules (8) | CRX | iDRegEx returned ∅ (no common core across domains) |
|
||||||
| Terraform modules (8) | CRX | Every resource type optional across domains |
|
| Terraform modules (8) | CRX | Every resource type optional across domains |
|
||||||
| GitHub Actions Go lint (6) | CRX | Tight pattern, all match |
|
| GitHub Actions Go lint (6) | CRX | Tight pattern, all match |
|
||||||
|
|
||||||
|
|
|
||||||