Compare commits

1 commit

Author: tobjend
SHA1: 2a7111b7ff
Message: fix: regenerate chart with Comic Neue + path.sketch xkcd style
Checks: All checks were successful (ci/woodpecker/push/woodpecker: Pipeline was successful)
Date: 2026-07-01 16:13:39 +02:00

@@ -17,17 +17,16 @@
 **Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that captures the general pattern.
-Every codebase has unwritten conventions like the order tasks appear in Ansible roles, the resources a Helm chart always creates, the steps every CI pipeline runs. Nobody writes these down. They emerge from copying and converging.
+Every codebase has unwritten conventions – the order tasks appear in Ansible roles, the resources a Helm chart always creates, the steps every CI pipeline runs. Nobody writes these down. They emerge from copying and converging.
 When an LLM agent needs to follow these conventions, it usually has two bad options:
-1. **Stuff every existing file into context** - You'll hit the context window by the third example.
+1. **Stuff every existing file into context** — 15 Ansible roles = 5,000 tokens. You'll hit the context window by the third example.
-2. **Guess from one or two examples** - the LLM infers a pattern and often gets it wrong.
+2. **Guess from one or two examples** – the LLM infers a pattern and often gets it wrong.
-Dervish replaces both with a **one-call MCP tool**: pass your sequences, get back a ~60-token grammar.
+Dervish replaces both with a **one-call MCP tool**: pass your sequences, get back a ~60-token grammar. A rule you can trust, at a fraction of the cost.
-By leveraging **Minimum Description Length (MDL) scoring**, Dervish treats the grammar discovery problem as an optimal compression task. The resulting rule is optimized to consume as few tokens as possible without losing the pattern.
-**Without Dervish:** token cost scales linearly with examples. **With Dervish:** one compact grammar describes them all in a ~60–200 token rule instead of thousands of tokens of raw examples. Try it out and you too will say:
+**Without Dervish:** token cost scales linearly with examples. **With Dervish:** one compact grammar describes them all – a ~60–200 token rule instead of thousands of tokens of raw examples. Try it out and you too will say:
 <p align="center"><img src="dervish.gif" alt="Dervish animation" width="65%"></p>
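Review note: the MDL sentence in this hunk treats grammar discovery as compression. The sketch below shows what a two-part description-length score can look like. It is not Dervish's implementation. The toy grammar format (an ordered list of required or optional steps), the function names, the bit-cost model, and the Ansible task names are all assumptions made for illustration.

```python
import math

# Hypothetical toy grammar: an ordered list of (symbol, is_optional) steps.
Step = tuple[str, bool]


def generates(grammar: list[Step], seq: list[str]) -> bool:
    """True if seq takes every required step and any subset of optional ones, in order."""
    if not grammar:
        return not seq
    (symbol, optional), rest = grammar[0], grammar[1:]
    # Try consuming the next symbol with this step.
    if seq and seq[0] == symbol and generates(rest, seq[1:]):
        return True
    # Otherwise the step may only be skipped if it is optional.
    return optional and generates(rest, seq)


def grammar_bits(grammar: list[Step], sequences: list[list[str]], alphabet_size: int) -> float:
    """Two-part code length: bits for the grammar plus bits for the data given the grammar."""
    if not all(generates(grammar, seq) for seq in sequences):
        return math.inf  # a grammar that misses an example cannot encode it
    symbol_bits = math.log2(alphabet_size)
    model_bits = len(grammar) * (symbol_bits + 1)  # one symbol plus one "optional" flag per step
    choice_bits = sum(1 for _, optional in grammar if optional)  # one take/skip bit per optional step
    return model_bits + choice_bits * len(sequences)


def literal_bits(sequences: list[list[str]], alphabet_size: int) -> float:
    """Baseline: spell out every example symbol by symbol (ignores length markers)."""
    return sum(len(seq) for seq in sequences) * math.log2(alphabet_size)


# Made-up task sequences standing in for four roles over an 8-symbol alphabet.
roles = [
    ["install", "configure", "service"],
    ["install", "configure", "handlers", "service"],
    ["install", "configure", "service"],
    ["install", "configure", "service"],
]
candidate = [("install", False), ("configure", False), ("handlers", True), ("service", False)]

compact_cost = grammar_bits(candidate, roles, alphabet_size=8)
raw_cost = literal_bits(roles, alphabet_size=8)
```

Under these assumptions the candidate grammar costs fewer bits than listing the examples verbatim, and the gap widens as more examples share the pattern. That is the intuition behind preferring the shorter description.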
@@ -60,7 +59,7 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any
 ### Agent workflow
-An LLM agent uses the MCP to discover a schema from existing examples, thereby compressing hundreds of files into a single ~60-token rule:
+An LLM agent uses the MCP to discover an unwritten convention from existing examples — compressing hundreds of files into a single ~60-token rule:
 ```text
 User: Generate a new Ansible role for installing PostgreSQL.
@@ -84,12 +83,12 @@ Agent: Let me check what pattern the existing community roles follow.
 I'll generate the new role following this structure.
 ```
-**Without Dervish:** the agent either has to read all 15 role files (5,000+ tokens per role), or guesses the pattern from 1–2 examples and often gets it wrong.
+**Without Dervish:** the agent stuffs 15 role files into context (5,000+ tokens per role = beyond any context window), or guesses the pattern from 1–2 examples and often gets it wrong.
 **With Dervish:** one MCP call returns a ~60-token grammar known to match 15/15 existing roles. The agent follows it reliably.
-**Core+outlier mode:** When generating a new file, for example a new Ansible role, the agent can call with
+**Core+outlier mode:** When generating a new role, the agent can call with
-`min_coverage=0.8` to learn the mainstream pattern while seeing which files
+`min_coverage=0.8` to learn the mainstream pattern while seeing which roles
 deviate and why — useful when the user's case resembles an outlier
 (e.g., a PHP app like phpmyadmin that needs raw `lineinfile`).
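Review note: the core+outlier idea can be pictured as a coverage check of one candidate pattern against the example sequences. The sketch below is only an illustration using Python's standard `re` module. Apart from the `min_coverage` name, which comes from the text above, the function name, the return shape, the pattern, and the task names are hypothetical and do not describe Dervish's MCP interface.

```python
import re


def coverage_report(pattern: str, sequences: list[list[str]], min_coverage: float = 0.8) -> dict:
    """Check what share of sequences a candidate pattern matches and list the ones it misses.

    Each sequence is a list of task names; it is matched as one space-joined string.
    """
    compiled = re.compile(pattern)
    outliers = [seq for seq in sequences if not compiled.fullmatch(" ".join(seq))]
    coverage = (len(sequences) - len(outliers)) / len(sequences)
    return {
        "coverage": coverage,
        "accepted": coverage >= min_coverage,  # the pattern counts as "core" if it clears the threshold
        "outliers": outliers,
    }


# Made-up task sequences; the last one stands in for a phpmyadmin-like role.
example_roles = [
    ["install", "configure", "service"],
    ["install", "configure", "handlers", "service"],
    ["install", "configure", "service"],
    ["install", "configure", "handlers", "service"],
    ["install", "lineinfile", "service"],
]

report = coverage_report(r"install configure( handlers)? service", example_roles)
```

With four of five sequences matching, the pattern clears the 0.8 threshold and the deviating sequence is reported separately, so an agent can decide whether the new role looks more like the core or the outlier.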