Compare commits
8 commits
feature/ko ... main

| SHA1 |
|---|
| 28f5f897d5 |
| d74b36e563 |
| 197a0a3c22 |
| 136ae08fe3 |
| ea8e2f1db7 |
| 16cbff61a8 |
| b037098730 |
| d2d57bc431 |

1 changed file with 10 additions and 9 deletions
README.md

@@ -17,16 +17,17 @@

**Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that captures the general pattern.
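
As a concrete picture of "sequences over some alphabet", here is a minimal Python sketch that assumes each example is a list of tokens, such as the module used by each task in an Ansible role. It maps every distinct token to a single letter so a grammar can be checked with the standard `re` module. The role data, the `build_alphabet`/`encode` helpers, and the grammar `a+b+c` are hypothetical illustrations, not Dervish's internal encoding or actual output.

```python
import re
from typing import Dict, List


def build_alphabet(sequences: List[List[str]]) -> Dict[str, str]:
    """Assign each distinct token a lowercase letter, in order of first appearance."""
    alphabet: Dict[str, str] = {}
    for seq in sequences:
        for token in seq:
            if token not in alphabet:
                # Stop before running past 'z' into regex metacharacters like '{' or '|'.
                if len(alphabet) == 26:
                    raise ValueError("toy encoding supports at most 26 distinct tokens")
                alphabet[token] = chr(ord("a") + len(alphabet))
    return alphabet


def encode(seq: List[str], alphabet: Dict[str, str]) -> str:
    """Turn a token sequence into a string over the one-letter alphabet."""
    return "".join(alphabet[token] for token in seq)


# Hypothetical data: the module used by each task, for three Ansible roles.
roles = [
    ["apt", "template", "service"],
    ["apt", "apt", "template", "service"],
    ["apt", "template", "template", "service"],
]

alphabet = build_alphabet(roles)                # apt -> a, template -> b, service -> c
encoded = [encode(r, alphabet) for r in roles]  # ['abc', 'aabc', 'abbc']

# Hand-written stand-in for a learned grammar: apt+ template+ service
grammar = "a+b+c"
print(all(re.fullmatch(grammar, s) for s in encoded))  # True
```

The one-letter encoding only keeps the example readable; an alphabet with more than 26 distinct tokens would need a different scheme.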
Every codebase has unwritten conventions — the order tasks appear in Ansible roles, the resources a Helm chart always creates, the steps every CI pipeline runs. Nobody writes these down. They emerge from copying and converging.

Every codebase has unwritten conventions, such as the order in which tasks appear in Ansible roles, the resources a Helm chart always creates, and the steps every CI pipeline runs. Nobody writes these down. They emerge from copying and converging.

When an LLM agent needs to follow these conventions, it usually has two bad options:
1. **Stuff every existing file into context** — 15 Ansible roles = 5,000 tokens. You'll hit the context window by the third example.

1. **Stuff every existing file into context** - You'll hit the context window by the third example.

2. **Guess from one or two examples** — the LLM infers a pattern and often gets it wrong.

2. **Guess from one or two examples** - the LLM infers a pattern and often gets it wrong.

Dervish replaces both with a **one-call MCP tool**: pass your sequences, get back a ~60-token grammar. A rule you can trust, at a fraction of the cost.

Dervish replaces both with a **one-call MCP tool**: pass your sequences, get back a ~60-token grammar.

By using **Minimum Description Length (MDL) scoring**, Dervish treats grammar discovery as an optimal compression task: the resulting rule is optimized to consume as few tokens as possible without losing the pattern.
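
The MDL trade-off can be illustrated with a toy two-part score: the bits needed to write the grammar, L(H), plus the bits needed to single out each example among the same-length strings the grammar accepts, L(D|H). The cost model here (a fixed number of bits per regex character, brute-force enumeration over a three-letter alphabet) is our own simplification for illustration, not Dervish's scoring function.

```python
import math
import re
from itertools import product
from typing import List

# Assumption: a regex is written with characters drawn from the example
# alphabet plus these operators, each costing the same number of bits.
OPERATORS = "()|*+?"


def grammar_bits(grammar: str, alphabet: str) -> float:
    """L(H): bits to write the grammar with a fixed-width code per character."""
    return len(grammar) * math.log2(len(alphabet) + len(OPERATORS))


def data_bits(grammar: str, examples: List[str], alphabet: str) -> float:
    """L(D|H): bits to pick out each example among same-length accepted strings."""
    total = 0.0
    for example in examples:
        if re.fullmatch(grammar, example) is None:
            return math.inf  # the grammar fails to cover this example
        # Brute-force count of accepted strings of this length (toy sizes only).
        accepted = sum(
            1
            for chars in product(alphabet, repeat=len(example))
            if re.fullmatch(grammar, "".join(chars))
        )
        total += math.log2(accepted)
    return total


def mdl_score(grammar: str, examples: List[str], alphabet: str) -> float:
    """Two-part MDL score: L(H) + L(D|H). Lower is better."""
    return grammar_bits(grammar, alphabet) + data_bits(grammar, examples, alphabet)


examples = ["abc", "aabc", "abbc"]
candidates = ["abc|aabc|abbc", "a+b+c", "(a|b|c)*"]
scores = {g: mdl_score(g, examples, "abc") for g in candidates}
best = min(scores, key=scores.get)
print(best)  # a+b+c
```

The verbatim list `abc|aabc|abbc` pays for its length, while the permissive `(a|b|c)*` is shorter but needs many bits to identify each example; `a+b+c` is both short and tight, so it scores lowest.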
**Without Dervish:** token cost scales linearly with examples. **With Dervish:** one compact grammar describes them all — a ~60–200 token rule instead of thousands of tokens of raw examples. Try it out and you too will say:

**Without Dervish:** token cost scales linearly with examples. **With Dervish:** one compact grammar describes them all in a ~60–200-token rule instead of thousands of tokens of raw examples. Try it out and you too will say:

<p align="center"><img src="dervish.gif" alt="Dervish animation" width="65%"></p>
@@ -59,7 +60,7 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any

### Agent workflow
An LLM agent uses the MCP to discover an unwritten convention from existing examples — compressing hundreds of files into a single ~60-token rule:

An LLM agent uses the MCP server to discover a schema from existing examples, compressing hundreds of files into a single ~60-token rule:

```text
User: Generate a new Ansible role for installing PostgreSQL.
@@ -83,12 +84,12 @@ Agent: Let me check what pattern the existing community roles follow.
I'll generate the new role following this structure.
```

**Without Dervish:** the agent stuffs 15 role files into context (5,000+ tokens per role = beyond any context window), or guesses the pattern from 1–2 examples and often gets it wrong.

**Without Dervish:** the agent either reads all 15 role files (5,000+ tokens per role) or guesses the pattern from 1–2 examples and often gets it wrong.

**With Dervish:** one MCP call returns a ~60-token grammar known to match 15/15 existing roles. The agent follows it reliably.
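
At the protocol level, an MCP tool invocation is a JSON-RPC 2.0 request with method `tools/call`, carrying the tool `name` and its `arguments`. The sketch below builds such a request for the single call described here. The tool name `infer_grammar` and the `sequences` argument key are hypothetical placeholders (only `min_coverage` is named in this README), and in practice an MCP client library would usually send the request for you.

```python
import json
from typing import Any, Dict, List, Optional


def build_tool_call(
    request_id: int,
    sequences: List[List[str]],
    min_coverage: Optional[float] = None,
) -> Dict[str, Any]:
    """Build an MCP `tools/call` request (JSON-RPC 2.0).

    "infer_grammar" and the "sequences" key are HYPOTHETICAL placeholders;
    use the tool name and input schema the Dervish server reports via `tools/list`.
    """
    arguments: Dict[str, Any] = {"sequences": sequences}
    if min_coverage is not None:
        # Only sent when requested, e.g. 0.8 for core+outlier mode.
        arguments["min_coverage"] = min_coverage
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "infer_grammar", "arguments": arguments},
    }


# Hypothetical task-module sequences extracted from two existing roles.
roles = [
    ["apt", "template", "service"],
    ["apt", "apt", "template", "service"],
]
print(json.dumps(build_tool_call(1, roles), indent=2))
```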
**Core+outlier mode:** When generating a new role, the agent can call with
`min_coverage=0.8` to learn the mainstream pattern while seeing which roles
deviate and why, which is useful when the user's case resembles an outlier
(e.g., a PHP app like phpmyadmin that needs raw `lineinfile`).

**Core+outlier mode:** When generating a new file, for example a new Ansible role, the agent can call with
`min_coverage=0.8` to learn the mainstream pattern while seeing which files
deviate and why, which is useful when the user's case resembles an outlier
(e.g., a PHP app like phpmyadmin that needs raw `lineinfile`).
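
One plausible reading of `min_coverage`, which this README does not define precisely, is the minimum fraction of examples the core grammar must fully match, with the rest reported as outliers. The sketch below checks a given candidate grammar under that assumption. The one-letter encoding (a = apt, b = template, c = service, d = lineinfile), the role data, and the `core_and_outliers` helper are illustrative only; Dervish's own search and output format may differ.

```python
import re
from typing import List, Tuple


def core_and_outliers(
    grammar: str, encoded: List[str], min_coverage: float
) -> Tuple[bool, float, List[int]]:
    """Return (meets_threshold, coverage, outlier_indices) for a candidate core grammar.

    Assumption: coverage = fraction of examples the grammar fully matches.
    """
    outliers = [i for i, s in enumerate(encoded) if re.fullmatch(grammar, s) is None]
    coverage = (len(encoded) - len(outliers)) / len(encoded)
    return coverage >= min_coverage, coverage, outliers


# Hypothetical roles encoded one letter per task module:
# a = apt, b = template, c = service, d = lineinfile
roles = ["abc", "aabc", "abbc", "abc", "adc"]  # the last one plays the phpmyadmin-style outlier
core = "a+b+c"

print(core_and_outliers(core, roles, min_coverage=0.8))  # (True, 0.8, [4])
print(core_and_outliers(core, roles, min_coverage=1.0))  # (False, 0.8, [4])
```

At 0.8 the grammar is accepted as the core and role 4 is surfaced as the outlier worth inspecting; requiring full coverage (the 15/15 case) rejects it because one role deviates.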