feat: kOREInference — Algorithm 4 iDRegEx with MDL scoring + ensemble integration #1

Merged
tobi merged 17 commits from feature/kore-inference into main 2026-07-01 14:08:19 +00:00
Showing only changes of commit 0be1a7fd79

@@ -3,8 +3,14 @@
 <p align="left">
 <img src="dervish-logo.png" alt="Dervish" width="180">
 </p>
-<p align="left">
+<p align="center">
+<img src="https://img.shields.io/badge/license-MIT-blue" alt="License">
+<img src="https://img.shields.io/badge/python-3.10%2B-blue" alt="Python 3.10+">
 <img src="https://ci.corentic.eu/api/badges/7/status.svg" alt="CI Pipeline Status">
+<br>
+<a href="SHOWCASE.md">Showcase</a> ·
+<a href="#quick-start">Usage</a> ·
+<a href="#papers">Papers</a>
 </p>
 **Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that captures the general pattern.
@@ -53,7 +59,7 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any
 An LLM agent uses the MCP to discover an unwritten convention from existing examples — compressing hundreds of files into a single ~60-token rule:
-```
+```text
 User: Generate a new Ansible role for installing PostgreSQL.
 Agent: Let me check what pattern the existing community roles follow.
@@ -164,7 +170,7 @@ Across all public benchmarks, Dervish delivers **4083× compression**. The gr
 ## How MDL scoring works
-```
+```text
 MDL = model_cost + data_cost
 ```