grammar-inference-engine/AGENTS.md
tobjend 7c00c6713d Initial commit: BEX-based grammar inference engine
- CRX: direct CHARE inference (Algorithm 7, TODS 2010)
- iDRegEx: k-ORE inference (Algorithm 4, arXiv 2010)
- RWR₀: SORE repair (Algorithm 6, TODS 2010)
- rwr²: k-ORE extraction (Algorithm 3, arXiv 2010)
- SOA, k-OA, iKoa, 2T-INF, Baum-Welch
- Ansible role grammar adapter
- Generic YAML key-path converter
- 28 tests, all passing
2026-07-01 08:01:16 +02:00


# Grammar Inference Engine — Agent Guide
## Overview
This repo implements the BEX family of algorithms for inferring regular expression grammars
from example sequences. Use it whenever you need to discover the pattern behind a set of
strings or structured sequences.
## Quick Start for Agents
```python
# Fast pattern inference
from bex.crx import CRX
g = CRX().infer([['a','b','c'], ['a','b'], ['a','c']]) # a.b?.c?
# Probabilistic k-ORE inference (handles noise better)
from bex.idregex import idregex
g = idregex([['a','b','c'], ['a','b'], ['a','c']], kmax=2, N=3)
```
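A quick way to sanity-check an inferred grammar is to confirm that it accepts every example it was learned from. The sketch below does not use `bex`. It assumes each symbol is a single character, so a sequence can be joined into a string, and that the grammar has been written by hand as a Python `re` pattern. `accepts_all` is a hypothetical helper, not part of this repo.

```python
import re

def accepts_all(pattern, sequences):
    """Return True if every sequence, joined into one string, fully matches pattern.

    Assumption: each symbol is a single character, so joining is unambiguous.
    """
    compiled = re.compile(pattern)
    return all(compiled.fullmatch("".join(seq)) is not None for seq in sequences)

examples = [['a', 'b', 'c'], ['a', 'b'], ['a', 'c']]

# "a, then optional b, then optional c" covers all three examples.
print(accepts_all("ab?c?", examples))
# A pattern that is too narrow misses ['a', 'b'] and ['a', 'c'].
print(accepts_all("abc", examples))
```

For multi-character symbols (module names, event types), the symbols would first need to be mapped to single characters or matched with a token-level matcher.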
## Use Cases
1. **Ansible role patterns** — extract module sequences from tasks/main.yml, learn per-category grammars
2. **Log analysis** — find common patterns in event sequences
3. **API call patterns** — learn the typical order of API operations
4. **Configuration structure** — discover the schema behind YAML files
5. **Workflow mining** — extract the typical task flow from process logs
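All of these use cases reduce to the same preparation step: turning raw records into a list of symbol sequences. As a minimal sketch for the log-analysis and workflow-mining cases, assuming records arrive as `(session_id, event)` pairs in chronological order (the record shape and the `events_to_sequences` helper are hypothetical, not part of this repo):

```python
from collections import defaultdict

def events_to_sequences(records):
    """Group (session_id, event) pairs into one event sequence per session.

    Assumption: records are already in chronological order, so appending
    preserves the order of events within each session.
    """
    by_session = defaultdict(list)
    for session_id, event in records:
        by_session[session_id].append(event)
    # Dicts keep insertion order, so sessions come out in first-seen order.
    return list(by_session.values())

records = [
    ("s1", "login"),
    ("s2", "login"),
    ("s1", "query"),
    ("s1", "logout"),
    ("s2", "logout"),
]
sequences = events_to_sequences(records)
print(sequences)
```

The resulting list of lists has the same shape as the input shown in the Quick Start examples.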
## Architecture
Two inference pipelines:
| Pipeline | When to use |
|----------|-------------|
| CRX (fast) | Many examples, speed matters, CHARE output is sufficient |
| iDRegEx (robust) | Few/noisy examples, need probabilistic handling |
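The table can be turned into a small selection rule. The sketch below is only illustrative: `pick_pipeline` is a hypothetical helper, and the `many_threshold` default of 100 is an arbitrary assumption, since this guide does not define how many examples count as "many".

```python
def pick_pipeline(num_examples, noisy, many_threshold=100):
    """Return "idregex" for few or noisy examples, otherwise "crx".

    many_threshold is an assumed cut-off for illustration only; tune it
    for your data.
    """
    if noisy or num_examples < many_threshold:
        return "idregex"
    return "crx"

print(pick_pipeline(5000, noisy=False))
print(pick_pipeline(12, noisy=False))
```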
## Running Tests
```bash
python tests/test_bex.py
```
## MCP Roadmap
- [ ] Standalone MCP server wrapping CRX + iDRegEx
- [ ] Tool: `infer_grammar(sequences, method="crx")`
- [ ] Tool: `ansible_role_grammar(roles_dir)`
- [ ] Tool: `yaml_to_sequences(yaml_path)`
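
One possible shape for the planned `infer_grammar(sequences, method="crx")` tool is a thin dispatcher over the two pipelines. The sketch below is an assumption about the design, not the planned implementation. It takes the backends as plain callables so it runs without `bex` or an MCP library, and the `"idregex"` method name and the `make_infer_grammar` factory are hypothetical.

```python
def make_infer_grammar(backends):
    """Build an infer_grammar function that dispatches on a method name.

    backends maps a method name to a callable that takes a list of
    sequences and returns the inferred grammar.
    """
    def infer_grammar(sequences, method="crx"):
        if method not in backends:
            known = ", ".join(sorted(backends))
            raise ValueError(f"unknown method {method!r}; expected one of: {known}")
        if not sequences:
            raise ValueError("at least one example sequence is required")
        return backends[method](sequences)
    return infer_grammar

# Stand-in backends for illustration. In this repo they would presumably wrap
# CRX().infer(...) and idregex(..., kmax=2, N=3) from the Quick Start.
infer_grammar = make_infer_grammar({
    "crx": lambda seqs: f"crx grammar from {len(seqs)} sequences",
    "idregex": lambda seqs: f"idregex grammar from {len(seqs)} sequences",
})
print(infer_grammar([["a", "b"], ["a"]]))
```

Keeping the backends injectable makes the dispatcher easy to test in isolation and lets further methods be added without changing the tool signature.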