Grammar Inference Engine — Agent Guide
Overview
This repo implements the BEX family of algorithms for inferring regular expression grammars from example sequences. Use it whenever you need to discover the pattern behind a set of strings or structured sequences.
Quick Start for Agents
# Fast pattern inference
from bex.crx import CRX
g = CRX().infer([['a','b','c'], ['a','b'], ['a','c']])  # should accept all three inputs, e.g. a.b?.c?
# Probabilistic k-ORE inference (handles noise better)
from bex.idregex import idregex
g = idregex([['a','b','c'], ['a','b'], ['a','c']], kmax=2, N=3)
Use Cases
- Ansible role patterns — extract module sequences from tasks/main.yml, learn per-category grammars
- Log analysis — find common patterns in event sequences
- API call patterns — learn the typical order of API operations
- Configuration structure — discover the schema behind YAML files
- Workflow mining — extract the typical task flow from process logs
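For most of these use cases the first step is to turn raw records into the list-of-lists shape used in the Quick Start. The sketch below groups a flat event stream into one ordered sequence per session. The helper name `sequences_from_events` and the `(session_id, event_name)` record layout are assumptions made for illustration, not part of this repo.

```python
from collections import defaultdict


def sequences_from_events(events):
    """Group (session_id, event_name) pairs into one ordered sequence per session."""
    by_session = defaultdict(list)
    for session_id, event_name in events:
        by_session[session_id].append(event_name)
    # Dicts keep insertion order, so sessions come out in first-seen order.
    return list(by_session.values())


# Hypothetical interleaved log: two sessions sharing one stream.
events = [
    ("s1", "login"), ("s2", "login"), ("s1", "search"),
    ("s1", "logout"), ("s2", "logout"),
]
sequences = sequences_from_events(events)
print(sequences)  # [['login', 'search', 'logout'], ['login', 'logout']]
```

The resulting `sequences` value has the same shape as the inputs passed to `CRX().infer(...)` and `idregex(...)` above.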
Architecture
Two inference pipelines:
| Pipeline | When to use |
|---|---|
| CRX (fast) | Many examples, speed matters, CHARE (chain regular expression) output is sufficient |
| iDRegEx (robust) | Few/noisy examples, need probabilistic handling |
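The guidance in the table can be written down as a small helper that returns a value for the `prefer` parameter of the MCP tool. This is only a sketch: the function name `choose_prefer` and the cutoff `many_threshold=50` are hypothetical, since the guide does not say how many examples count as "many".

```python
def choose_prefer(num_examples, noisy, many_threshold=50):
    """Map the table's guidance onto 'crx' or 'idregex'.

    many_threshold is an assumed cutoff, not a value taken from the repo.
    """
    if noisy or num_examples < many_threshold:
        return "idregex"  # few or noisy examples: robust pipeline
    return "crx"          # many clean examples: fast pipeline


print(choose_prefer(num_examples=500, noisy=False))  # crx
print(choose_prefer(num_examples=8, noisy=False))    # idregex
print(choose_prefer(num_examples=500, noisy=True))   # idregex
```

Tune the threshold to your own data; when in doubt, running both pipelines and comparing the results is the safer choice.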
Running Tests
python tests/test_bex.py
MCP Server
The primary interface is an MCP server exposing a single tool:
| Tool | Parameters | What it does |
|---|---|---|
| infer_best_grammar | sequences, prefer, kmax, N | Runs CRX + iDRegEx, picks best by MDL. prefer='crx' or prefer='idregex' skips ensemble. |
Start it: python /path/to/bex/mcp_server.py, then connect any MCP client.
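For agents that talk to the server without a client library, the sketch below builds the JSON-RPC `tools/call` request that MCP uses to invoke a tool. The tool name and parameter names come from the table above; treating `prefer`, `kmax`, and `N` as optional and the exact JSON types are assumptions, and the helper `build_tool_call` is hypothetical. A real session also needs the MCP initialization handshake first, which client libraries normally handle.

```python
import json


def build_tool_call(sequences, prefer=None, kmax=None, N=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for infer_best_grammar."""
    arguments = {"sequences": sequences}
    # Only send optional parameters that were set explicitly
    # (assumption: the server applies its own defaults for the rest).
    for key, value in (("prefer", prefer), ("kmax", kmax), ("N", N)):
        if value is not None:
            arguments[key] = value
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "infer_best_grammar", "arguments": arguments},
    }


request = build_tool_call(
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    prefer="idregex", kmax=2, N=3,
)
print(json.dumps(request, indent=2))
```

Omitting `prefer` leaves the choice to the server, which per the table runs both pipelines and picks the better result by MDL.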