BEX-based grammar inference engine: learn regular expression patterns from example sequences. Supports CHAREs (CRX), k-OREs (iDRegEx), and the full BEX pipeline (SOA→2T-INF→RWR₀→CRX / iKoa→BW→Disambiguate→Prune→rwr²).

- CRX: direct CHARE inference (Algorithm 7, TODS 2010)
- iDRegEx: k-ORE inference (Algorithm 4, arXiv 2010)
- RWR₀: SORE repair (Algorithm 6, TODS 2010)
- rwr²: k-ORE extraction (Algorithm 3, arXiv 2010)
- SOA, k-OA, iKoa, 2T-INF, Baum-Welch
- Ansible role grammar adapter
- Generic YAML key-path converter
- 28 tests, all passing
Grammar Inference Engine
Infer regular expression grammars from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), the engine learns a compact regular expression that describes the general pattern.
Quick Start
```bash
pip install pyyaml
python -m bex
```

```python
from bex.crx import CRX

seqs = [
    ['file', 'template', 'docker_image', 'command', 'set_fact', 'shell', 'wait_for'],
    ['file', 'template', 'docker_image', 'command', 'set_fact', 'shell'],
]
crx = CRX()
grammar = crx.infer(seqs)
print(grammar)
# file.template.docker_image.command.set_fact.shell.(wait_for)?
```
Algorithms
| Algorithm | What it learns | Paper | Use case |
|---|---|---|---|
| CRX | CHAREs (single-pass, deterministic) | TODS 2010 §6 | Fast inference from many sequences |
| iDRegEx | k-OREs (probabilistic, Baum-Welch) | arXiv 2010 | Handles noise, learns from few examples |
| RWR₀ | SOREs (iterative repair) | TODS 2010 §5.2 | Builds regex from a single automaton |
| rwr² | k-ORE from k-OA | arXiv 2010 | Post-processing for k-ORE extraction |
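The summary at the top lists the TODS pipeline as SOA→2T-INF→RWR₀, so RWR₀ starts from a single occurrence automaton (SOA). The sketch below illustrates the 2T-INF step under a simplifying assumption: the SOA is stored as three plain sets (start symbols, observed 2-grams, end symbols) rather than whatever structure `soa.py` actually uses. The names `SOA`, `two_t_inf`, and `accepts` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SOA:
    """Hypothetical SOA: one state per symbol, so a state is just the symbol."""
    initial: set = field(default_factory=set)  # symbols that can start a word
    edges: set = field(default_factory=set)    # (a, b): a may be followed by b
    final: set = field(default_factory=set)    # symbols that can end a word
    accepts_empty: bool = False                # True if the sample contained []


def two_t_inf(words):
    """2T-INF sketch: record every first symbol, 2-gram and last symbol seen."""
    soa = SOA()
    for w in words:
        if not w:
            soa.accepts_empty = True
            continue
        soa.initial.add(w[0])
        soa.final.add(w[-1])
        soa.edges.update(zip(w, w[1:]))
    return soa


def accepts(soa, word):
    """Accept iff the first symbol, every 2-gram, and the last symbol are known."""
    if not word:
        return soa.accepts_empty
    return (word[0] in soa.initial
            and word[-1] in soa.final
            and all(pair in soa.edges for pair in zip(word, word[1:])))


soa = two_t_inf([["a", "b", "a", "c"]])
print(accepts(soa, ["a", "b", "a", "b", "a", "c"]))  # True: the a/b loop generalizes
```

Because every observed 2-gram becomes an edge, the SOA already accepts words that never appeared in the sample (here `a b a b a c`); turning such an automaton into a SORE, with repair where needed, is the job the table assigns to RWR₀.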
Pipeline 1: Direct CHARE Inference (fast)
```
Example sequences → CRX → CHARE grammar
```
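A simplified sketch of the idea behind Pipeline 1, not a transcription of Algorithm 7: symbols that can follow each other around a cycle are grouped into one disjunction (a strongly connected component of the 2-gram graph), the groups are put in topological order, and each group gets a multiplicity from how often it occurs in each example. The function name `crx_sketch` is hypothetical, ties between unordered groups are broken by first occurrence (which may differ from how `crx.py` handles them), and single symbols are printed without parentheses, unlike the `(wait_for)?` in Quick Start.

```python
from collections import defaultdict


def crx_sketch(words):
    """Simplified CHARE-style inference (hypothetical helper, not Algorithm 7 verbatim)."""
    # First-occurrence index of every symbol; used to break ties deterministically.
    first = {}
    for w in words:
        for s in w:
            first.setdefault(s, len(first))

    # Successor graph: edge a -> b whenever b directly follows a in some example.
    succ = defaultdict(set)
    for w in words:
        for a, b in zip(w, w[1:]):
            succ[a].add(b)

    def reach(start):
        """All symbols reachable from start by one or more edges."""
        seen, stack = set(), [start]
        while stack:
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reachable = {s: reach(s) for s in first}

    # Strongly connected components: symbols that can reach each other.
    comp_of = {}
    for s in first:
        if s not in comp_of:
            comp = frozenset({s} | {t for t in reachable[s] if s in reachable[t]})
            for t in comp:
                comp_of[t] = comp
    comps = sorted(set(comp_of.values()), key=lambda c: min(first[s] for s in c))

    # Topological order of the components (Kahn-style), earliest-seen first.
    preds = {c: set() for c in comps}
    for c in comps:
        for s in c:
            for t in succ[s]:
                if comp_of[t] != c:
                    preds[comp_of[t]].add(c)
    order, placed = [], set()
    while len(order) < len(comps):
        ready = next(c for c in comps if c not in placed and preds[c] <= placed)
        order.append(ready)
        placed.add(ready)

    # Multiplicity of each factor from its per-example occurrence counts.
    factors = []
    for c in order:
        counts = [sum(1 for s in w if s in c) for w in words]
        lo, hi = min(counts), max(counts)
        suffix = "" if lo == hi == 1 else "?" if hi <= 1 else "+" if lo >= 1 else "+?"
        symbols = sorted(c, key=first.get)
        body = symbols[0] if len(symbols) == 1 else "(" + "+".join(symbols) + ")"
        factors.append(body + suffix)
    return ".".join(factors)


seqs = [
    ["file", "template", "docker_image", "command", "set_fact", "shell", "wait_for"],
    ["file", "template", "docker_image", "command", "set_fact", "shell"],
]
print(crx_sketch(seqs))  # file.template.docker_image.command.set_fact.shell.wait_for?
print(crx_sketch([["a", "b", "a", "b", "c"], ["c"]]))  # (a+b)+?.c
```

Two groups that are not ordered by the graph never occur in the same example (otherwise one would precede the other), so ordering them arbitrarily with a `?` or `+?` suffix still accepts every input sequence.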
Pipeline 2: Probabilistic k-ORE Inference (robust)
```
Example sequences → Complete k-OA → Baum-Welch (EM)
    → Disambiguate → Prune → rwr² → k-ORE grammar
```
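Baum-Welch re-estimates the transition probabilities of the complete k-OA from expected transition counts, and those expectations are built from forward (and backward) probabilities. Below is a minimal sketch of the forward pass only, assuming a hypothetical representation in which each state is a pair `(symbol, i)` with `i < k` (so a symbol labels at most k states), and `start`, `trans`, and `stop` are plain probability dictionaries with `stop` listing every state; `baum_welch.py` may represent this differently.

```python
def forward_probability(word, start, trans, stop):
    """P(word) under a probabilistic automaton whose states are (symbol, i) pairs.

    start[q]    : probability of entering state q first
    trans[p][q] : probability of moving from p to q
    stop[q]     : probability of ending in q (must have an entry for every state)
    A state can only read the symbol it is labelled with.
    """
    if not word:
        return 0.0  # assumption: the empty word gets probability 0 in this sketch
    states = list(stop)
    # alpha[q] = probability of reading the prefix so far and being in state q.
    alpha = {q: start.get(q, 0.0) for q in states if q[0] == word[0]}
    for symbol in word[1:]:
        alpha = {
            q: sum(a * trans.get(p, {}).get(q, 0.0) for p, a in alpha.items())
            for q in states if q[0] == symbol
        }
    return sum(a * stop[q] for q, a in alpha.items())


# Hypothetical 2-OA fragment: "a" labels two states, so the automaton can tell
# the first "a" apart from an "a" that comes after "b".
a0, a1, b0 = ("a", 0), ("a", 1), ("b", 0)
start = {a0: 1.0}
trans = {a0: {b0: 1.0}, b0: {a1: 0.5}}
stop = {a0: 0.0, a1: 1.0, b0: 0.5}
print(forward_probability(["a", "b"], start, trans, stop))       # 0.5
print(forward_probability(["a", "b", "a"], start, trans, stop))  # 0.5
```

In an E-step these forward values are combined with the matching backward values to obtain expected transition counts; that half is omitted here.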
Architecture
```
bex/
├── crx.py           # CRX: direct CHARE inference (Algorithm 7, TODS)
├── idregex.py       # iDRegEx: k-ORE inference (Algorithm 4, arXiv)
├── rwr0.py          # RWR₀: SORE repair (Algorithm 6, TODS)
├── rwrsq.py         # rwr²: k-ORE extraction (Algorithm 3, arXiv)
├── soa.py           # SOA: single occurrence automaton core
├── koa.py           # k-OA: k-occurrence automaton
├── ikoa.py          # iKoa: k-OA inference (Algorithm 1, arXiv)
├── twotinf.py       # 2T-INF: 2-testable inference (Algorithm 1, TODS)
├── baum_welch.py    # Baum-Welch EM training for k-OA
├── expr.py          # Expression utilities (concat, disj, star, strip)
├── marking.py       # State marking for determinism
├── yaml_to_seq.py   # Generic YAML → key-path sequence converter
├── role_grammar.py  # Ansible role → module-sequence extractor
└── ...
```
Domain: Ansible Role Grammar
The engine includes a domain adapter for Ansible roles. It extracts module names from tasks/main.yml files and learns per-category grammars:
```bash
python -c "
from bex.role_grammar import collect_all_role_sequences, learn_grammar
all_roles, by_category = collect_all_role_sequences('path/to/roles')
for cat, items in sorted(by_category.items()):
    seqs = [s for _, s in items]
    print(f'{cat}: {learn_grammar(seqs)}')
"
```
Example Output
```
── restore (2 roles) ──
Grammar: file.copy.unarchive+.command
── validate (5 roles) ──
Grammar: hosts?.shell?.(copy+debug+fail+set_fact+uri)+?
── configure (4 roles) ──
Grammar: (assert+debug+set_fact+uri)+?.include_role?
```
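The grammars above are learned from module-name sequences extracted from each role's `tasks/main.yml`. How `role_grammar.py` decides which key of a task is the module is not documented here; the sketch below shows one plausible approach, assuming the task list has already been parsed (for example with PyYAML's `yaml.safe_load`). It treats the first key that is not a task keyword as the module and recurses into `block`/`rescue`/`always` sections. The keyword set, the FQCN shortening, and the name `task_modules` are hypothetical.

```python
# Hypothetical, non-exhaustive set of Ansible task keywords that are not modules.
TASK_KEYWORDS = {
    "name", "when", "register", "tags", "become", "become_user", "loop",
    "loop_control", "with_items", "notify", "vars", "ignore_errors",
    "changed_when", "failed_when", "delegate_to", "environment", "args",
    "no_log", "retries", "delay", "until", "block", "rescue", "always",
}


def task_modules(tasks):
    """Return the module names used by a parsed task list, in order."""
    modules = []
    for task in tasks or []:
        if not isinstance(task, dict):
            continue
        # Blocks contain nested task lists; flatten them in place.
        for section in ("block", "rescue", "always"):
            modules.extend(task_modules(task.get(section)))
        for key in task:
            if key not in TASK_KEYWORDS:
                # Assumption: keep only the last FQCN component
                # ("ansible.builtin.copy" -> "copy"), since "." means
                # concatenation in the grammar notation.
                modules.append(key.split(".")[-1])
                break
    return modules


tasks = [
    {"name": "make dir", "ansible.builtin.file": {"path": "/tmp/x", "state": "directory"}},
    {"name": "unpack", "block": [{"copy": {}}, {"unarchive": {}, "when": "ok"}]},
    {"command": "ls", "register": "out"},
]
print(task_modules(tasks))  # ['file', 'copy', 'unarchive', 'command']
```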
Grammar notation:
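- `a.b`: `a` followed by `b` (concatenation)
- `(a+b)`: either `a` or `b` (disjunction)
- `r?`: zero or one (optional)
- `r+`: one or more (iteration)
- `r+?`: zero or more (varies across examples)

Between two operands, `+` denotes disjunction; directly after an expression, it denotes iteration.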
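To check whether an example sequence fits a learned grammar, the notation can be translated into a Python regular expression over space-terminated tokens. This is a sketch, not part of the engine: it assumes symbols are identifiers (letters, digits, underscores) and decides the role of `+` by what follows it (an operand means disjunction, anything else means iteration). The suffix `+?` must become `*`, because in Python's `re` syntax `+?` is a lazy one-or-more quantifier, not zero-or-more. The names `grammar_to_regex` and `matches` are hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[().+?]")
OPERATORS = set("().+?")


def grammar_to_regex(grammar):
    """Translate the grammar notation into a Python regex string."""
    tokens = TOKEN.findall(grammar)
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "(":
            out.append("(?:")
        elif tok == ")":
            out.append(")")
        elif tok == ".":
            pass  # concatenation is plain juxtaposition in a regex
        elif tok == "+":
            if nxt is not None and (nxt == "(" or nxt not in OPERATORS):
                out.append("|")  # a+b: disjunction
            elif nxt == "?":
                out.append("*")  # r+?: zero or more
                i += 1           # consume the '?'
            else:
                out.append("+")  # r+: one or more
        elif tok == "?":
            out.append("?")
        else:
            out.append(f"(?:{re.escape(tok)} )")  # one symbol followed by a space
        i += 1
    return "".join(out)


def matches(grammar, sequence):
    """True if the whole sequence is described by the grammar."""
    text = "".join(f"{s} " for s in sequence)
    return re.fullmatch(grammar_to_regex(grammar), text) is not None


print(matches("file.copy.unarchive+.command",
              ["file", "copy", "unarchive", "unarchive", "command"]))  # True
print(matches("file.copy.unarchive+.command", ["file", "copy", "command"]))  # False
```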
Domain: Generic YAML
The engine can convert any YAML file into key-path sequences for grammar inference:
```python
from bex.yaml_to_seq import yaml_file_to_sequence, sequences_to_crx

grammar = sequences_to_crx(yaml_file_to_sequence('config.yml'))
```
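The exact output format of `yaml_file_to_sequence` is not described here. As an illustration only, a key-path sequence can be produced by walking the parsed document depth-first and emitting one path per mapping key. The function name `key_paths`, the `/` separator, and the choice to recurse into list items without adding an index are assumptions; the input is whatever a YAML loader such as PyYAML's `yaml.safe_load` returns.

```python
def key_paths(node, prefix=""):
    """Depth-first list of key paths for every mapping key in a parsed YAML document."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            paths.append(path)
            paths.extend(key_paths(value, path))
    elif isinstance(node, list):
        # Assumption: list items share their parent's path (no index is added).
        for item in node:
            paths.extend(key_paths(item, prefix))
    return paths


doc = {"server": {"host": "x", "port": 80}, "users": [{"name": "a"}, {"name": "b"}]}
print(key_paths(doc))
# ['server', 'server/host', 'server/port', 'users', 'users/name', 'users/name']
```

Repeated list items yield repeated paths (here `users/name` twice), which is what lets an inferred grammar express iteration.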
Papers
- Bex et al. "Inferring Deterministic Regular Expressions from Positive Data" — TODS 2010
- Bex et al. "Inferring k-optimal REs from Positive Data" — arXiv:1004.2372
See papers/ for extracted text and the original references.
Tests
```bash
python -m pytest tests/
# or
python tests/test_bex.py
```
MCP Server
A Model Context Protocol server for grammar inference is planned. See AGENTS.md for the roadmap.
License
MIT