# Grammar Inference Engine — Agent Guide
## Overview
This repo implements the BEX family of algorithms for inferring regular expression grammars
from example sequences. Use it whenever you need to discover the pattern behind a set of
strings or structured sequences.
## Quick Start for Agents
```python
# Fast pattern inference
from bex.crx import CRX
g = CRX().infer([['a','b','c'], ['a','b'], ['a','c']]) # a.(b+c)?

# Probabilistic k-ORE inference (handles noise better)
from bex.idregex import idregex
g = idregex([['a','b','c'], ['a','b'], ['a','c']], kmax=2, N=3)
```
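
Both calls above take a list of token lists. When the raw examples are plain strings, they have to be split into tokens first. A minimal sketch, assuming whitespace-delimited tokens; `to_sequences` is a hypothetical helper and not part of this repo:

```python
from typing import Iterable, List


def to_sequences(lines: Iterable[str]) -> List[List[str]]:
    """Split each non-empty line into whitespace-separated tokens.

    Hypothetical helper: the tokenization rule is an assumption and
    should be adapted to the data at hand.
    """
    sequences = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            sequences.append(tokens)
    return sequences


samples = to_sequences(["a b c", "a b", "", "a c"])
print(samples)  # [['a', 'b', 'c'], ['a', 'b'], ['a', 'c']]
```

The result has the same shape as the literal lists passed to `CRX().infer(...)` and `idregex(...)` in the Quick Start.
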
## Use Cases
1. **Ansible role patterns** — extract module sequences from tasks/main.yml, learn per-category grammars
2. **Log analysis** — find common patterns in event sequences
3. **API call patterns** — learn the typical order of API operations
4. **Configuration structure** — discover the schema behind YAML files
5. **Workflow mining** — extract the typical task flow from process logs
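
For the Ansible use case, the input sequence is the ordered list of module names in a role's task file. The sketch below shows one way this extraction might look, assuming the tasks have already been parsed from YAML into a list of dicts. `module_sequence` and `TASK_KEYWORDS` are hypothetical names, the keyword set is an illustrative subset, and `block` tasks are not handled:

```python
from typing import Any, Dict, List

# Task-level keywords that are not module names.
# Illustrative subset only (an assumption, not a complete list).
TASK_KEYWORDS = {
    "name", "when", "tags", "become", "register", "loop", "vars",
    "notify", "with_items", "ignore_errors", "changed_when",
    "failed_when", "delegate_to", "environment", "args", "no_log",
}


def module_sequence(tasks: List[Dict[str, Any]]) -> List[str]:
    """Return the module used by each task, in file order."""
    modules = []
    for task in tasks:
        for key in task:
            if key not in TASK_KEYWORDS:
                modules.append(key)  # first non-keyword key is taken as the module
                break
    return modules


tasks = [
    {"name": "Install nginx", "apt": {"name": "nginx"}, "become": True},
    {"name": "Write config", "template": {"src": "nginx.conf.j2"}, "notify": "reload nginx"},
    {"name": "Start service", "service": {"name": "nginx", "state": "started"}},
]
print(module_sequence(tasks))  # ['apt', 'template', 'service']
```

One such sequence per role gives the list of examples that the inference calls expect.
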
## Architecture
Two inference pipelines:

| Pipeline | When to use |
|----------|-------------|
| CRX (fast) | Many examples, need speed, CHAREs output |
| iDRegEx (robust) | Few/noisy examples, need probabilistic handling |
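
The table can be read as a simple selection rule. The sketch below encodes it, assuming a cutoff for what counts as "few" examples. `choose_pipeline` is a hypothetical helper, and the default of 20 is an arbitrary placeholder, not a value taken from this repo:

```python
def choose_pipeline(n_examples: int, noisy: bool = False, few_threshold: int = 20) -> str:
    """Pick a pipeline name following the table above.

    few_threshold is an assumed cutoff; tune it for your data.
    """
    if noisy or n_examples < few_threshold:
        return "idregex"  # few or noisy examples: probabilistic handling
    return "crx"  # many clean examples: fast path


print(choose_pipeline(500))              # crx
print(choose_pipeline(500, noisy=True))  # idregex
print(choose_pipeline(5))                # idregex
```
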
## Running Tests
```bash
python tests/test_bex.py
```
## MCP Roadmap
- [ ] Standalone MCP server wrapping CRX + iDRegEx
- [ ] Tool: `infer_grammar(sequences, method="crx")`
- [ ] Tool: `ansible_role_grammar(roles_dir)`
- [ ] Tool: `yaml_to_sequences(yaml_path)`
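
None of these tools exist yet. As a rough idea of what the planned `infer_grammar(sequences, method="crx")` tool could dispatch to, here is a sketch that wraps the two Quick Start calls behind one function. The `backends` parameter, the `"idregex"` method name, the string return value, and the error handling are all assumptions; only the `bex` imports and call arguments come from the Quick Start:

```python
from typing import Callable, Dict, List, Optional

Sequences = List[List[str]]


def _crx(sequences: Sequences):
    from bex.crx import CRX  # imported lazily, as in the Quick Start
    return CRX().infer(sequences)


def _idregex(sequences: Sequences):
    from bex.idregex import idregex
    return idregex(sequences, kmax=2, N=3)  # parameters copied from the Quick Start


DEFAULT_BACKENDS: Dict[str, Callable] = {"crx": _crx, "idregex": _idregex}


def infer_grammar(sequences: Sequences, method: str = "crx",
                  backends: Optional[Dict[str, Callable]] = None) -> str:
    """Dispatch to an inference backend and return the grammar as text."""
    backends = DEFAULT_BACKENDS if backends is None else backends
    if method not in backends:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(backends)}")
    if not sequences:
        raise ValueError("sequences must not be empty")
    return str(backends[method](sequences))


# Stub backend so the sketch runs without the bex package installed.
stub = {"crx": lambda seqs: f"grammar over {len(seqs)} sequences"}
print(infer_grammar([["a", "b"], ["a"]], backends=stub))  # grammar over 2 sequences
```

Keeping the backends in a dict makes it straightforward to register further methods later and to test the dispatcher without the real inference code.
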