Grammar Inference Engine — Agent Guide

Overview

This repo implements the BEX family of algorithms for inferring regular expression grammars from example sequences. Use it whenever you need to discover the pattern behind a set of strings or structured sequences.

Quick Start for Agents

# Fast pattern inference
from bex.crx import CRX
g = CRX().infer([['a','b','c'], ['a','b'], ['a','c']])  # a.b?.c?

# Probabilistic k-ORE inference (handles noise better)
from bex.idregex import idregex
g = idregex([['a','b','c'], ['a','b'], ['a','c']], kmax=2, N=3)
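
A quick way to sanity-check an inferred expression is to confirm that it accepts every training sequence. The sketch below does this with Python's standard `re` module rather than the BEX API. The `accepts_all` helper and the hand-written pattern `ab?c?` (a, then optional b, then optional c) are illustrative assumptions, and the approach only works when every symbol is a single character.

```python
import re

def accepts_all(pattern, sequences):
    """Return True if the regex matches every sequence in full."""
    compiled = re.compile(pattern)
    # Join single-character symbols into a string so `re` can match them.
    return all(compiled.fullmatch("".join(seq)) is not None for seq in sequences)

examples = [["a", "b", "c"], ["a", "b"], ["a", "c"]]

# Hypothetical hand translation of a chain expression:
# a, then optional b, then optional c.
covers_examples = accepts_all(r"ab?c?", examples)
print(covers_examples)
```

A check like this catches expressions that look plausible but miss an example, which is worth running before relying on an inferred grammar.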

Use Cases

  1. Ansible role patterns — extract module sequences from tasks/main.yml, learn per-category grammars
  2. Log analysis — find common patterns in event sequences
  3. API call patterns — learn the typical order of API operations
  4. Configuration structure — discover the schema behind YAML files
  5. Workflow mining — extract the typical task flow from process logs
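
Most of these use cases start the same way: raw records are grouped into one ordered symbol sequence per entity before inference. A minimal sketch for the log-analysis case follows. The `session_id event` line format and the `sequences_from_log` helper are hypothetical; the result is a list of lists in the same shape as the Quick Start inputs.

```python
from collections import defaultdict

def sequences_from_log(lines):
    """Group 'session_id event' lines into one event sequence per session."""
    by_session = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed format: first token is the session id, the rest is the event.
        session_id, event = line.split(maxsplit=1)
        by_session[session_id].append(event)
    # Dicts preserve insertion order, so sessions come out in first-seen order.
    return list(by_session.values())

log_lines = [
    "s1 login",
    "s2 login",
    "s1 search",
    "s1 logout",
    "s2 logout",
]
sequences = sequences_from_log(log_lines)
print(sequences)
```

The same grouping step applies to the other cases with a different key, for example one sequence of module names per Ansible role.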

Architecture

Two inference pipelines:

| Pipeline | When to use |
|---|---|
| CRX (fast) | Many examples, need speed, CHAREs output |
| iDRegEx (robust) | Few/noisy examples, need probabilistic handling |

Running Tests

python tests/test_bex.py

MCP Server

The primary interface is an MCP server exposing a single tool:

| Tool | Parameters | What it does |
|---|---|---|
| infer_best_grammar | sequences, prefer, kmax, N | Runs CRX + iDRegEx, picks best by MDL. prefer='crx' or prefer='idregex' skips ensemble. |

Start it: python /path/to/bex/mcp_server.py, then connect any MCP client.
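
For reference, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that payload and does not talk to the server. The tool and argument names come from the table above; the `build_tool_call` helper, the request id, the default values (`kmax=2`, `N=3`, copied from the Quick Start), and the choice to omit `prefer` when unset are assumptions. Real clients usually let an MCP SDK handle the transport and the initialization handshake.

```python
import json

def build_tool_call(sequences, prefer=None, kmax=2, N=3, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request for infer_best_grammar."""
    arguments = {"sequences": sequences, "kmax": kmax, "N": N}
    if prefer is not None:
        # Assumption: leaving out 'prefer' requests the CRX + iDRegEx ensemble.
        arguments["prefer"] = prefer
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "infer_best_grammar", "arguments": arguments},
    }

request = build_tool_call(
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    prefer="crx",
)
print(json.dumps(request, indent=2))
```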