diff --git a/AGENTS.md b/AGENTS.md
index c19c1be..e36ab9e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,8 +38,12 @@ Two inference pipelines:
 python tests/test_bex.py
 ```
 
-## MCP Roadmap
-- [ ] Standalone MCP server wrapping CRX + iDRegEx
-- [ ] Tool: `infer_grammar(sequences, method="crx")`
-- [ ] Tool: `ansible_role_grammar(roles_dir)`
-- [ ] Tool: `yaml_to_sequences(yaml_path)`
+## MCP Server
+
+The primary interface is an MCP server exposing a single tool:
+
+| Tool | Parameters | What it does |
+|------|-----------|-------------|
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | Runs CRX + iDRegEx, picks best by MDL. `prefer='crx'` or `prefer='idregex'` skips ensemble. |
+
+Start it: `python /path/to/bex/mcp_server.py`, then connect any MCP client.
diff --git a/README.md b/README.md
index b66b1d0..bebdcf4 100644
--- a/README.md
+++ b/README.md
@@ -36,13 +36,12 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any
 
 | Tool | Parameters | What it does |
 |------|-----------|-------------|
-| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **Recommended.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` or `prefer='idregex'` to run one algorithm. |
-| `infer_grammar` | `sequences`, `method`, `kmax`, `N` | Core single-algorithm inference. `method='crx'` (fast, deterministic) or `method='idregex'` (probabilistic EM). |
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **The only tool you need.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` for full coverage or `prefer='idregex'` for minimal core — skips the ensemble and runs one algorithm. |
 
 **Parameters explained:**
+- **`prefer`**: `'crx'` for full vocabulary (accepts all sequences), `'idregex'` for minimal common core (only what every example shares). Omit to let MDL pick the winner.
 - **`kmax`** (1–5): Context window for iDRegEx's k-testable automaton. Higher values capture longer-range dependencies but need more data and are slower. Default 2 works for most cases.
 - **`N`** (1–10): Baum-Welch EM iterations for iDRegEx training. More iterations = better convergence but slower. Default 3 is a good balance.
-- **`prefer`**: Skip the CRX-vs-iDRegEx comparison. Use when you know which algorithm fits your data.
 
 ### Agent workflow
 
diff --git a/bex/mcp_server.py b/bex/mcp_server.py
index cf2d07f..df7b034 100644
--- a/bex/mcp_server.py
+++ b/bex/mcp_server.py
@@ -6,40 +6,11 @@ Run as: python -m bex.mcp_server
 
 from mcp.server.fastmcp import FastMCP
 
-from .crx import CRX
-from .idregex import idregex
 from .ensemble import infer_ensemble, _matches
 
 mcp = FastMCP("grammar-inference", log_level="ERROR")
 
 
-@mcp.tool()
-def infer_grammar(
-    sequences: list[list[str]],
-    method: str = "crx",
-    kmax: int = 2,
-    N: int = 3,
-) -> str:
-    """Infer a grammar (regular expression) from example sequences.
-
-    Args:
-        sequences: List of sequences, each a list of symbols (strings).
-        method: Algorithm to use — 'crx' (fast, deterministic) or 'idregex' (probabilistic, handles noise better).
-        kmax: Maximum k for k-ORE inference (iDRegEx only).
-        N: Number of EM iterations (iDRegEx only).
-
-    Returns:
-        A regular expression string describing the inferred grammar.
-    """
-    if method == "crx":
-        return CRX().infer(sequences)
-    elif method == "idregex":
-        result = idregex(sequences, kmax=kmax, N=N)
-        return result or "∅"
-    else:
-        raise ValueError(f"Unknown method: {method}. Use 'crx' or 'idregex'.")
-
-
 @mcp.tool()
 def infer_best_grammar(
     sequences: list[list[str]],