remove redundant infer_grammar tool; update docs to single-tool MCP

tobjend 2026-07-01 13:15:19 +02:00
parent ed495d3477
commit b8cc40177c
3 changed files with 11 additions and 37 deletions

@@ -38,8 +38,12 @@ Two inference pipelines:
 python tests/test_bex.py
``` ```
-## MCP Roadmap
-- [ ] Standalone MCP server wrapping CRX + iDRegEx
-- [ ] Tool: `infer_grammar(sequences, method="crx")`
-- [ ] Tool: `ansible_role_grammar(roles_dir)`
-- [ ] Tool: `yaml_to_sequences(yaml_path)`
+## MCP Server
+The primary interface is an MCP server exposing a single tool:
+| Tool | Parameters | What it does |
+|------|-----------|-------------|
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | Runs CRX + iDRegEx, picks best by MDL. `prefer='crx'` or `prefer='idregex'` skips ensemble. |
+Start it: `python /path/to/bex/mcp_server.py`, then connect any MCP client.
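
Reviewer note: with a single tool exposed, a hand-rolled client only ever needs one request shape. The sketch below builds that request as a JSON-RPC 2.0 `tools/call` message, which is the framing MCP uses for tool invocation. The helper name `build_call` and the choice to omit unset parameters (so the server-side defaults apply) are assumptions for illustration, not part of this commit.

```python
import json

# Values of `prefer` described in the updated docs; None means "let MDL pick".
ALLOWED_PREFER = (None, "crx", "idregex")


def build_call(sequences, prefer=None, kmax=None, N=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for `infer_best_grammar`.

    Hypothetical helper: parameters left as None are omitted from the
    request so the server applies its own defaults.
    """
    if prefer not in ALLOWED_PREFER:
        raise ValueError(f"Unknown prefer: {prefer!r}. Use 'crx' or 'idregex'.")
    arguments = {"sequences": sequences}
    for key, value in (("prefer", prefer), ("kmax", kmax), ("N", N)):
        if value is not None:
            arguments[key] = value
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "infer_best_grammar", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_call([["a", "b"], ["a", "b", "b"]], prefer="crx")
    print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing; the sketch only makes the single-tool surface concrete.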

@@ -36,13 +36,12 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any
 | Tool | Parameters | What it does |
 |------|-----------|-------------|
-| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **Recommended.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` or `prefer='idregex'` to run one algorithm. |
-| `infer_grammar` | `sequences`, `method`, `kmax`, `N` | Core single-algorithm inference. `method='crx'` (fast, deterministic) or `method='idregex'` (probabilistic EM). |
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **The only tool you need.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` for full coverage or `prefer='idregex'` for minimal core; either value skips the ensemble and runs one algorithm. |
 **Parameters explained:**
+- **`prefer`**: `'crx'` for full vocabulary (accepts all sequences), `'idregex'` for minimal common core (only what every example shares). Omit to let MDL pick the winner.
 - **`kmax`** (1-5): Context window for iDRegEx's k-testable automaton. Higher values capture longer-range dependencies but need more data and are slower. Default 2 works for most cases.
 - **`N`** (1-10): Baum-Welch EM iterations for iDRegEx training. More iterations = better convergence but slower. Default 3 is a good balance.
-- **`prefer`**: Skip the CRX-vs-iDRegEx comparison. Use when you know which algorithm fits your data.
 ### Agent workflow
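
Reviewer note: callers of the removed `infer_grammar` tool can move to `infer_best_grammar` by renaming `method` to `prefer`. The sketch below shows that mapping, assuming (per the docs in this hunk) that setting `prefer` runs exactly one algorithm. The helper name `migrate_arguments` is hypothetical, and this diff does not show whether the new tool returns the same plain regex string the old one did, so callers should check the return shape separately.

```python
def migrate_arguments(old_args: dict) -> dict:
    """Translate `infer_grammar` arguments into `infer_best_grammar` arguments.

    Hypothetical helper. The removed tool defaulted to method="crx", so a
    missing `method` is mapped to prefer="crx" to keep single-algorithm
    behaviour instead of silently switching to the ensemble.
    """
    new_args = dict(old_args)  # shallow copy; the caller's dict is left untouched
    method = new_args.pop("method", "crx")
    if method not in ("crx", "idregex"):
        raise ValueError(f"Unknown method: {method}. Use 'crx' or 'idregex'.")
    new_args["prefer"] = method
    return new_args


if __name__ == "__main__":
    old = {"sequences": [["a", "b"], ["a"]], "method": "idregex", "kmax": 2, "N": 3}
    print(migrate_arguments(old))
```

Dropping `prefer` from the migrated arguments opts in to the MDL-based ensemble instead.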

@@ -6,40 +6,11 @@ Run as: python -m bex.mcp_server
 from mcp.server.fastmcp import FastMCP
-from .crx import CRX
-from .idregex import idregex
 from .ensemble import infer_ensemble, _matches
 mcp = FastMCP("grammar-inference", log_level="ERROR")
-@mcp.tool()
-def infer_grammar(
-    sequences: list[list[str]],
-    method: str = "crx",
-    kmax: int = 2,
-    N: int = 3,
-) -> str:
-    """Infer a grammar (regular expression) from example sequences.
-    Args:
-        sequences: List of sequences, each a list of symbols (strings).
-        method: Algorithm to use 'crx' (fast, deterministic) or 'idregex' (probabilistic, handles noise better).
-        kmax: Maximum k for k-ORE inference (iDRegEx only).
-        N: Number of EM iterations (iDRegEx only).
-    Returns:
-        A regular expression string describing the inferred grammar.
-    """
-    if method == "crx":
-        return CRX().infer(sequences)
-    elif method == "idregex":
-        result = idregex(sequences, kmax=kmax, N=N)
-        return result or ""
-    else:
-        raise ValueError(f"Unknown method: {method}. Use 'crx' or 'idregex'.")
 @mcp.tool()
 def infer_best_grammar(
     sequences: list[list[str]],