remove redundant infer_grammar tool; update docs to single-tool MCP

Date: 2026-07-01 13:15:19 +02:00
commit b8cc40177c
parent ed495d3477
3 changed files with 11 additions and 37 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,8 +38,12 @@ Two inference pipelines:
 python tests/test_bex.py
 ```
-## MCP Roadmap
-- [ ] Standalone MCP server wrapping CRX + iDRegEx
-- [ ] Tool: `infer_grammar(sequences, method="crx")`
-- [ ] Tool: `ansible_role_grammar(roles_dir)`
-- [ ] Tool: `yaml_to_sequences(yaml_path)`
+## MCP Server
+
+The primary interface is an MCP server exposing a single tool:
+
+| Tool | Parameters | What it does |
+|------|-----------|-------------|
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | Runs CRX + iDRegEx, picks best by MDL. `prefer='crx'` or `prefer='idregex'` skips ensemble. |
+Start it: `python /path/to/bex/mcp_server.py`, then connect any MCP client.
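With `infer_grammar` removed, a client that used to call it with `method='crx'` would now call `infer_best_grammar` with `prefer='crx'`, which per the docs runs a single algorithm instead of the ensemble. As a minimal sketch, assuming the server is reached over the stdio transport and the MCP `initialize` handshake has already completed, this is the JSON-RPC 2.0 `tools/call` message such a client sends. The helper name `tools_call_request` and the example sequences are hypothetical.

```python
import json
from typing import Optional


def tools_call_request(
    request_id: int,
    sequences: list[list[str]],
    prefer: Optional[str] = None,
    kmax: int = 2,
    N: int = 3,
) -> str:
    """Build one JSON-RPC 2.0 `tools/call` message for infer_best_grammar (hypothetical helper)."""
    arguments: dict = {"sequences": sequences, "kmax": kmax, "N": N}
    if prefer is not None:
        # Leave `prefer` out to let the server pick CRX or iDRegEx by MDL.
        arguments["prefer"] = prefer
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "infer_best_grammar", "arguments": arguments},
    }
    # On stdio, each message is a single line of JSON terminated by a newline.
    return json.dumps(message) + "\n"


# Presumably the closest replacement for the removed infer_grammar(sequences, method="crx").
line = tools_call_request(1, [["a", "b"], ["a", "b", "b"]], prefer="crx")
print(line, end="")
```

In practice an MCP client library (for example `ClientSession.call_tool` in the official Python SDK) builds and frames this message itself; the sketch only shows what travels over the wire.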
--- a/README.md
+++ b/README.md
@@ -36,13 +36,12 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any
 | Tool | Parameters | What it does |
 |------|-----------|-------------|
-| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **Recommended.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` or `prefer='idregex'` to run one algorithm. |
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **The only tool you need.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` for full coverage or `prefer='idregex'` for minimal core — skips the ensemble and runs one algorithm. |
-| `infer_grammar` | `sequences`, `method`, `kmax`, `N` | Core single-algorithm inference. `method='crx'` (fast, deterministic) or `method='idregex'` (probabilistic EM). |
 **Parameters explained:**
 - **`prefer`**: `'crx'` for full vocabulary (accepts all sequences), `'idregex'` for minimal common core (only what every example shares). Omit to let MDL pick the winner.
 - **`kmax`** (1–5): Context window for iDRegEx's k-testable automaton. Higher values capture longer-range dependencies but need more data and are slower. Default 2 works for most cases.
 - **`N`** (1–10): Baum-Welch EM iterations for iDRegEx training. More iterations = better convergence but slower. Default 3 is a good balance.
 - **`prefer`**: Skip the CRX-vs-iDRegEx comparison. Use when you know which algorithm fits your data.
 ### Agent workflow
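The `prefer` behaviour described above (run only the named algorithm when it is set, otherwise run both and let MDL pick the winner) can be sketched as a small dispatch function. This is an illustration under stated assumptions, not the server's code: `best_grammar`, the `algorithms` mapping, and the `score` callable are hypothetical, and in the usage the stub algorithms and the length-based cost stand in for the real CRX, iDRegEx, and MDL computation.

```python
from typing import Callable, Optional

Sequences = list[list[str]]
Infer = Callable[[Sequences], str]         # hypothetical: returns a regex string
Score = Callable[[str, Sequences], float]  # hypothetical: MDL-style cost, lower is better


def best_grammar(
    sequences: Sequences,
    algorithms: dict[str, Infer],
    score: Score,
    prefer: Optional[str] = None,
) -> str:
    """Hypothetical sketch of single-tool dispatch with an optional preference."""
    if prefer is not None:
        if prefer not in algorithms:
            raise ValueError(f"Unknown prefer: {prefer!r}. Use one of {sorted(algorithms)}.")
        # An explicit preference skips the ensemble and runs one algorithm,
        # covering what the removed infer_grammar(method=...) used to do.
        return algorithms[prefer](sequences)
    # No preference: run every algorithm and keep the cheapest candidate.
    candidates = [infer(sequences) for infer in algorithms.values()]
    return min(candidates, key=lambda regex: score(regex, sequences))


# Stand-ins that return fixed strings; they are not the real CRX or iDRegEx.
stubs: dict[str, Infer] = {
    "crx": lambda seqs: "a b b* c",
    "idregex": lambda seqs: "a b+ c",
}


def toy_cost(regex: str, seqs: Sequences) -> float:
    # Placeholder for MDL: shorter expressions win. Not bex's actual metric.
    return float(len(regex))


seqs = [["a", "b", "c"], ["a", "b", "b", "c"]]
print(best_grammar(seqs, stubs, toy_cost))                # a b+ c
print(best_grammar(seqs, stubs, toy_cost, prefer="crx"))  # a b b* c
```

Passing the algorithms and the cost function in as arguments keeps the sketch independent of bex's modules. The documented tool also accepts `kmax` and `N` for iDRegEx; in this sketch they would be bound ahead of time, for example with `functools.partial`.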
--- a/bex/mcp_server.py
+++ b/bex/mcp_server.py
@@ -6,40 +6,11 @@ Run as: python -m bex.mcp_server
 from mcp.server.fastmcp import FastMCP
 from .crx import CRX
 from .idregex import idregex
 from .ensemble import infer_ensemble, _matches
 mcp = FastMCP("grammar-inference", log_level="ERROR")
-@mcp.tool()
-def infer_grammar(
-    sequences: list[list[str]],
-    method: str = "crx",
-    kmax: int = 2,
-    N: int = 3,
-) -> str:
-    """Infer a grammar (regular expression) from example sequences.
-    Args:
-        sequences: List of sequences, each a list of symbols (strings).
-        method: Algorithm to use — 'crx' (fast, deterministic) or 'idregex' (probabilistic, handles noise better).
-        kmax: Maximum k for k-ORE inference (iDRegEx only).
-        N: Number of EM iterations (iDRegEx only).
-    Returns:
-        A regular expression string describing the inferred grammar.
-    """
-    if method == "crx":
-        return CRX().infer(sequences)
-    elif method == "idregex":
-        result = idregex(sequences, kmax=kmax, N=N)
-        return result or "∅"
-    else:
-        raise ValueError(f"Unknown method: {method}. Use 'crx' or 'idregex'.")
 @mcp.tool()
 def infer_best_grammar(
     sequences: list[list[str]],