remove redundant infer_grammar tool; update docs for single-tool MCP server

2026-07-01 13:15:19 +02:00 · 2026-07-01 13:15:19 +02:00 · b8cc40177c
commit b8cc40177c
parent ed495d3477
3 changed files with 11 additions and 37 deletions
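MCP clients that still call the removed `infer_grammar` tool will need their arguments rewritten. According to the README change, `prefer` accepts the same two values that `method` did and runs only that algorithm, so the migration is mostly a rename. The helper below is a hypothetical sketch: `migrate_infer_grammar_args` is not part of bex. It assumes that `infer_best_grammar` accepts exactly the parameters listed in its docs table, and that omitting `kmax`/`N` falls back to the server's defaults. The two tools' output strings may still differ (for example, the removed tool returned `"∅"` when iDRegEx found nothing), so callers should not expect byte-identical results.

```python
"""Hypothetical migration helper (not part of bex): rewrite arguments for the
removed `infer_grammar` MCP tool into arguments for `infer_best_grammar`."""

from typing import Any

# Values accepted by the removed tool's `method` parameter.
_VALID_METHODS = ("crx", "idregex")


def migrate_infer_grammar_args(args: dict[str, Any]) -> dict[str, Any]:
    """Translate an old `infer_grammar` argument dict for `infer_best_grammar`.

    The removed tool defaulted `method` to "crx", so a missing `method` maps
    to prefer="crx" rather than to the MDL ensemble.
    """
    method = args.get("method", "crx")
    if method not in _VALID_METHODS:
        # Mirror the removed tool, which rejected unknown methods.
        raise ValueError(f"Unknown method: {method}. Use 'crx' or 'idregex'.")

    new_args: dict[str, Any] = {"sequences": args["sequences"], "prefer": method}
    # kmax and N keep their names. Forward them only if the caller set them,
    # so that infer_best_grammar's own defaults (assumed) apply otherwise.
    for key in ("kmax", "N"):
        if key in args:
            new_args[key] = args[key]
    return new_args


if __name__ == "__main__":
    old_call = {"sequences": [["a", "b"], ["a", "b", "b"]], "method": "idregex", "kmax": 3}
    print(migrate_infer_grammar_args(old_call))
    # prints {'sequences': [['a', 'b'], ['a', 'b', 'b']], 'prefer': 'idregex', 'kmax': 3}
```

The default of `"crx"` preserves the old behavior for callers that never passed `method`. Callers who want the new MDL ensemble should drop `prefer` entirely instead.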
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,8 +38,12 @@ Two inference pipelines:
 python tests/test_bex.py
 ```

-## MCP Roadmap
-- [ ] Standalone MCP server wrapping CRX + iDRegEx
-- [ ] Tool: `infer_grammar(sequences, method="crx")`
-- [ ] Tool: `ansible_role_grammar(roles_dir)`
-- [ ] Tool: `yaml_to_sequences(yaml_path)`
+## MCP Server
+
+The primary interface is an MCP server exposing a single tool:
+
+| Tool | Parameters | What it does |
+|------|-----------|-------------|
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | Runs CRX + iDRegEx and picks the best result by MDL. `prefer='crx'` or `prefer='idregex'` skips the ensemble and runs only that algorithm. |
+
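+Start it from the repository root with `python -m bex.mcp_server`, then connect any MCP client. (Running the file directly as a script fails, because `mcp_server.py` uses relative imports.)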
--- a/README.md
+++ b/README.md
@@ -36,13 +36,12 @@ The primary interface is a **Model Context Protocol (MCP)** server. Connect any

 | Tool | Parameters | What it does |
 |------|-----------|-------------|
-| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **Recommended.** Runs CRX + iDRegEx, picks best by MDL. Set `prefer='crx'` or `prefer='idregex'` to run one algorithm. |
-| `infer_grammar` | `sequences`, `method`, `kmax`, `N` | Core single-algorithm inference. `method='crx'` (fast, deterministic) or `method='idregex'` (probabilistic EM). |
+| `infer_best_grammar` | `sequences`, `prefer`, `kmax`, `N` | **The only tool you need.** Runs CRX + iDRegEx and picks the best result by MDL. Set `prefer='crx'` for full coverage or `prefer='idregex'` for a minimal common core; either setting skips the ensemble and runs only that algorithm. |

 **Parameters explained:**
+- **`prefer`**: `'crx'` for the full vocabulary (accepts every input sequence), `'idregex'` for the minimal common core (only what every example shares). Omit it to let MDL pick the winner.
 - **`kmax`** (1–5): Context window for iDRegEx's k-testable automaton. Higher values capture longer-range dependencies but need more data and are slower. Default 2 works for most cases.
 - **`N`** (1–10): Baum-Welch EM iterations for iDRegEx training. More iterations = better convergence but slower. Default 3 is a good balance.
-- **`prefer`**: Skip the CRX-vs-iDRegEx comparison. Use when you know which algorithm fits your data.

 ### Agent workflow

--- a/bex/mcp_server.py
+++ b/bex/mcp_server.py
@@ -6,40 +6,11 @@ Run as: python -m bex.mcp_server

 from mcp.server.fastmcp import FastMCP

-from .crx import CRX
-from .idregex import idregex
 from .ensemble import infer_ensemble, _matches

 mcp = FastMCP("grammar-inference", log_level="ERROR")


-@mcp.tool()
-def infer_grammar(
-    sequences: list[list[str]],
-    method: str = "crx",
-    kmax: int = 2,
-    N: int = 3,
-) -> str:
-    """Infer a grammar (regular expression) from example sequences.
-
-    Args:
-        sequences: List of sequences, each a list of symbols (strings).
-        method: Algorithm to use — 'crx' (fast, deterministic) or 'idregex' (probabilistic, handles noise better).
-        kmax: Maximum k for k-ORE inference (iDRegEx only).
-        N: Number of EM iterations (iDRegEx only).
-
-    Returns:
-        A regular expression string describing the inferred grammar.
-    """
-    if method == "crx":
-        return CRX().infer(sequences)
-    elif method == "idregex":
-        result = idregex(sequences, kmax=kmax, N=N)
-        return result or "∅"
-    else:
-        raise ValueError(f"Unknown method: {method}. Use 'crx' or 'idregex'.")
-
-
 @mcp.tool()
 def infer_best_grammar(
    sequences: list[list[str]],