Commit graph

56 commits

Author SHA1 Message Date
tobjend
9f5bde22d5 Remove bugs section (implementation bugs, not paper bugs), remove Docker Compose (private data), add Portainer templates, fix geerlingguy claim precision
Blog post: remove 'The bugs we found' section (all 4 bugs were from our implementation, not the paper algorithms). Replace company data references in MCP section with Galaxy example. Update ensemble dynamics table with public datasets.

README: replace Docker Compose with Portainer templates in 'Why grammar inference?' table, Real-world Results, and Domain Adapters.

SHOWCASE: replace Docker Compose with Portainer templates.

All claims verified: no public documentation of geerlingguy module ordering convention exists.
2026-07-01 10:15:22 +02:00
tobjend
547376894c Update README and SHOWCASE with real-world dataset evaluations
README:
- Replace outdated company benchmarks with public showcases
- Add Algorithm Selection Guide
- Add 'When each algorithm wins' table
- Add 'Why grammar inference?' table with value prop for LLMs
- Add 'What doesn't work' section documenting failed approaches
- Update all domain adapter examples with public results
- Clean up outdated references (companyweb roles, hashistack terraform)

SHOWCASE:
- Add Helm (kube-prometheus-stack) with iDRegEx minimal core
- Add Docker Compose per-project patterns
- Add GitHub Actions cross-project Go lint pattern
- Add Terraform modules with vocabulary analysis
- Add 'What doesn't work' section
- Explain WHY each dataset helps an LLM
2026-07-01 10:04:10 +02:00
tobjend
0e2aec582b Grammar inference engine: CRX + iDRegEx ensemble with MDL scoring, MCP server, showcase, and blog post
- Ensemble inference (infer_ensemble) runs both CRX and iDRegEx, picks best by MDL
- CRX: CRX algorithm for wide coverage (accepts all sequences, large vocabulary)
- iDRegEx: iDRegEx for minimal core grammar (tightest common pattern)
- MDL scoring: fixed model_cost to count alphabet symbol occurrences, fixed dispatch order in _count_words_fast
- Fixed _match_tokens: rewritten as _match_possible with proper backtracking
- Fixed _parse_parts disjunction: children use _parse_flat_symbol to avoid dot-splitting
- MCP server: infer_best_grammar and infer_grammar tools
- Added prefer parameter (crx/idregex) to skip ensemble
- 28 passing tests
- SHOWCASE.md with Geerlingguy Galaxy demonstration
- blog_post.md with full technical deep-dive
2026-07-01 09:51:41 +02:00
tobjend
a1567bffbe Add bin/mcp-server wrapper script for robust path resolution
Avoids module shadowing when CWD contains another bex package.
2026-07-01 08:06:17 +02:00
tobjend
adc52c99ec Add MCP server: grammar inference via FastMCP
- bex/mcp_server.py: FastMCP server with 3 tools:
  * infer_grammar(sequences, method='crx'|'idregex')
  * infer_yaml_grammar(yaml_dir, pattern, method)
  * infer_ansible_role_grammar(roles_dir)
- pyproject.toml: add bex-mcp console_scripts entry point
2026-07-01 08:03:10 +02:00
tobjend
7c00c6713d Initial commit: BEX-based grammar inference engine
- CRX: direct CHARE inference (Algorithm 7, TODS 2010)
- iDRegEx: k-ORE inference (Algorithm 4, arXiv 2010)
- RWR₀: SORE repair (Algorithm 6, TODS 2010)
- rwr²: k-ORE extraction (Algorithm 3, arXiv 2010)
- SOA, k-OA, iKoa, 2T-INF, Baum-Welch
- Ansible role grammar adapter
- Generic YAML key-path converter
- 28 tests, all passing
2026-07-01 08:01:16 +02:00