- Ensemble inference (infer_ensemble) runs both CRX and iDRegEx, picks best by MDL
- CRX: CRX algorithm for wide coverage (accepts all sequences, large vocabulary)
- iDRegEx: iDRegEx for minimal core grammar (tightest common pattern)
- MDL scoring: fixed model_cost to count alphabet symbol occurrences, fixed dispatch order in _count_words_fast
- Fixed _match_tokens: rewritten as _match_possible with proper backtracking
- Fixed _parse_parts disjunction: children use _parse_flat_symbol to avoid dot-splitting
- MCP server: infer_best_grammar and infer_grammar tools
- Added prefer parameter (crx/idregex) to skip ensemble
- 28 passing tests
- SHOWCASE.md with Geerlingguy Galaxy demonstration
- blog_post.md with full technical deep-dive
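The "picks best by MDL" step can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not the package's `infer_ensemble`: candidates are plain Python `re` patterns over single-character symbols, the model cost counts alphabet symbol occurrences in the pattern (as the commit message describes for `model_cost`), and the data cost is assumed to be the summed log2 of how many words of the same length each candidate accepts. The function names `count_words`, `data_cost`, and `pick_by_mdl` are hypothetical.

```python
import itertools
import math
import re


def model_cost(pattern: str, alphabet: str) -> int:
    """Count occurrences of alphabet symbols in the pattern (toy model cost)."""
    return sum(pattern.count(sym) for sym in alphabet)


def count_words(pattern: str, alphabet: str, length: int) -> int:
    """Brute-force count of words of a given length accepted by the pattern."""
    rx = re.compile(pattern)
    return sum(
        1
        for symbols in itertools.product(alphabet, repeat=length)
        if rx.fullmatch("".join(symbols))
    )


def data_cost(pattern: str, alphabet: str, sample: list[str]) -> float:
    """Assumed data cost: bits needed to pick each sample word among the
    accepted words of the same length. Sample words must use only symbols
    from the alphabet. A candidate that rejects a word costs infinity."""
    cost = 0.0
    for word in sample:
        if re.fullmatch(pattern, word) is None:
            return math.inf
        cost += math.log2(count_words(pattern, alphabet, len(word)))
    return cost


def pick_by_mdl(candidates: list[str], alphabet: str, sample: list[str]) -> str:
    """Return the candidate with the smallest model cost plus data cost."""
    return min(
        candidates,
        key=lambda p: model_cost(p, alphabet) + data_cost(p, alphabet, sample),
    )


sample = ["ab", "abb", "abbb"]
best = pick_by_mdl(["(a|b)*", "ab+"], "ab", sample)
print(best)  # ab+
```

Both candidates have the same model cost here, so the tighter pattern wins on data cost alone: `ab+` accepts exactly one word per length, while `(a|b)*` accepts every word over the alphabet.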
27 lines
821 B
Python
"""
bex: Paper-faithful implementation of BEX inference algorithms.

Papers:
- Bex et al. 2010 (TODS): Inference of Concise Regular Expressions and DTDs
- Bex et al. 2010 (arXiv 1004.2372): Learning Deterministic Regular Expressions

Algorithms implemented:
TODS 2010: 2T-INF, REWRITE, RWR, RWR², RWR₀, CRX
arXiv 2010: iKoa, Disambiguate, rwr², iDRegEx
"""

from .soa import SOA
from .twotinf import build_soa
from .rwr0 import rwr0
from .crx import CRX
from .ikoa import ikoa
from .rwrsq import rwr_sq
from .idregex import idregex
from .koa import KOA, build_complete_koa
from .expr import concat, disj, star, optional, alphabet, strip_k
from .marking import mark_koa
from .tokenizer import YAMLTokenizer
from .ensemble import infer_ensemble
from .template import generate_template

__version__ = "0.2.0"
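
The docstring lists 2T-INF as the first step of the TODS 2010 pipeline. As a rough illustration of the idea, the sketch below collects every pair of adjacent symbols in the sample, plus start and end markers, into the edge set of a single occurrence automaton. It is a standalone approximation and makes no claim about how the package's `build_soa` or `SOA` are written; the names `two_t_inf`, `accepts`, `SRC`, and `SINK` are hypothetical.

```python
from typing import Iterable, Sequence

# Hypothetical marker states for the start and end of a word.
SRC, SINK = "<src>", "<sink>"

Edge = tuple[str, str]


def two_t_inf(sample: Iterable[Sequence[str]]) -> set[Edge]:
    """Collect the 2-grams of each word, padded with SRC and SINK,
    as the edge set of a single occurrence automaton."""
    edges: set[Edge] = set()
    for word in sample:
        path = [SRC, *word, SINK]
        edges.update(zip(path, path[1:]))
    return edges


def accepts(edges: set[Edge], word: Sequence[str]) -> bool:
    """A word is accepted when every adjacent pair on its padded path is an edge."""
    path = [SRC, *word, SINK]
    return all(pair in edges for pair in zip(path, path[1:]))


edges = two_t_inf([["a", "b"], ["a", "b", "b"]])
print(accepts(edges, ["a", "b", "b", "b"]))  # True
print(accepts(edges, ["b"]))                 # False
```

The automaton generalizes beyond the sample: the `("b", "b")` edge seen once allows any number of repeated `b` symbols, which is the kind of generalization the later rewrite steps turn into a regular expression.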