diff --git a/README.md b/README.md
index 39f0a1c..99ad51f 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,19 @@
-# Dervish
+# Dervish MCP
 
 <p align="center"><img src="dervish.gif" alt="Dervish"></p>
 
-**Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that describes the general pattern.
+**Dervish** infers **regular expression grammars** from example sequences using the BEX family of algorithms. Given a set of example sequences (strings over some alphabet), it learns a compact regular expression that captures the general pattern.
+
+Every codebase has unwritten conventions: the order tasks appear in Ansible roles, the resources a Helm chart always creates, the steps every CI pipeline runs. Nobody documents them; they emerge from copying and converging.
+
+When an LLM agent needs to follow these conventions, it usually has two bad options:
+
+1. **Stuff every existing file into context**: 15 Ansible roles already cost about 5,000 tokens, and the cost grows with every example you add.
+2. **Guess from one or two examples**: the LLM generalizes from too little data and often infers the wrong pattern.
+
+Dervish replaces both with a **one-call MCP tool**: pass your sequences, get back one compact grammar learned from all of them, at a fraction of the token cost.
+
+**Without Dervish:** token cost scales linearly with the number of examples. **With Dervish:** a single grammar of roughly 60–200 tokens describes them all, instead of thousands of tokens of raw examples.
 
 ## MCP Server
 
@@ -196,6 +207,20 @@ The sweet spot: **multiple implementations of the same abstract task** (like "de
 | 2–3 sequences | iDRegEx | CRX overfits. iDRegEx handles noise better. |
 | Many sequences, tight pattern | CRX | Learns precise concatenation with optional suffixes. |
 
+## Token savings
+
+<p align="center">
+  <img src="chart_context_cost.png" alt="Context cost: raw examples vs Dervish grammar" width="75%">
+</p>
+
+Without Dervish, including N examples in context costs roughly N × 100 tokens. With Dervish, the grammar stays small and roughly constant as N grows: ~60 tokens for a tight pattern, ~200 for diverse data.
+
+<p align="center">
+  <img src="chart_token_savings.png" alt="Token savings per dataset" width="75%">
+</p>
+
+Across the benchmarks charted above, Dervish delivers **30–83× compression**. The grammar is smaller than a single example file, yet it represents the entire dataset.
+
 ## How MDL scoring works
 
 ```
@@ -217,8 +242,8 @@ The ensemble selects the grammar with the lowest total MDL.
 
 ## Papers
 
-- **Bex et al.** *"Inferring Deterministic Regular Expressions from Positive Data"* — TODS 2010
-- **Bex et al.** *"Inferring k-optimal REs from Positive Data"* — arXiv:1004.2372
+- **Bex et al.** *[Learning Deterministic Regular Expressions for the Web](https://doi.org/10.1145/1806907.1806911)* — TODS 2010
+- **Bex et al.** *[Simplifying XML Schema: Single-Type Approximations of Regular Expressions](https://arxiv.org/abs/1004.2372)* — arXiv:1004.2372
 
 ## Tests
 
diff --git a/blog_post.md b/blog_post.md
index a845d7a..d395dcc 100644
--- a/blog_post.md
+++ b/blog_post.md
@@ -253,9 +253,9 @@ the pattern they all share. The structural convention is in the data
 ## References
 
 - Bex, G. J., Gelade, W., Neven, F., & Vansummeren, S. (2010).
-  *Learning Deterministic Regular Expressions for the Web.* TODS 2010.
+  [*Learning Deterministic Regular Expressions for the Web.*](https://doi.org/10.1145/1806907.1806911) TODS 2010.
 - Bex, G. J., Gelade, W., Martens, W., & Neven, F. (2010).
-  *Simplifying XML Schema: Single-Type Approximations of Regular
-  Expressions.* arXiv:1004.2372.
+  [*Simplifying XML Schema: Single-Type Approximations of Regular
+  Expressions.*](https://arxiv.org/abs/1004.2372) arXiv:1004.2372.
 - Rissanen, J. (1978). *Modeling by shortest data description.*
   Automatica 14(5).
diff --git a/chart_context_cost.png b/chart_context_cost.png
new file mode 100644
index 0000000..4d21826
Binary files /dev/null and b/chart_context_cost.png differ
diff --git a/chart_token_savings.png b/chart_token_savings.png
new file mode 100644
index 0000000..ec7b081
Binary files /dev/null and b/chart_token_savings.png differ
diff --git a/make_charts.py b/make_charts.py
new file mode 100644
index 0000000..1553311
--- /dev/null
+++ b/make_charts.py
@@ -0,0 +1,71 @@
+import matplotlib.pyplot as plt
+import numpy as np
+
+plt.xkcd(scale=0.7, length=60, randomness=2)
+
+FIG_W = 8
+FIG_H = 5
+
+# ── Chart 1: Context cost vs examples ──
+fig1, ax1 = plt.subplots(figsize=(FIG_W, FIG_H))
+
+N = [1, 5, 15, 36]
+raw = [n * 100 for n in N]  # ~100 tokens/example, so raw cost is linear in N
+dervish = [40, 60, 60, 200]   # grammar grows only when diversity grows
+
+x = np.arange(len(N))
+w = 0.35
+
+bars1 = ax1.bar(x - w/2, raw, w, label='Raw examples', color='#e74c3c', alpha=0.85)
+bars2 = ax1.bar(x + w/2, dervish, w, label='Dervish grammar', color='#3498db', alpha=0.85)
+
+ax1.set_xticks(x)
+ax1.set_xticklabels([f'{n} examples' for n in N])
+ax1.set_ylabel('Tokens needed in context')
+ax1.set_title('Context cost: raw examples vs Dervish grammar')
+ax1.legend(frameon=False)
+
+for bar in bars1:
+    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 80,
+             f'{int(bar.get_height())}', ha='center', va='bottom', fontsize=9)
+for bar in bars2:
+    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 80,
+             f'{int(bar.get_height())}', ha='center', va='bottom', fontsize=9)
+
+ax1.set_ylim(0, 4500)
+fig1.tight_layout()
+fig1.savefig('chart_context_cost.png', dpi=200)
+plt.close(fig1)
+
+# ── Chart 2: Tokens — Without vs With Dervish (per dataset) ──
+fig2, ax2 = plt.subplots(figsize=(FIG_W, FIG_H))
+
+datasets = ['Ansible Galaxy\n(15 roles)', 'Helm\n(6 configs)', 'Go lint\n(6 jobs)']
+without = [5000, 3000, 900]
+with_derv = [60, 40, 30]
+ratios = [f'{int(wo/wd)}×' for wo, wd in zip(without, with_derv)]  # truncated compression ratio per dataset
+
+x2 = np.arange(len(datasets))
+w2 = 0.3
+
+bw = ax2.bar(x2 - w2/2, without, w2, label='Without Dervish', color='#e74c3c', alpha=0.85)
+bd = ax2.bar(x2 + w2/2, with_derv, w2, label='With Dervish', color='#3498db', alpha=0.85)
+
+ax2.set_xticks(x2)
+ax2.set_xticklabels(datasets)
+ax2.set_ylabel('Tokens')
+ax2.set_title('Token savings per dataset')
+ax2.legend(frameon=False)
+ax2.set_yscale('log')
+ax2.set_ylim(5, 30000)
+
+# Label compression ratios
+for i, r in enumerate(ratios):
+    ax2.text(x2[i], without[i] * 1.3, r, ha='center', va='bottom', fontsize=11, fontweight='bold',
+             bbox=dict(boxstyle='round,pad=0.2', facecolor='white', edgecolor='gray', alpha=0.8))
+
+fig2.tight_layout()
+fig2.savefig('chart_token_savings.png', dpi=200)
+plt.close(fig2)
+
+print("Charts saved: chart_context_cost.png, chart_token_savings.png")
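The token figures behind both charts are hard-coded estimates in `make_charts.py`, so the README's cost and compression claims can be rechecked with plain arithmetic. A minimal sketch, assuming the script's illustrative numbers (~100 tokens per raw example for chart 1, and the per-dataset totals from chart 2); `raw_context_cost` and `compression_ratio` are hypothetical helper names, not part of Dervish:

```python
# Illustrative token estimates copied from make_charts.py (not measured tokenizer output).
TOKENS_PER_EXAMPLE = 100  # assumed per-example cost used by chart 1


def raw_context_cost(n_examples: int, tokens_per_example: int = TOKENS_PER_EXAMPLE) -> int:
    """Hypothetical helper: tokens needed to paste n raw examples into context (linear in n)."""
    return n_examples * tokens_per_example


def compression_ratio(without_tokens: int, grammar_tokens: int) -> int:
    """Hypothetical helper: whole-number ratio, truncated like int(w/d) in make_charts.py."""
    return without_tokens // grammar_tokens


# Chart 1: raw context cost for the example counts on the x-axis.
chart1_raw = [raw_context_cost(n) for n in (1, 5, 15, 36)]

# Chart 2 datasets: (name, tokens without Dervish, tokens with Dervish).
datasets = [
    ("Ansible Galaxy (15 roles)", 5000, 60),
    ("Helm (6 configs)", 3000, 40),
    ("Go lint (6 jobs)", 900, 30),
]

ratios = {name: compression_ratio(w, d) for name, w, d in datasets}

print("Chart 1 raw cost:", chart1_raw)
for name, r in ratios.items():
    print(f"{name}: {r}x")
```

With these inputs the ratios come out to 83x, 75x and 30x, which are the labels the script draws on chart 2.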