Blog
Notes from the review loop.
How we think about grading AI instructions against a rubric, writing rewrites you can test, and versioning prompts, agents, and skills like the production code they are.
June 18, 2026 · 4 min read
June 12, 2026 · 5 min read
June 5, 2026 · 4 min read
Stop grading prompts on vibes
A prompt that "feels good" is not a prompt you can trust. Here is why we grade instructions against a rubric instead of a gut feeling.
Rubrics
Methodology
Anatomy of a testable rewrite
A weak instruction, the rubric dimensions it fails, the rewrite that fixes them, and the eval that proves the rewrite holds.
Rewrites
Evals
Version your prompts like code
Prompts, agents, and skills are production instructions. They deserve history, provenance, and a single source of truth — not a paste buried in a chat log.
Registry
Workflow