Grade, rewrite, and test your AI instructions.
Grade prompts, agents, skills, commands, and workflows against a clear rubric — then turn the weak ones into testable, version-tracked instructions.
Specimen RBR-082
Undefined success criteria
The model can answer, but it has no reliable way to know what good means.
Objective clarity
4/5
Output specification
3/5
Evaluation criteria
2/5
9-dimension rubric
Every audit scores nine dimensions, including objective, context, constraints, output, and verification.
Web, CLI & MCP
Run the same audits from the app, the rubrkit npm CLI, or any MCP client.
rubr_flow
An open, documented format for bounded, testable agent procedures.
Free at launch
Credit-metered so every run shows why it costs what it does.
An editorial loop for instructions that need to hold.
Paste an instruction
Add a prompt, command, skill, agent spec, workflow, or rubr_flow block.
Get a scored critique
Rubrkit grades clarity, context, constraints, output shape, and evaluation criteria.
Rewrite with checks
Get a stronger version plus simple evals that prove whether it works.
One grading system for the instructions AI teams actually reuse.
Rubrkit detects the artifact type and applies the rubric that matches how it should behave.
Generating more text is not the point. Rubrkit shows you what is weak, why it matters, how to fix it, and whether the fix can survive a real eval.
From vague request to testable instruction.
Write a professional launch email for my new course and make it engaging.
Write a launch email for [TARGET AUDIENCE] that drives [PRIMARY GOAL]. Use a clear subject line, three short sections, one CTA, and avoid claims that are not supported by [CONTEXT].
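The bracketed slots in the rewritten instruction only help if every one of them is filled before the prompt runs. A minimal sketch of that check, assuming slots are written as upper-case names in square brackets as in the example above; the `fill` helper and the sample values are hypothetical illustrations, not part of Rubrkit:

```python
import re

# Template copied from the rewritten instruction above.
TEMPLATE = (
    "Write a launch email for [TARGET AUDIENCE] that drives [PRIMARY GOAL]. "
    "Use a clear subject line, three short sections, one CTA, and avoid "
    "claims that are not supported by [CONTEXT]."
)

# Assumption: a slot is an upper-case name (letters and spaces) in square brackets.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z ]*)\]")


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each [NAME] slot; raise if any slot has no value supplied."""
    missing = sorted({n for n in PLACEHOLDER.findall(template) if n not in values})
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Hypothetical values, for illustration only.
prompt = fill(TEMPLATE, {
    "TARGET AUDIENCE": "freelance designers",
    "PRIMARY GOAL": "course sign-ups",
    "CONTEXT": "the attached syllabus",
})
print(prompt)
```

Failing loudly on a missing slot keeps a half-filled template from reaching the model as if it were a finished instruction.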
A native format for instructions an agent can actually follow.
For teams that need stricter control, rubr_flow turns loose intent into a compact procedure an agent can follow.
Intent without machinery
Review our onboarding flow and fix anything confusing.
rubr_flow procedure
TASK "Improve onboarding completion"
CONTEXT
user is new to [PRODUCT]
primary action is [TARGET ACTION]
INPUTS
current_flow = app screens
analytics_notes = drop-off data
support_themes = user confusion reports
RULES
change only copy and step order
preserve required legal text
ON missing_context
ASK user "Which onboarding detail is missing?" -> missing_detail
FLOW
REVIEW each screen -> friction_notes
RANK issues by user impact
EDIT the highest-impact issue
OUTPUT
changed_copy: final text
rationale: why this improves completion
risk_notes: constraints preserved
VERIFY
PASS WHEN user can identify the next action in one pass
This is not another prompt type. It is the control surface for turning intent into bounded agent work.
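Because the procedure is organised under a small set of upper-case keywords, a block like this can be checked mechanically before an agent ever runs it. The sketch below is not the rubr_flow parser and does not follow any published grammar; it assumes only what the example shows (section keywords at the start of a line, and a VERIFY section holding a `PASS WHEN` check). The function names are hypothetical.

```python
# Assumption: these are the section keywords, taken from the example procedure only.
SECTION_KEYWORDS = {"TASK", "CONTEXT", "INPUTS", "RULES", "ON", "FLOW", "OUTPUT", "VERIFY"}


def split_sections(text: str) -> dict[str, list[str]]:
    """Group non-empty lines under the most recent section keyword."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head, _, rest = line.partition(" ")
        if head in SECTION_KEYWORDS:
            current = head
            sections.setdefault(current, [])
            if rest:  # e.g. the quoted title after TASK
                sections[current].append(rest)
        elif current is not None:
            sections[current].append(line)
    return sections


def has_pass_check(sections: dict[str, list[str]]) -> bool:
    """True if the VERIFY section contains at least one PASS WHEN line."""
    return any(line.startswith("PASS WHEN") for line in sections.get("VERIFY", []))


# Shortened copy of the procedure above.
SAMPLE = """\
TASK "Improve onboarding completion"
RULES
change only copy and step order
preserve required legal text
FLOW
REVIEW each screen -> friction_notes
RANK issues by user impact
EDIT the highest-impact issue
VERIFY
PASS WHEN user can identify the next action in one pass
"""

sections = split_sections(SAMPLE)
print(sorted(sections))
print(has_pass_check(sections))
```

A check like `has_pass_check` is the kind of gate that separates a bounded procedure from loose intent: a block with no verifiable pass condition can be rejected before it runs.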
Run an audit
Bring Rubrkit into your toolchain.
Sync graded artifacts into the repos where your agents run, or connect any MCP client to the same audits, bundles, and rubr_flow tools the web app uses.
CLI
Pull approved artifact bundles into local projects and place them where Codex, Claude, or generic agents expect their instructions.
# Pull approved artifacts into your project
npx rubrkit pull
# Place them where your agent expects them
npx rubrkit pull all --agent claude --yes
MCP server
Point any MCP client at Rubrkit and call the same artifact bundle, audit, and rubr_flow tools, authenticated with your Rubrkit API key.
{
  "mcpServers": {
    "rubrkit": {
      "url": "https://rubrkit.com/api/v1/mcp",
      "headers": {
        "Authorization": "Bearer <your-rubrkit-api-key>"
      }
    }
  }
}
Free now, built for serious review loops.
Pro and Team workflows are in preview while the Rubrkit review loop is being tuned.
Free
$0
For quick checks and first rewrites.
Limited audits
Basic score
Top 3 issues
One rewrite
Pro
Preview
For builders who reuse and test instructions.
Full rubric
Advanced rewrites
Eval kit generation
Version comparison
Saved library
Exports
Team
Preview
For shared standards and review workflows.
Team library
Shared rubric library
Admin controls
Private examples
Review reports
Questions before you grade your first instruction.
No. Every score points to a rubric dimension with evidence. You get a marked-up critique, a stronger version, and eval checks that test whether the fix actually holds — not a black-box rewrite.
Know which instructions are ready to run.
Grade a specimen, read the marks, ship the rewrite that survives an eval.
Run an audit
Follow the review loop as it ships.
Notes on AI artifact testing, rubr_flow conversion, evals, and proof reports.