Question 1

How is this different from asking a model to "make it better"?

Accepted Answer

A rewrite gives you a new version with no account of what changed or whether it is actually better. Rubrkit grades the original against a rubric, names the weak dimensions with evidence, and runs an eval case the old version fails and the new one passes. You see the score delta and the diff, not just a different paragraph.

Question 2

What does "proof" actually mean here?

Accepted Answer

Three things together: a structural finding tied to a rubric dimension, a behaviour change shown as a failing-then-passing eval case, and a reproducible score delta. Improvement is only "clear" when it is diagnosed, demonstrated as a behaviour change, and repeatable.
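As a rough illustration of that three-part test, the check could be sketched as below. The `Evidence` record, its field names, and the `is_clear_improvement` helper are all hypothetical and are not Rubrkit's actual data model or API. The sketch also assumes "repeatable" means at least two runs whose score deltas are positive and agree within a tolerance.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record of what a grading run produced."""
    finding_dimension: Optional[str]   # rubric dimension the structural finding is tied to
    eval_failed_before: bool           # the eval case failed on the original
    eval_passed_after: bool            # the same eval case passed on the revision
    score_deltas: Tuple[float, ...]    # score delta observed on each repeated run


def is_clear_improvement(e: Evidence, tolerance: float = 0.0) -> bool:
    # Diagnosed: there is a finding, and it names a rubric dimension.
    diagnosed = bool(e.finding_dimension)
    # Demonstrated: the same case goes from failing to passing.
    demonstrated = e.eval_failed_before and e.eval_passed_after
    # Repeatable (assumption): two or more runs, all positive, all within tolerance.
    repeatable = (
        len(e.score_deltas) >= 2
        and all(d > 0 for d in e.score_deltas)
        and max(e.score_deltas) - min(e.score_deltas) <= tolerance
    )
    return diagnosed and demonstrated and repeatable


full = Evidence("output_specification", True, True, (1.5, 1.5))
no_finding = Evidence(None, True, True, (1.5, 1.5))
single_run = Evidence("output_specification", True, True, (1.5,))
```

Under these assumptions only `full` counts as a clear improvement. A higher score with no finding, or a single unrepeated run, does not.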

Question 3

Which rubric dimensions do you score?

Accepted Answer

Six dimensions: objective clarity, context sufficiency, input handling, output specification, evaluation criteria, and boundary and failure handling. They are weighted by artifact type, since a prompt and an agent spec fail in different ways.
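To make the weighting idea concrete, here is a minimal sketch of a weighted average over the six dimensions. The weight values, the 0 to 5 score scale, the artifact type keys, and the `weighted_score` function are invented for illustration. The real weights are not stated on this page.

```python
DIMENSIONS = (
    "objective_clarity",
    "context_sufficiency",
    "input_handling",
    "output_specification",
    "evaluation_criteria",
    "boundary_and_failure_handling",
)

# Hypothetical weights per artifact type (illustrative values only).
WEIGHTS = {
    "prompt": dict(zip(DIMENSIONS, (3, 2, 1, 3, 1, 1))),
    "agent_spec": dict(zip(DIMENSIONS, (2, 2, 2, 1, 2, 3))),
}


def weighted_score(artifact_type: str, scores: dict) -> float:
    """Weighted mean of per-dimension scores for the given artifact type."""
    weights = WEIGHTS[artifact_type]
    total_weight = sum(weights.values())
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight


# The same per-dimension scores, weak only on boundary and failure handling.
scores = dict(zip(DIMENSIONS, (4, 4, 4, 4, 4, 1)))
prompt_score = weighted_score("prompt", scores)
agent_score = weighted_score("agent_spec", scores)
```

With these made-up weights, the same weakness in boundary and failure handling costs an agent spec more than a prompt, which is the sense in which the two artifact types fail in different ways.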

Question 4

Can I see why a dimension scored low?

Accepted Answer

Yes. Every dimension links to the specific finding and the line it came from, so the score is auditable rather than a black-box number you have to trust.
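One way to picture that audit trail is a finding record that carries its dimension and source line, so a low score can be traced back to the text that caused it. The `Finding` type and `explain` helper below are hypothetical and only sketch the idea. They do not describe Rubrkit's real output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding: which dimension, which line, and why."""
    dimension: str
    line: int       # 1-based line number in the graded instruction
    message: str


def explain(dimension: str, findings: List[Finding], source: str) -> List[str]:
    """Return one readable entry per finding for the requested dimension."""
    lines = source.splitlines()
    return [
        f"{f.dimension} (line {f.line}): {f.message} | {lines[f.line - 1].strip()}"
        for f in findings
        if f.dimension == dimension
    ]


source = "Summarise the report.\nMake it good.\n"
findings = [
    Finding("output_specification", 2, "no format or length stated"),
    Finding("objective_clarity", 1, "audience not named"),
]
report = explain("output_specification", findings, source)
```

Here `report` holds a single entry that quotes line 2 of the instruction next to the finding, so the score can be checked against the source text.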

Question 5

Are the numbers on this page real?

Accepted Answer

The before/after specimens shown here are curated examples, labelled as such. When you grade your own instruction, the score, findings, diff, and eval come from your run.

Question 6

What can I grade?

Accepted Answer

Prompts, commands, skills, agent specs, workflows, and rubr_flow procedures. Rubrkit detects the artifact type and applies the rubric that matches how it should behave.

Question 7

How long does a grade take?

Accepted Answer

The structural pass returns in seconds. An eval run depends on the case being tested; it is built to be reproducible rather than instant, because the point is a result that holds up, not a fast guess.

Stop guessing whether your prompt got better.

What a rewrite cannot give you.

Deterministic findings, not opinions.

Structural gaps

Scored dimensions

A behaviour change

Reproducible and versioned

Answers before you start.

Know which instructions are ready to run.

Follow the review loop as it ships.