Before and after examples.

Each example follows the same loop: identify the weak dimensions, rewrite the instruction, then define a check that shows whether the rewrite worked.

Weak marketing prompt

38/100

Write a good launch post for my product.

No audience

No positioning

No success criteria

Improved version

Write a launch post for [TARGET AUDIENCE] introducing [PRODUCT]. Use a direct tone, explain the problem, show one concrete outcome, and end with one CTA.

Why it improved

The rewrite defines audience, structure, tone, and the desired action.

Sample eval check

Passes if a reader can identify the product, problem, outcome, and CTA in under 30 seconds.

Bad coding prompt

42/100

Fix this React bug and make the code better.

Undefined bug

No constraints

No verification step

Improved version

Find the cause of [BUG]. Keep changes scoped to [FILES]. Explain the root cause, patch the code, then run [TEST COMMAND] and report the result.

Why it improved

The rewrite defines scope, expected output, and verification.

Sample eval check

Passes if the patch includes a root-cause note and the specified test result.
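A check like this can be automated once the report format is fixed. The sketch below is one possible version, not a definitive implementation. It assumes two conventions that the prompt above does not specify: the root-cause note starts a line with "Root cause:", and the test command appears on the same line as a PASS or FAIL marker. The function name check_patch_report is hypothetical.

```python
import re


def check_patch_report(report: str, test_command: str) -> bool:
    """Return True if the report has a root-cause note and a result for test_command."""
    # Assumed convention: the note starts a line with "Root cause:" followed by text.
    has_root_cause = (
        re.search(r"^root cause:[ \t]*\S", report, re.IGNORECASE | re.MULTILINE)
        is not None
    )
    # Assumed convention: the command and a PASS/FAIL marker share one line.
    result_pattern = re.escape(test_command) + r".*\b(PASS|FAIL)(ED)?\b"
    has_test_result = re.search(result_pattern, report) is not None
    return has_root_cause and has_test_result
```

The check only confirms that both items are present. Whether the stated root cause is correct still needs a human or a stronger test.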

Messy agent spec

45/100

You are an agent that researches competitors and writes summaries.

No tool boundaries

No stop condition

No escalation behavior

Improved version

Research competitors using only [TOOLS]. Summarize sources, confidence, and gaps. Stop after [LIMIT] sources or once [EVIDENCE THRESHOLD] is met. Escalate to [OWNER] if data is older than [MAX AGE].

Why it improved

The agent now has tool limits, stopping rules, and failure handling.

Sample eval check

Passes if the agent cites sources, stops predictably, and flags stale evidence.
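The three conditions in this check can be evaluated against a record of the agent's run. The sketch below assumes a hypothetical run log: the AgentRun and Source types, the stop_reason values, and the max_age_days threshold are all illustrative and do not come from any particular agent framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Source:
    url: str
    age_days: int
    flagged_stale: bool = False


@dataclass
class AgentRun:
    sources: list[Source] = field(default_factory=list)
    stop_reason: str = ""


# Hypothetical stop reasons that count as predictable.
ALLOWED_STOP_REASONS = {"limit_reached", "evidence_sufficient"}


def check_agent_run(run: AgentRun, limit: int, max_age_days: int) -> bool:
    """Return True if the run cites sources, stops predictably, and flags stale evidence."""
    cites_sources = bool(run.sources) and all(s.url for s in run.sources)
    stops_predictably = (
        len(run.sources) <= limit and run.stop_reason in ALLOWED_STOP_REASONS
    )
    # Every source older than the threshold must carry a stale flag.
    flags_stale = all(
        s.flagged_stale for s in run.sources if s.age_days > max_age_days
    )
    return cites_sources and stops_predictably and flags_stale
```

Recording the stop reason explicitly is what makes "stops predictably" testable. Without it, the check can only count sources.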

Vague command

50/100

Summarize this document.

No output contract

No audience

No defaults

Improved version

Summarize [DOCUMENT] for [AUDIENCE] in five bullets, then list risks, decisions, and unanswered questions.

Why it improved

The command becomes repeatable because the output shape is explicit.

Sample eval check

Passes if the answer contains all four requested sections (the five-bullet summary, risks, decisions, and unanswered questions) and no invented claims.
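The structural half of this check is easy to automate. The sketch below assumes a plain-text answer in which the summary bullets start with "-" or "*" and the remaining sections are introduced by heading lines named "Risks", "Decisions", and "Unanswered questions". Those formatting conventions and the function name check_summary_shape are assumptions. Detecting invented claims is outside what this sketch covers and needs a comparison against the source document.

```python
REQUIRED_HEADINGS = ("Risks", "Decisions", "Unanswered questions")


def check_summary_shape(answer: str) -> bool:
    """Return True if the answer has five summary bullets, then the three headings in order."""
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    summary_bullets = 0
    seen = []
    for line in lines:
        heading = line.rstrip(":").lower()
        match = next((h for h in REQUIRED_HEADINGS if h.lower() == heading), None)
        if match:
            seen.append(match)
        elif not seen and line.startswith(("-", "*")):
            # Only bullets before the first heading belong to the summary.
            summary_bullets += 1
    return summary_bullets == 5 and seen == list(REQUIRED_HEADINGS)
```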

Reusable skill instruction

44/100

When asked, help me create a presentation.

Weak trigger

No procedure

No output contract

Improved version

Trigger when the user asks for slides. Gather audience, objective, length, and source material. Produce outline, slide copy, and visual direction.

Why it improved

The skill has a trigger, inputs, steps, and deliverables.

Sample eval check

Passes if missing inputs are requested before slide content is generated.
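One way to test this ordering rule is to replay the skill's actions as a list of events. The sketch below assumes a hypothetical event log of (kind, detail) pairs, where kind is "ask" for a request to the user and "slides" for generated slide content. The event format and the function name are illustrative only.

```python
from __future__ import annotations

REQUIRED_INPUTS = ("audience", "objective", "length", "source material")


def check_inputs_requested_first(
    provided: set[str], events: list[tuple[str, str]]
) -> bool:
    """Return True if every missing input was requested before any slide content."""
    missing = set(REQUIRED_INPUTS) - provided
    for kind, detail in events:
        if kind == "ask":
            # Asking for an input counts as requesting it.
            missing.discard(detail)
        elif kind == "slides":
            # Slide content is acceptable only once nothing is still missing.
            return not missing
    # No slide content was generated, so nothing was produced prematurely.
    return True
```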

Multi-step AI workflow

41/100

Research a topic, write a report, and make it useful.

No sequence

No owners

No measurement

Improved version

Research [TOPIC], extract claims with sources, draft the report, review risks, and produce final recommendations with pass/fail acceptance criteria.

Why it improved

The workflow has a sequence, deliverables, and a quality gate.

Sample eval check

Passes if each step produces an artifact that the next step consumes.
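This handoff rule can be checked mechanically if each step declares what it consumes and produces. The sketch below assumes a hypothetical step description, a dict with "consumes" (a list of artifact names) and "produces" (one artifact name). The artifact names in the usage example are made up for illustration.

```python
from __future__ import annotations


def check_artifact_chain(steps: list[dict]) -> bool:
    """Return True if every step's artifact is consumed by the step that follows it."""
    if not steps:
        return False
    for current, following in zip(steps, steps[1:]):
        produced = current.get("produces")
        if not produced or produced not in following.get("consumes", []):
            return False
    # The final step has no successor but must still produce a deliverable.
    return bool(steps[-1].get("produces"))


workflow = [
    {"name": "research", "consumes": ["topic"], "produces": "notes"},
    {"name": "extract", "consumes": ["notes"], "produces": "claims"},
    {"name": "draft", "consumes": ["claims"], "produces": "draft"},
    {"name": "review", "consumes": ["draft"], "produces": "risk_review"},
    {"name": "finalize", "consumes": ["draft", "risk_review"], "produces": "recommendations"},
]
```

Calling check_artifact_chain(workflow) returns True for this chain. Removing the "extract" step breaks the handoff between "notes" and "claims", so the check fails.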