A compact procedure format for instructions agents can follow.
rubr_flow turns loose prompts, commands, skills, agent specs, and workflows into bounded, testable procedures with visible inputs, rules, steps, outputs, and verification.
Specimen RBR-FLOW
TASK "Ship safer agent instructions"
INPUTS
  artifact = prompt_or_agent_spec
RULES
  preserve user intent
  flag unsupported claims
FLOW
  CALL tool "Rubrkit audit" WITH artifact -> audit
  EDIT weak dimensions FROM audit -> revision
  WRITE eval checks FROM audit, revision -> tests
OUTPUT
  score: audit.score
  revision
  tests
VERIFY
  PASS WHEN tests can judge the result
  FAIL WHEN tests cannot judge the result
Pseudocode for instructions, not a runtime.
The artifact is still the instruction. A person can read it, an agent can follow it, and Rubrkit can grade whether the work is bounded enough to reuse.
Readable by default
Uses boring keywords, indentation, labels, and named outputs instead of clever syntax.
Built for review
Separates facts, constraints, actions, outputs, and verification so weaknesses are easy to mark.
Pasteable into agents
No compiler or special runtime is required. The procedure is meant to be followed directly.
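Nothing has to parse a procedure, but the regular shape of a step (an uppercase keyword, a free-form body, and an optional "-> name" for the named output) means a step can be checked mechanically if that is ever useful. The sketch below is a minimal Python illustration, not part of rubr_flow or Rubrkit. The STEP pattern and the parse_step helper are hypothetical names, and the grammar they assume is inferred only from the examples on this page.

```python
import re

# Assumed step shape (inferred from the page's examples, not an official grammar):
#   VERB body [-> output_name]
STEP = re.compile(r"^(?P<verb>[A-Z]+)\s+(?P<body>.*?)(?:\s*->\s*(?P<output>\w+))?$")


def parse_step(line: str) -> dict:
    """Split one FLOW line into its verb, body, and optional named output."""
    match = STEP.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognizable step: {line!r}")
    return match.groupdict()


step = parse_step('CALL tool "Rubrkit audit" WITH artifact -> audit')
print(step["verb"], "->", step["output"])  # prints: CALL -> audit
```

A step with no arrow, such as READ analytics_notes, parses with its output set to None, which makes unnamed results easy to spot during review.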
Every block has one job.
rubr_flow works because it makes the hidden control surface visible: what the agent knows, what it may do, how it moves, and how success is checked.
TASK
Name the objective in one sentence.
CONTEXT
Separate durable facts from the procedure.
INPUTS
Declare files, data, URLs, or assumptions the agent needs.
RULES
Make boundaries and preservation requirements visible.
FLOW
List the ordered work with labels, branches, and handoffs.
OUTPUT
Define the final artifact shape before the work starts.
VERIFY
Give the agent a pass/fail finish line.
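Since every block opens with a fixed keyword, a reviewer could list which blocks a procedure declares and which it leaves out. The following is a minimal Python sketch under that assumption. The names BLOCKS, split_blocks, and missing_blocks are hypothetical, the specimen is abbreviated, and the page does not say which blocks are mandatory, so the result is a report rather than a verdict. TOOLS and STATE, which appear in one example on this page, could be added to the tuple in the same way.

```python
# The seven block keywords described above.
BLOCKS = ("TASK", "CONTEXT", "INPUTS", "RULES", "FLOW", "OUTPUT", "VERIFY")


def split_blocks(text: str) -> dict:
    """Group a procedure's lines under the block keyword that opens them."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword in BLOCKS:
            current = keyword
            # TASK carries its sentence on the keyword line itself.
            blocks[current] = [rest.strip()] if rest.strip() else []
        elif current is not None:
            blocks[current].append(line)
    return blocks


def missing_blocks(text: str) -> list:
    """List the block keywords that the procedure never declares."""
    found = split_blocks(text)
    return [name for name in BLOCKS if name not in found]


# Abbreviated copy of the specimen procedure, for illustration only.
SPECIMEN = """\
TASK "Ship safer agent instructions"
INPUTS
  artifact = prompt_or_agent_spec
RULES
  preserve user intent
  flag unsupported claims
FLOW
  CALL tool "Rubrkit audit" WITH artifact -> audit
OUTPUT
  revision
VERIFY
  PASS WHEN tests can judge the result
"""

print(missing_blocks(SPECIMEN))  # prints: ['CONTEXT']
```

The sketch treats any line whose first word is a block keyword as the start of that block, so it works on indented and flattened procedures alike.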
Loose request in. Bounded procedure out.
The point is not to make instructions longer. The point is to expose the decisions an agent otherwise has to guess.
Onboarding improvement
Review our onboarding flow and fix anything confusing.
rubr_flow procedure
TASK "Improve onboarding completion"
CONTEXT
  user is new to [PRODUCT]
  primary action is [TARGET ACTION]
INPUTS
  current_flow = app screens
  analytics_notes = drop-off data
  support_themes = user confusion reports
RULES
  change only copy and step order
  preserve required legal text
TOOLS
  READ analytics_notes
  READ support_themes
STATE
  friction_notes = []
FLOW
  REVIEW each screen -> friction_notes
  RANK issues by user impact -> ranked_issues
  EDIT the highest-impact issue -> changed_copy
  ON missing_context
    ASK user "Which onboarding detail is missing?" -> missing_detail
    FAIL WHEN missing_detail is unavailable
OUTPUT
  changed_copy: final text
  rationale: why this improves completion
  risk_notes: constraints preserved
VERIFY
  PASS WHEN user can identify the next action in one pass
  STOP WHEN verification passes
Coding-agent repair loop
Fix the failing tests and clean up anything related.
rubr_flow procedure
TASK "Repair failing checkout tests"
INPUTS
  failing_command = "npm test -- checkout"
  changed_files = git diff
RULES
  edit only checkout code and focused tests
  preserve public API names
FLOW
  RUN "npm test -- checkout" -> test_result
  DECIDE failure_notes FROM test_result, changed_files
  DECIDE root_cause FROM failure_notes, changed_files
  EDIT minimal patch -> patch
  RUN "npm test -- checkout" -> verification
OUTPUT
  root_cause
  changed_files
  verification
VERIFY
  PASS WHEN verification.status == "passed"
  FAIL WHEN same failure repeats 3 times
Research workflow
Research the market and write a useful summary.
rubr_flow procedure
TASK "Produce a sourced market brief"
INPUTS
  topic = [MARKET]
  audience = [DECISION MAKER]
RULES
  cite every factual claim
  separate evidence from recommendation
FLOW
  SEARCH topic IN approved_sources -> sources
  IF sources.empty
    FAIL WHEN sources.empty
  READ url sources -> notes
  DECIDE confidence_level FROM source_quality, recency
  WRITE brief -> draft
OUTPUT
  summary
  evidence_table
  open_questions
  recommendation
VERIFY
  PASS WHEN every recommendation points to evidence
  STOP WHEN confidence_level is stated
Measure whether the procedure can hold.
These are sample rubric statistics from the examples on this page, not aggregate customer performance claims.
64 -> 100
Sample rubric score
The loose onboarding request improves when inputs, tools, failure handling, output, and VERIFY are explicit.
10/10
Rubric dimensions covered
The stronger sample covers task, context, inputs, rules, state, flow, tools, output, verification, and failure handling.
0
Open-ended finish lines
A usable procedure ends with PASS, STOP, or FAIL conditions instead of asking the agent to decide when it is good enough.
3
Drift controls added
Bounded edits, named inputs, and pass/fail verification make the agent less likely to invent the next step.
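The idea of an open-ended finish line can be made concrete with a small check: does the procedure contain a VERIFY block with at least one PASS WHEN, STOP WHEN, or FAIL WHEN line? The Python sketch below is one way to approximate that, offered as an illustration rather than a description of how Rubrkit scores anything. The finish_lines and has_finish_line helpers are hypothetical, and the code assumes VERIFY is the last block, as it is in every example on this page.

```python
# Keywords that close a procedure with an explicit condition.
FINISH_KEYWORDS = ("PASS WHEN", "STOP WHEN", "FAIL WHEN")


def finish_lines(procedure: str) -> list:
    """Return the lines after VERIFY that begin with a finish keyword."""
    in_verify = False
    found = []
    for raw in procedure.splitlines():
        line = raw.strip()
        if line == "VERIFY":
            in_verify = True
            continue
        if in_verify and line.startswith(FINISH_KEYWORDS):
            found.append(line)
    return found


def has_finish_line(procedure: str) -> bool:
    """True when the procedure states at least one explicit finish condition."""
    return bool(finish_lines(procedure))


loose = "Fix the failing tests and clean up anything related."
bounded = """\
TASK "Repair failing checkout tests"
VERIFY
  PASS WHEN verification.status == "passed"
  FAIL WHEN same failure repeats 3 times
"""

print(has_finish_line(loose))    # prints: False
print(has_finish_line(bounded))  # prints: True
```

The check only confirms that a finish condition is written down. Whether the condition is actually testable still needs a human or an evaluator to judge.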