Many-Valid-Answers / Validator-First Workflow

This page is for batch tasks where the hardest part is not the core algorithm name, but the fact that many different outputs can be accepted.

Trigger: constructive tasks, witness-building tasks, score-aware batch outputs, or any problem where exact stdout diffing is the wrong trust model
Inputs needed: one candidate construction idea plus the statement-level legality contract
Output artifact: one explicit output contract, one smallest positive witness, one smallest negative witness, and one clear route to a checker
Stop condition: you are no longer trusting one pretty-looking sample output as proof of correctness
Pair with: Special Judge / Output Protocol Workflow, Anti-Hack Workflow, Stress testing workflow

Use it when:

the task accepts many legal answers
your current confidence comes from "this output looks plausible"
the statement defines correctness by a predicate, not by one exact target output
you need to decide what the validator should check before you even write it
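
To make the difference between the two trust models concrete, here is a minimal sketch. It assumes a hypothetical task (not from any specific statement): given an array and a target, print any pair of indices `i < j` whose values sum to the target. The names `is_valid_pair`, `reference_output`, and `candidate_output` are illustrative only.

```python
def is_valid_pair(a: list[int], target: int, i: int, j: int) -> bool:
    """Predicate for the hypothetical task: 0-based indices i < j with a[i] + a[j] == target."""
    return 0 <= i < j < len(a) and a[i] + a[j] == target


a = [1, 4, 3, 2, 5]
target = 6

reference_output = (0, 4)  # 1 + 5 == 6, what a reference solution happens to print
candidate_output = (1, 3)  # 4 + 2 == 6, a different but equally legal answer

# Exact diffing rejects the candidate even though it is legal.
matches_reference = candidate_output == reference_output

# Predicate checking accepts both outputs.
passes_predicate = is_valid_pair(a, target, *candidate_output)

print(matches_reference, passes_predicate)
```

Under exact diffing the candidate would be reported as wrong; under the predicate it is accepted, which is the behavior the statement actually asks for.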

Do not use this page if:

the task is fully interactive; use Local Judge Workflow
the answer is unique and plain diffing is enough; use Stress testing workflow
the main issue is already implementing a checker loop, not defining the legality contract; use Special Judge / Output Protocol Workflow

Which Workflow To Use Right Now

unique-answer batch task -> Stress testing workflow
interactive protocol task -> Local Judge Workflow
many valid outputs but the legality contract is still fuzzy -> this page
many valid outputs and the contract is already clear, but you need a checker / scorer loop -> Special Judge / Output Protocol Workflow

Separate four different jobs:

Many wrong answers come from jumping straight from 1 to confidence, while skipping 2 and 3.

Before coding the final construction, fill this card:

Check	Your answer
exact output shape
output domain
object being constructed
local legality rule
global target predicate
compare by exact output or predicate?
if scored, what local score means

If the last two rows are vague, you are not ready to trust one sample.
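
Once the card is filled, each row can map to one stage of a validator. The sketch below assumes a hypothetical task: print a permutation of 1..n in which no two adjacent values differ by exactly 1. The function name `validate` and the verdict strings are illustrative, and the task is unscored, so the score row has no stage here.

```python
import re


def validate(n: int, raw: str) -> str:
    """Return 'ok', or the name of the first card row that the output violates."""
    tokens = raw.split()

    # exact output shape: exactly n whitespace-separated integer tokens
    if len(tokens) != n or not all(re.fullmatch(r"-?[0-9]+", t) for t in tokens):
        return "shape"
    values = [int(t) for t in tokens]

    # output domain: every value lies in 1..n
    if any(v < 1 or v > n for v in values):
        return "domain"

    # object being constructed: a permutation, so no value repeats
    if len(set(values)) != n:
        return "object"

    # legality predicate: no adjacent pair differs by exactly 1
    if any(abs(x - y) == 1 for x, y in zip(values, values[1:])):
        return "predicate"

    return "ok"


print(validate(4, "2 4 1 3"))  # a legal witness
print(validate(4, "1 2 3 4"))  # an illegal witness
```

Returning the name of the failed row, rather than a bare boolean, makes it easier to see which part of the contract a candidate output breaks.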

Default order:

This prevents the common failure mode:

When many outputs are legal, the first good candidate is not the fanciest one.

Prefer the witness with:

That makes both the solution and the validator easier to trust.

Attack these first:

Until the witness has survived these attacks, do not trust it yet.
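
One mechanical way to attack a construction is to run it over every small input size and pass each output through the predicate. The sketch below assumes a hypothetical task (a permutation of 1..n, for n >= 4, in which no two adjacent values differ by exactly 1) and two hypothetical constructions. All function names are illustrative.

```python
from typing import Callable, Iterable, Optional


def is_legal(n: int, perm: list[int]) -> bool:
    """Predicate: perm is a permutation of 1..n with no adjacent values differing by 1."""
    if sorted(perm) != list(range(1, n + 1)):
        return False
    return all(abs(x - y) != 1 for x, y in zip(perm, perm[1:]))


def odds_then_evens(n: int) -> list[int]:
    # Looks plausible on a sample such as n = 5: 1 3 5 2 4
    return list(range(1, n + 1, 2)) + list(range(2, n + 1, 2))


def evens_then_odds(n: int) -> list[int]:
    return list(range(2, n + 1, 2)) + list(range(1, n + 1, 2))


def first_failure(build: Callable[[int], list[int]], sizes: Iterable[int]) -> Optional[int]:
    """Return the smallest size at which the construction is illegal, or None."""
    for n in sizes:
        if not is_legal(n, build(n)):
            return n
    return None


print(first_failure(odds_then_evens, range(4, 51)))
print(first_failure(evens_then_odds, range(4, 51)))
```

In this sketch the odds-first construction is legal for every size from 5 to 50 but breaks at n = 4, where it produces 1 3 2 4. A single sample at n = 5 would not have revealed that, while a sweep over the smallest sizes does.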

For many-valid-answers tasks, separate:

This split gives you a much cleaner route into a real checker.

This page comes earlier in the pipeline; Special Judge comes when the checker needs to become executable.

you can explain exactly why exact-output comparison is the wrong model
one legal witness and one illegal witness are both explicit
the checker you need is now obvious enough to implement
your confidence comes from contract plus validation, not from aesthetic similarity to samples

Research snapshot refreshed on 2026-04-25.
