Many-Valid-Answers / Validator-First Workflow

This page is for batch tasks where the hardest part is not naming the core algorithm but handling the fact that many different outputs are accepted.

  • Trigger: constructive tasks, witness-building tasks, score-aware batch outputs, or any problem where exact stdout diffing is the wrong trust model
  • Inputs needed: one candidate construction idea plus the statement-level legality contract
  • Output artifact: one explicit output contract, one smallest positive witness, one smallest negative witness, and one clear route to a checker
  • Stop condition: you are no longer trusting one pretty-looking sample output as proof of correctness
  • Pair with: Special Judge / Output Protocol Workflow, Anti-Hack Workflow, Stress testing workflow

Use it when:

  • the task accepts many legal answers
  • your current confidence comes from "this output looks plausible"
  • the statement defines correctness by a predicate, not by one exact target output
  • you need to decide what the validator should check before you even write it

Do not use this page if:

Which Workflow To Use Right Now

Core Goal

Separate four different jobs:

  1. construct a candidate witness
  2. state the legality contract
  3. validate the witness
  4. attack the witness

Many wrong answers come from jumping straight from job 1 to confidence while skipping jobs 2 and 3.

Output Contract Card

Before coding the final construction, fill this card:

Check | Your answer
exact output shape |
output domain |
object being constructed |
local legality rule |
global target predicate |
compare by exact output or predicate? |
if scored, what local score means |

If the last two rows are vague, you are not ready to trust one sample.
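
As an illustration only, the card can be kept as a small record next to the solution so that empty rows are easy to spot. The sketch below assumes a hypothetical task (print a permutation of 1..n in which no two adjacent values differ by exactly 1); the class name `OutputContractCard`, its field names, and the placeholder list are invented for this example and are not part of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class OutputContractCard:
    exact_output_shape: str
    output_domain: str
    object_being_constructed: str
    local_legality_rule: str
    global_target_predicate: str
    compare_by: str      # "exact output" or "predicate"
    score_meaning: str   # "not scored" counts as a clear answer

    def vague_rows(self):
        """Return the names of rows left empty or filled with a placeholder."""
        placeholders = {"", "?", "tbd", "todo"}
        return [f.name for f in fields(self)
                if getattr(self, f.name).strip().lower() in placeholders]


# Hypothetical task: a permutation of 1..n with no adjacent values differing by 1.
card = OutputContractCard(
    exact_output_shape="one line with n integers",
    output_domain="integers in [1, n]",
    object_being_constructed="a permutation of 1..n",
    local_legality_rule="abs(a - b) != 1 for every adjacent pair (a, b)",
    global_target_predicate="every value in 1..n appears exactly once",
    compare_by="predicate",
    score_meaning="not scored",
)
```

Calling `card.vague_rows()` on the filled card returns an empty list; a draft with `compare_by=""` would report that row by name, which is the signal that one sample output should not be trusted yet.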

Validator-First Loop

Default order:

  1. write the contract in words
  2. produce one tiny legal witness by hand
  3. produce one tiny illegal witness by hand
  4. describe the checker that would accept the first and reject the second
  5. only then scale into the full constructive idea

This prevents the common failure mode:

  • I built something complicated
  • samples passed
  • I never actually stated what legality meant
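
A minimal sketch of steps 1 to 4 of the loop, assuming a hypothetical task: output a permutation of 1..n in which no two adjacent values differ by exactly 1. The function name `is_legal` and both witnesses are illustrative choices, not taken from a real problem.

```python
def is_legal(n, tokens):
    """Contract in words: tokens is a permutation of 1..n in which
    no two adjacent values differ by exactly 1."""
    if len(tokens) != n:
        return False
    if sorted(tokens) != list(range(1, n + 1)):
        return False
    return all(abs(a - b) != 1 for a, b in zip(tokens, tokens[1:]))


# Step 2: one tiny legal witness, built by hand for n = 4.
legal_witness = [2, 4, 1, 3]

# Step 3: one tiny illegal witness; still a permutation, but 4 and 3 are adjacent.
illegal_witness = [2, 4, 3, 1]

# Step 4: the checker must accept the first and reject the second.
assert is_legal(4, legal_witness)
assert not is_legal(4, illegal_witness)
```

The illegal witness is deliberately close to the legal one: it shares the shape and the value set, so only the adjacency rule can reject it.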

Smallest Witness Discipline

When many outputs are legal, the first good candidate is not the fanciest one.

Prefer the witness with:

  • the shortest proof
  • the fewest moving parts
  • the easiest checker
  • the clearest impossible-case condition

That makes both the solution and the validator easier to trust.

Negative Test Families

Attack these first:

  • duplicated item in a supposed set / permutation / covering
  • missing item in a supposed partition / assignment / pairing
  • one-step-away-from-legal move sequence
  • correct-looking score but illegal final structure
  • empty, one-element, or smallest nontrivial case
  • impossible case where the construction keeps printing something anyway

Until the witness has survived these attacks, do not trust it.
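
One way to make the first few families mechanical is to mutate a known-legal witness and confirm that the checker rejects every mutation. The sketch below assumes the object is a permutation of 1..n; all function names are hypothetical, and `length_only` is a deliberately weak checker included only to show an attack getting through.

```python
def is_permutation(n, items):
    """Checker under test: items must contain each of 1..n exactly once."""
    return len(items) == n and set(items) == set(range(1, n + 1))


def length_only(n, items):
    """Deliberately weak checker that looks only at the item count."""
    return len(items) == n


def negative_cases(legal):
    """Yield (family, mutated witness) pairs; expects a legal witness with >= 2 items."""
    yield "duplicated item", legal[:-1] + legal[:1]  # last item replaced by the first
    yield "missing item", legal[:-1]                 # last item dropped
    yield "empty output", []


def missed_attacks(checker, n, legal):
    """Return the families whose mutated witness the checker wrongly accepts."""
    return [name for name, bad in negative_cases(legal) if checker(n, bad)]


print(missed_attacks(is_permutation, 3, [3, 1, 2]))  # []
print(missed_attacks(length_only, 3, [3, 1, 2]))     # ['duplicated item']
```

An empty list means the checker rejected every mutation; any name in the list points at a family the checker does not yet cover.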

Predicate Split

For many-valid-answers tasks, separate:

  1. shape: token count, sizes, bounds, formatting
  2. local legality: each move, edge, assignment, or token is individually legal
  3. global goal: the final structure satisfies the statement
  4. score, if any: only after legality is already safe

This split gives you a much cleaner route into a real checker.
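
The split can be sketched as a checker that reports the first stage that fails. The task here is hypothetical: the output is `k` followed by `k` indices, each index `i` swaps `a[i]` and `a[i+1]` (1-based), the array must end sorted, and at most `max_moves` moves are allowed. The function name `check` and the verdict strings are assumptions made for illustration.

```python
def check(a, tokens, max_moves):
    """Return "ok" or the name of the first stage that fails."""
    # 1. shape: integer tokens, and the move count matches the declared k
    if not tokens or not all(isinstance(t, int) for t in tokens):
        return "shape"
    k, moves = tokens[0], tokens[1:]
    if k < 0 or len(moves) != k:
        return "shape"

    # 2. local legality: each move on its own names a valid adjacent pair
    n = len(a)
    if any(not 1 <= i <= n - 1 for i in moves):
        return "local legality"

    # 3. global goal: replay the moves on a copy and test the final structure
    b = list(a)
    for i in moves:
        b[i - 1], b[i] = b[i], b[i - 1]
    if b != sorted(b):
        return "global goal"

    # 4. score: judged only once legality is settled
    if k > max_moves:
        return "score"
    return "ok"


print(check([2, 1, 3], [1, 1], max_moves=3))  # ok
print(check([2, 1, 3], [0], max_moves=3))     # global goal
```

Returning the stage name rather than a bare boolean keeps the four concerns visibly separate, so a rejected output says which part of the contract it broke.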

When To Reopen Special Judge Workflow

Switch to Special Judge / Output Protocol Workflow when:

  • the legality contract is already clear
  • you now need a runnable checker or predicate loop
  • you need to separate checker / generator / solution binaries
  • you need one stable local command that rejects a deliberate negative case

This page comes earlier in the pipeline; Special Judge comes when the checker needs to become executable.

Done When

  • you can explain exactly why exact-output comparison is the wrong model
  • one legal witness and one illegal witness are both explicit
  • the checker you need is now obvious enough to implement
  • your confidence comes from contract plus validation, not from aesthetic similarity to samples

References And Repo Anchors

Research snapshot refreshed on 2026-04-25.
