Special Judge / Output Protocol Workflow¶
This page is for batch tasks where exact stdout diffing is the wrong trust model.
- Trigger: special judge, custom validator, predicate-checked batch tasks, or many-valid-answers problems whose legality contract is already clear enough to implement a checker
- Inputs needed: one candidate solution plus a checker, validator, predicate script, or score-aware harness
- Output artifact: one reproducible local validation loop with at least one negative case and one accepted-looking positive case
- Stop condition: legality and output protocol are separated cleanly from the solving logic
- Pair with: Many-Valid-Answers / Validator-First Workflow, Local judge workflow, Anti-Hack Workflow
Use it when:
- the task accepts many valid outputs
- correctness depends on a predicate, not one reference output
- the judge computes legality or score through custom logic
- simple stdin/stdout replay does not tell you whether the output is actually valid
If the problem is still at the stage of "what exactly counts as a legal witness?", start one step earlier with Many-Valid-Answers / Validator-First Workflow.
Do not use this page if:
- the task is fully interactive; use Local judge workflow
- the task is an ordinary unique-answer batch task; use Stress testing workflow
- the real issue is adversarial batch hacking after the idea is known; use Anti-Hack Workflow
Which Workflow To Use Right Now¶
- ordinary unique-answer batch task -> Stress testing workflow
- interactive simulator or transcript problem -> Local judge workflow
- hack-sensitive constructive task -> start here, then pair with Anti-Hack Workflow
- many-valid-answers task and the legality contract is still fuzzy -> Many-Valid-Answers / Validator-First Workflow
- predicate-checked batch output or special judge with a clear contract -> this page
Core Goal¶
Separate these roles clearly:
- solution
- validator / checker / scorer
- instance source
If the same binary or script is implicitly doing all three jobs, a failure cannot be pinned on any one of them, and debugging becomes too emotional and too noisy.
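A minimal sketch of the three roles kept apart, using a hypothetical toy task (given n >= 2, print any two positive integers that sum to n). The task and the names `make_instance`, `solve`, `check`, and `first_failing_seed` are all assumptions made for illustration:

```python
import random


def make_instance(seed: int) -> int:
    """Instance source: deterministic per seed so a failure can be replayed."""
    return random.Random(seed).randint(2, 50)


def solve(n: int) -> str:
    """Candidate solution: prints one of the many legal pairs."""
    return f"1 {n - 1}"


def check(n: int, output: str) -> bool:
    """Checker: accepts every legal pair, not only the one solve() prints."""
    tokens = output.split()
    if len(tokens) != 2:
        return False
    try:
        a, b = int(tokens[0]), int(tokens[1])
    except ValueError:
        return False
    return a >= 1 and b >= 1 and a + b == n


def first_failing_seed(seeds):
    """Return the first seed whose output the checker rejects, or None."""
    for seed in seeds:
        n = make_instance(seed)
        if not check(n, solve(n)):
            return seed
    return None
```

Because the checker never calls the solution, a rejection points at exactly one of the three pieces, and the returned seed is enough to reproduce it.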
Minimum Setup¶
For most special-judge tasks, keep this split:
- sol.cpp
- check.py or check.cpp
- optional gen.cpp
- optional oracle.cpp if a small exact model exists
- one saved failing input or seed
The goal is not to mimic the official judge perfectly. The goal is to reproduce the legality contract locally.
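One way to wire these files together is a small runner. This is a sketch under assumptions: the solution is already built into a command (for example `./sol`), the checker is exposed as a Python callable taking `(input_text, output_text)`, and the `failing_input.txt` path and function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_solution(cmd, input_text, timeout=5.0):
    """Run the solution command on input_text and return its stdout.

    Raises CalledProcessError on a nonzero exit and TimeoutExpired on timeout.
    """
    result = subprocess.run(
        cmd,
        input=input_text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def validate_once(cmd, input_text, checker, fail_path="failing_input.txt"):
    """Run one instance; if the checker rejects, save the input for replay."""
    output = run_solution(cmd, input_text)
    ok = checker(input_text, output)
    if not ok:
        Path(fail_path).write_text(input_text)
    return ok
```

A call might look like `validate_once(["./sol"], Path("tiny.in").read_text(), my_checker)`, where `tiny.in` and `my_checker` are placeholders for your own files.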
Output Contract Card¶
Before you trust the solution, fill this card:
| Check | Your answer |
|---|---|
| exact output format | |
| output domain | |
| per-step legality rule | |
| final acceptance predicate | |
| if scored, what score means locally | |
If any row is blank, you are still comparing vibes, not contracts.
Validator-First Loop¶
Default loop:
- write the legality checker first
- produce one tiny invalid output by hand
- make sure the checker rejects it
- run the candidate solution on a tiny valid case
- only then scale into generated or larger tests
This is the shortest route to avoiding "samples passed, but the custom judge hated it."
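The loop can be sketched on a hypothetical task: given n >= 4, print any permutation of 1..n in which no two neighbours differ by exactly 1. The task, the `check` signature, and the `solve` construction are assumptions chosen for illustration, not part of any real judge.

```python
def check(n: int, output: str):
    """Return (accepted, reason) for a claimed answer."""
    tokens = output.split()
    if len(tokens) != n:
        return False, f"expected {n} tokens, got {len(tokens)}"
    try:
        values = [int(t) for t in tokens]
    except ValueError:
        return False, "non-integer token"
    if sorted(values) != list(range(1, n + 1)):
        return False, "not a permutation of 1..n"
    for left, right in zip(values, values[1:]):
        if abs(left - right) == 1:
            return False, f"adjacent pair {left} {right} differs by 1"
    return True, "ok"


def solve(n: int) -> str:
    """Candidate: all evens ascending, then all odds ascending (needs n >= 4)."""
    evens = range(2, n + 1, 2)
    odds = range(1, n + 1, 2)
    return " ".join(map(str, [*evens, *odds]))


# Steps 2 and 3: hand-written invalid outputs must be rejected first.
assert not check(4, "1 2 3 4")[0]   # neighbours differ by 1
assert not check(4, "2 4 1 1")[0]   # duplicate value
# Step 4: only now run the candidate on a tiny valid case.
assert check(4, solve(4))[0]
```

Returning a reason string alongside the verdict makes a rejection readable without rerunning anything.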
Negative Test Families¶
When you do not know what to attack first, try:
- correct format, wrong values
- legal values, illegal move sequence
- duplicated item in a purported set or permutation
- missing item in a purported covering or partition
- empty or one-element boundary
- one-step-away-from-legal witness
- output that appears to earn the same score but actually violates the predicate
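Several of these families can be produced mechanically by mutating one known-valid witness. The sketch below assumes the witness is a permutation of 1..n with n >= 2; `is_permutation`, `negative_cases`, and `unrejected` are hypothetical helper names, and `is_permutation` stands in for whatever checker you actually use.

```python
def is_permutation(n, values):
    """Stand-in checker: values is exactly a permutation of 1..n."""
    return sorted(values) == list(range(1, n + 1))


def negative_cases(valid):
    """Yield (label, mutated_output) pairs derived from one valid witness."""
    n = len(valid)
    yield "wrong value", [n + 1] + valid[1:]
    yield "duplicated item", [valid[1]] + valid[1:]
    yield "missing item", valid[:-1]
    yield "extra item", valid + [valid[0]]
    yield "empty output", []


def unrejected(valid, checker=is_permutation):
    """Return labels of negatives the checker wrongly accepts."""
    n = len(valid)
    return [label for label, case in negative_cases(valid) if checker(n, case)]
```

An empty list from `unrejected([3, 1, 4, 2])` means the checker rejected every mutation; any label that does come back names a hole in the checker.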
Score-Aware Caution¶
If your local scorer is only a partial reconstruction of the official one:
- trust legality first
- treat score as directional only
- save the exact local assumption the scorer is making
Do not overfit to a fake local score model and call that correctness.
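One way to keep legality and score from blurring together is to return them as separate fields and store the scoring assumption next to the number. The sketch uses a hypothetical knapsack-style task (choose distinct 0-based item indices whose total weight fits the capacity); the `Verdict` type, the `judge` function, and the score formula are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# The local guess about scoring, saved verbatim so it can be audited later.
SCORE_ASSUMPTION = "score = total value of chosen items (local guess, not official)"


@dataclass(frozen=True)
class Verdict:
    legal: bool
    score: Optional[int]  # None whenever the output is illegal
    note: str


def judge(weights, values, capacity, chosen):
    """Decide legality first; compute a score only for legal outputs."""
    if len(set(chosen)) != len(chosen):
        return Verdict(False, None, "duplicate index")
    if any(i < 0 or i >= len(weights) for i in chosen):
        return Verdict(False, None, "index out of range")
    if sum(weights[i] for i in chosen) > capacity:
        return Verdict(False, None, "capacity exceeded")
    return Verdict(True, sum(values[i] for i in chosen), SCORE_ASSUMPTION)
```

An illegal output never receives a number, so a high-looking score cannot hide a violated constraint.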
Batch Output Protocol Split¶
For many special-judge tasks, it helps to separate two checks:
- protocol check: did the output shape and token count follow the statement?
- meaning check: does the output actually satisfy the predicate?
This split catches a lot of "the answer idea is fine, but the serialized output is not."
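A sketch of the two checks as separate functions, for a hypothetical output format: the first token is k, followed by k distinct 1-based indices into an array `a` whose chosen elements must sum to `target`. The format and the names `ProtocolError`, `parse_output`, and `satisfies_predicate` are assumptions.

```python
class ProtocolError(Exception):
    """Raised when the output shape is wrong, before meaning is examined."""


def parse_output(text):
    """Protocol check: token shape and count only, no knowledge of the instance."""
    tokens = text.split()
    if not tokens:
        raise ProtocolError("empty output")
    if not all(t.isascii() and t.isdigit() for t in tokens):
        raise ProtocolError("non-numeric token")
    k = int(tokens[0])
    if len(tokens) != k + 1:
        raise ProtocolError(f"declared {k} indices, found {len(tokens) - 1}")
    return [int(t) for t in tokens[1:]]


def satisfies_predicate(a, target, indices):
    """Meaning check: indices are distinct, in range, and sum to target."""
    if len(set(indices)) != len(indices):
        return False
    if any(i < 1 or i > len(a) for i in indices):
        return False
    return sum(a[i - 1] for i in indices) == target
```

With `a = [5, 3, 8, 2]` and `target = 8`, the output `"2\n1 2\n"` passes both checks, `"3\n1 2\n"` fails the protocol check, and `"2\n1 4\n"` parses cleanly but fails the meaning check.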
Done When¶
- you can rerun one stable checker command without guessing
- the checker rejects at least one deliberate invalid output
- the candidate output passes the checker on at least one nontrivial case
- legality is no longer being inferred from a raw diff against one reference file
Good Pairings¶
- Local judge workflow
- Many-Valid-Answers / Validator-First Workflow
- Anti-Hack Workflow
- Codeforces Constructive / Validator-First Clinic 01
- Code Jam / Kick Start Analysis-First Clinic 01
References And Repo Anchors¶
Research snapshot refreshed on 2026-04-25.
Official / primary:
- Celebrate Google's Coding Competitions with a final round of programming fun
- google/coding-competitions-archive
- Single Round Matches (SRMs) - Topcoder Support
Repo anchors: