Local Judge Workflow
This page is for the class of problems where "run the binary on stdin" is not enough.
- Trigger: interactive tasks, simulator-style judges, or hidden-state protocol tasks
- Inputs needed: a solution binary plus either fixed tests, a simulator, or a saved transcript / seed
- Output artifact: one reproducible failing transcript or enough evidence that the local harness is trustworthy
- Stop condition: one clean harness loop that you can rerun without guessing
- Pair with: Foundations cheatsheet, Stress testing workflow, Special Judge / Output Protocol Workflow, Interactive Protocol Clinic 01
Use it when:
- the task is interactive
- you want a local harness that behaves more like the real judge
- there is a simulator, hidden state, or query budget to enforce
If you are still on ordinary batch tasks like Weird Algorithm or Missing Number, this page is too early. Stay with the normal stdin/stdout loop from the Foundations cheatsheet first.
If the harness exists but runs still fail on flushing, query budget, transcript, or stop-condition handling, pair this page with Interactive Protocol Clinic 01.
Which Workflow To Use Right Now
Choose this page only if simple stdin/stdout runs are no longer enough:
- interactive protocol
- simulator or hidden-state harness
- query-budget enforcement
If the problem is still a normal batch task, go back to the Foundations cheatsheet or, if trust is the issue, to Stress testing workflow. If the task is batch but judged by a predicate or custom validator, go to Special Judge / Output Protocol Workflow.
Core Goal
Separate three roles clearly:
- solution
- input generator or fixed tests
- judge / validator / simulator
When those roles are mixed together, debugging gets noisy very quickly.
Minimum Setup
For most local-judge tasks, keep this file split:
- sol.cpp
- judge.py or judge.cpp
- optional gen.cpp
- one saved transcript or seed when something fails
This keeps the submission candidate clean while giving you one place to inspect protocol behavior.
Interactive Baseline
For interactive problems, the default habits are:
- flush after every query or answer
- keep query count visible
- log judge-facing traffic to stderr, not stdout
- simulate the judge locally with a deterministic harness first
Minimal flush-safe pattern:
cout << "? " << x << ' ' << y << endl;
or:
cout << "? " << x << ' ' << y << '\n' << flush;
Never rely on accidental flushing.
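The same habit applies on the harness side: if judge.py talks to the solution through pipes, every reply it writes must be flushed too, or both processes can end up waiting on each other. A minimal sketch, assuming a text-mode stream and a hypothetical helper named send_line (not part of any library):

```python
import sys


def send_line(stream, text, log=sys.stderr):
    """Write one protocol line, flush it immediately, and mirror it to a log.

    `send_line` is a hypothetical helper name used only for this sketch.
    """
    stream.write(text + "\n")
    stream.flush()  # explicit flush: do not wait for the buffer to fill
    # Debug copy goes to the log stream (stderr by default), never to stdout.
    print(f"judge -> sol: {text}", file=log)
```

Routing every judge reply through one helper like this keeps the flush and the stderr log in a single place, so neither can be forgotten on one code path.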
Recommended File Split
Use this shape when the task is complicated enough:
- sol.cpp for the actual submission candidate
- judge.py or judge.cpp for the local simulator
- optional gen.cpp for random instance generation
That keeps the contest binary clean while still letting you test protocol behavior.
Batch Predicate Tasks
If the problem is not interactive but still needs a local validator for many valid answers or special judging, use Special Judge / Output Protocol Workflow. This page is narrower: it is mainly for protocol loops, simulators, and hidden-state harnesses.
Command-Line Loop
Typical local workflow:
c++ -std=c++20 -O2 -Wall -Wextra -pedantic sol.cpp -o sol
python3 judge.py < input.txt
or, when the judge launches the solution:
python3 judge.py ./sol
For randomized protocol checks:
for seed in $(seq 1 200); do
  python3 judge.py ./sol "$seed" || break
done
The first goal is not “run many seeds.” The first goal is “make one seed fully reproducible.”
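One way to make a seed fully reproducible is to derive the whole hidden instance from the seed through a private random generator, so nothing depends on global state or the clock. A minimal sketch; the function name make_instance, the n_max parameter, and the instance shape are assumptions for illustration:

```python
import random


def make_instance(seed, n_max=1000):
    """Build the hidden instance from the seed alone.

    Hypothetical instance shape: a range size n and a hidden value in [1, n].
    """
    rng = random.Random(seed)  # private generator: no global random state
    n = rng.randint(1, n_max)
    hidden = rng.randint(1, n)
    return {"seed": seed, "n": n, "hidden": hidden}
```

Calling make_instance twice with the same seed returns the same dictionary, which is what lets a failing seed from the loop above be replayed on its own.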
Transcript Discipline
When debugging a local judge, keep one transcript per failure:
- seed
- initial hidden instance
- all queries and replies
- reason the judge rejected the run
This is much more useful than only seeing “Wrong Answer”.
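The four items above fit naturally into one small record that the harness fills in as the run proceeds. The following is a sketch under assumptions: the class name Transcript, its field names, and the JSON layout are illustrative choices, not a fixed format.

```python
import json


class Transcript:
    """Collect everything needed to replay one failing run."""

    def __init__(self, seed, instance):
        self.data = {
            "seed": seed,
            "instance": instance,   # initial hidden instance
            "exchanges": [],        # every query and its reply, in order
            "verdict": None,        # reason the judge rejected the run
        }

    def record(self, query, reply):
        self.data["exchanges"].append({"query": query, "reply": reply})

    def reject(self, reason):
        self.data["verdict"] = reason

    def to_json(self):
        return json.dumps(self.data, indent=2)

    def save(self, path):
        # One file per failure, e.g. named after the seed.
        with open(path, "w") as f:
            f.write(self.to_json())
```

Saving the transcript only when the judge rejects a run keeps the working directory small while still leaving one replayable file per failure.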
Query Budget Discipline
Interactive bugs are often not logic bugs, but process bugs:
- one extra query
- forgetting to stop after a final answer
- reading input after the protocol should have ended
- using buffered output without flushing
Make the harness enforce the query budget strictly.
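Strict enforcement can be as small as a counter that raises as soon as the protocol is violated. This sketch covers the extra-query and missing-stop cases from the list above; the names BudgetGuard and ProtocolError are hypothetical, and flush problems still have to be caught separately, typically as a hang or timeout.

```python
class ProtocolError(Exception):
    """Raised by the harness when the solution breaks the protocol."""


class BudgetGuard:
    """Track query usage and reject anything sent after the final answer."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.finished = False

    def on_query(self):
        if self.finished:
            raise ProtocolError("query after final answer")
        self.used += 1
        if self.used > self.limit:
            raise ProtocolError(
                f"query {self.used} exceeds budget {self.limit}")

    def on_answer(self):
        if self.finished:
            raise ProtocolError("second final answer")
        self.finished = True
```

The judge calls on_query for every query line and on_answer for the final answer, and turns a raised ProtocolError into the rejection reason stored in the transcript.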
Done When
- the harness can reproduce one failure from a saved seed or transcript
- the protocol roles are clearly separated
- the solution no longer depends on accidental flushing
- one rerun command is stable enough that you would trust it tomorrow