Randomized Algorithms¶
Randomized algorithms use random choices as part of the solution method itself.
That is the key boundary of this page:
- using random tests to debug a deterministic solution is algorithm engineering
- using randomness inside the judge-facing algorithm is randomized algorithms
This page is theory-led. The real payoff is learning to say exactly:
- what is random
- what bad event can happen
- what type of guarantee you are actually claiming
- and how much probability of failure you are willing to spend
At A Glance¶
- Use this page when:
- a randomized choice makes balancing, partitioning, or sampling much simpler than a deterministic design
- the intended guarantee is expected time or high-probability correctness
- hashing, treaps, randomized pivoting, or repeated trials appear naturally
- you can validate or amplify the result if one trial is not trustworthy enough
- Prerequisites:
- Reasoning
- Algorithm Engineering
- String Hashing
- Strongest cues:
- the editorial says “in expectation”, “with high probability”, or “failure probability”
- a deterministic version exists, but it is much heavier than the randomized one
- the clean abstraction is “pick a random priority / pivot / fingerprint / sample”
- a checker or verifier exists, so retries are realistic
- Strongest anti-cues:
- there is already a short deterministic solution with the same asymptotic cost
- the only randomness in sight is the local stress harness, not the actual solver
- you cannot describe a concrete bad event or probability bound
- the judge or statement effectively requires exact deterministic behavior and you have no verification layer
- the real win comes from exploiting a deterministic statement promise to build any valid witness; that belongs under Constructive
- Success after studying this page:
- you can distinguish Monte Carlo from Las Vegas cleanly
- you can write down the random experiment and the bad event before coding
- you can budget failure probability across many checks
- you can decide when randomization is principled and when it is just unfinished deterministic reasoning
Problem Model And Notation¶
For an input instance \(x\), a randomized algorithm can be viewed as a map \(A(x, R)\), where \(R\) is the random tape, seed, or stream of random choices.
Two outputs now matter:
- the returned answer
- the running time
Those lead to two different guarantee styles.
Monte Carlo¶
A Monte Carlo algorithm has a bounded running time, but it may return a wrong answer with small probability.
Typical shape: run a bounded-time procedure once and return its answer; the answer may be wrong, but only with a small, bounded probability.
Examples in contest practice:
- hashing used as a probabilistic equality oracle when the hash parameters are sampled from a random family
- randomized fingerprints for matching or set comparison
- repeated one-sided tests where a false positive or false negative is possible but rare
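The first bullet can be sketched concretely. Below is a minimal Monte Carlo equality oracle in Python; the modulus, the base range, and the `+1` shift are illustrative choices for this sketch, not a canonical scheme:

```python
import random

# Monte Carlo equality oracle: a polynomial fingerprint with a base
# sampled from a random family. Constants here are illustrative.
MOD = (1 << 61) - 1                      # a large Mersenne prime
BASE = random.randrange(2, MOD - 1)      # sampled once, then fixed for the run

def fingerprint(seq):
    # Polynomial fingerprint of an integer sequence, evaluated at BASE.
    h = 0
    for x in seq:
        h = (h * BASE + x + 1) % MOD
    return h

def probably_equal(a, b):
    # Equal inputs always match; unequal inputs of length up to L collide
    # with probability at most about L / MOD over the random choice of BASE.
    return fingerprint(a) == fingerprint(b)
```

Note the asymmetry of the guarantee: a `False` result is always correct, while a `True` result is correct only up to the collision budget.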
Las Vegas¶
A Las Vegas algorithm is always correct, but the running time is random.
Typical shape: repeat a randomized attempt until an exact check accepts; the answer is always correct, but the number of attempts is random.
Examples in contest practice:
- randomized balancing in treaps
- a Monte Carlo filter followed by exact verification
- retry-until-verified schemes when the checker is cheap
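A minimal Las Vegas sketch in Python, assuming the input actually contains a majority element (a hypothesis of this toy example; without it the loop would never terminate): sampling is the random experiment, and the exact count is the verifier:

```python
import random

def majority_element(a):
    """Las Vegas: return the majority element of a.

    Assumes a majority element exists. Each trial samples one candidate
    (the random experiment) and counts its occurrences exactly (the
    verifier), so the returned answer is always correct; only the number
    of trials is random (expected fewer than 2).
    """
    n = len(a)
    while True:
        candidate = random.choice(a)
        if a.count(candidate) * 2 > n:
            return candidate
```

Because the verifier is exact, no failure probability survives into the answer; the randomness shows up only in the running time.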
The Three Objects You Must Name¶
On almost every randomized problem, name these explicitly:
- random experiment: what is random?
- bad event: what would make the algorithm slow or wrong?
- budget: how small must that bad event be?
If you cannot name those three things, the solution is usually not ready.
From Deterministic Worst Case To The Right Random Model¶
The naive deterministic instinct is usually one of these:
- balance the structure for every possible insertion order
- choose a pivot that is always “good”
- compare huge objects exactly every time
- fight every adversarial corner case directly
That can be overkill.
The randomized idea is to make the hard global structure look average by injecting one carefully chosen source of randomness.
Typical simplifications:
- random priorities: a search tree behaves like a random BST instead of the worst insertion order
- random pivot / partitioning: expected subproblem sizes become easy to bound
- random fingerprint: comparing large objects becomes cheap, with a tiny error budget
- random sample / repeated trials: an existence witness is easier to find than to characterize deterministically, and membership can be checked exactly
The key contest move is not:
- “use randomness because I am stuck”
It is:
- “choose one random experiment whose bad event I can actually bound”
Core Invariant And Why It Works¶
Randomized correctness still needs structure. The structure is just different from a deterministic invariant.
Every serious randomized solution should answer these four questions.
Question 1: What Is Random?¶
Be concrete.
Good answers:
- each node gets an independent random priority
- the pivot is chosen uniformly from the current range
- the base or modulus is sampled once, then fixed
- each trial samples one candidate from a specified distribution
Bad answers:
- “some randomness helps”
- “I shuffled a bit”
- “the hash is random enough somehow”
If you cannot state the experiment precisely, you also cannot prove anything precise about it.
Question 2: What Bad Event Are You Bounding?¶
Common bad events:
- the tree becomes too tall
- the pivot split is too unbalanced too many times
- two unequal objects collide under the fingerprint
- every sample misses the witness
The analysis becomes much cleaner once the proof is written around the bad event instead of around vague luck.
Question 3: Is The Guarantee About Correctness Or Runtime?¶
This is where many contest explanations get muddy.
Two very different claims can both sound like “randomized algorithm”:
- always correct, expected fast
- always fast, usually correct
Write the type explicitly:
- treap: Las Vegas / expected-time data structure
- direct rolling-hash equality with runtime-randomized parameters: Monte Carlo
- hash match followed by exact verification: Las Vegas
Question 4: How Do You Shrink Failure Probability?¶
There are only a few standard moves:
- repeat independent trials when success is self-certifying or an exact verifier exists
- use two independent fingerprints instead of one
- verify every candidate before trusting it
- union-bound a per-check failure probability across all checks
Two formulas matter constantly.
If one trial fails with probability at most \(p\), then \(k\) independent trials all fail with probability at most \(p^k\).
If you perform \(m\) comparisons, each with failure probability at most \(p\), then a crude but useful union bound gives a total failure probability of at most \(m p\).
That is often enough for contest budgeting.
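The two bounds can be written as tiny helpers; the function names here are illustrative:

```python
def amplified_failure(p, k):
    # k independent trials, each failing with probability at most p,
    # all fail with probability at most p ** k.
    return p ** k

def union_bound_failure(p, m):
    # m checks, each failing with probability at most p, jointly fail
    # with probability at most m * p (trivially capped at 1).
    return min(1.0, m * p)
```

Budgeting then becomes arithmetic: pick the per-check bound so that `union_bound_failure` over all checks stays below the total failure probability you are willing to spend.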
Complexity And Tradeoffs¶
The main tradeoff is usually:
- much simpler design and code versus
- only expected-time or high-probability guarantees
Typical patterns:
| Pattern | Typical guarantee | Why people use it | Main risk |
|---|---|---|---|
| randomized balancing (treap) | expected \(O(\log n)\) height / operations | much simpler than many deterministic balanced trees | assuming “random” without defining the distribution |
| fingerprinting / hashing | tiny collision probability per comparison | very light and fast equality filter | treating hash equality as proof |
| randomized pivoting | expected good partition sizes | simpler than deterministic median machinery | forgetting adversarial fixed-seed behavior |
| retry with verifier | always correct, random running time | converts a Monte Carlo core into Las Vegas | verifier may be too expensive |
Important contest lesson:
- a randomized algorithm is only as trustworthy as its failure accounting
- if you cannot explain the error budget, the implementation is probably not contest-ready
Variant Chooser¶
| Situation | Best first idea | Why it fits | Where it fails |
|---|---|---|---|
| balanced BST operations are needed but a deterministic self-balancing tree is too heavy | randomized treap / randomized BST | random priorities make the tree shape behave like a random BST | weak if you need deterministic worst-case guarantees |
| many large equality checks dominate | Monte Carlo fingerprinting | one short hash replaces many long comparisons | weak if the problem demands exact proof and no verification is possible |
| candidate matches are few enough to verify exactly | hash or sample first, verify second | the filter stays fast and final correctness becomes exact | weak if the verifier is as expensive as full brute force everywhere |
| partition quality drives runtime | randomized pivot / selection | analysis moves from worst-case arrangement to expected split quality | weak if constants or repeated reseeding are mishandled |
| one trial has a small failure chance but retries are cheap | independent amplification | drives failure probability down exponentially | weak if trials are not really independent or the checker is weak |
Worked Examples¶
Example 1: Treap Priorities Turn Order Into Shape¶
Suppose keys are inserted in an adversarial order, such as strictly increasing keys \(1, 2, \dots, n\).
A plain BST degenerates into a chain.
A treap stores:
- BST order on keys
- heap order on random priorities
Now the key order is fixed, but the tree shape is determined by the random priorities. Equivalently:
- if priorities are independent and ties are negligible or broken consistently, the final treap shape has the same distribution as a plain BST built from the keys in random order
That changes the proof target completely:
- you no longer prove balance for every insertion order
- you prove expected logarithmic depth under the random priority assignment
This is a Las Vegas pattern:
- search results are exact
- updates are exact
- the randomness only affects shape and therefore running time
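A minimal treap sketch in Python illustrating the pattern; function and field names are illustrative, and tie-breaking on priorities is ignored because real-valued priorities collide with probability zero:

```python
import random

class Node:
    __slots__ = ("key", "prio", "left", "right")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()   # independent random priority per node
        self.left = None
        self.right = None

def split(t, key):
    # Split treap t into (keys < key, keys >= key), preserving BST order.
    if t is None:
        return None, None
    if t.key < key:
        l, r = split(t.right, key)
        t.right = l
        return t, r
    l, r = split(t.left, key)
    t.left = r
    return l, t

def merge(a, b):
    # Precondition: every key in a is smaller than every key in b.
    # Heap order on priorities decides which node becomes the root.
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b

def insert(t, key):
    l, r = split(t, key)
    return merge(merge(l, Node(key)), r)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Even when the keys arrive in sorted order, the shape is governed by the priorities, so the expected depth stays logarithmic.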
Example 2: Hash Equality As Monte Carlo, Then As Las Vegas¶
Suppose you compare many substrings with one rolling hash whose base or other fingerprint parameters are sampled at runtime and then fixed for the whole run.
If you trust hash equality, \(h(a) = h(b)\), as immediate equality of the underlying substrings, then the algorithm is Monte Carlo:
- each comparison is very likely correct
- but collisions are not logically impossible
If instead you do:
- hash-filter candidate equalities quickly
- run an exact character-by-character check only on candidates that survive
then the final answer becomes Las Vegas:
- the hash still saves work
- the verifier removes false positives
This is one of the most important mindset shifts in contest randomization:
- the same probabilistic primitive can support either Monte Carlo or Las Vegas behavior depending on the final trust boundary
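That trust boundary can be made concrete. The Python sketch below buckets strings by a random fingerprint and then verifies every surviving candidate exactly; the constants and names are illustrative:

```python
import random

MOD = (1 << 61) - 1
BASE = random.randrange(2, MOD - 1)      # sampled once at startup, then fixed

def fp(s):
    # Random polynomial fingerprint of a string (constants illustrative).
    v = 0
    for ch in s:
        v = (v * BASE + ord(ch) + 1) % MOD
    return v

def equal_pairs(strings):
    """All index pairs (i, j), i < j, with strings[i] == strings[j].

    Hash equality is only a filter: every surviving candidate pair is
    re-checked character by character, so the output is exact and the
    scheme is Las Vegas at its final trust boundary.
    """
    buckets = {}
    for i, s in enumerate(strings):
        buckets.setdefault(fp(s), []).append(i)
    out = []
    for idxs in buckets.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                i, j = idxs[a], idxs[b]
                if strings[i] == strings[j]:   # exact verification
                    out.append((i, j))
    return sorted(out)
```

Dropping the exact comparison would turn the same function into a Monte Carlo answer: identical code paths, different trust boundary.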
Example 3: Randomized Pivoting Avoids Deterministic Worst Cases¶
Suppose a partition-based algorithm becomes slow only when the pivot is repeatedly terrible.
If each step chooses the pivot uniformly from the current subarray, then the proof target changes:
- you no longer promise a perfect split every time
- you only need to show that sufficiently balanced splits happen often enough in expectation
This is the randomized-pivoting archetype:
- the algorithm is exact
- the randomness controls runtime, not correctness
- the benefit is that you stop fighting the worst fixed input order directly
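The archetype can be sketched as a randomized quickselect in Python (0-indexed \(k\); names illustrative):

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element of a (0-indexed), expected O(n).

    The answer is always exact: the randomness in the pivot choice only
    affects running time, never correctness (Las Vegas behavior).
    """
    a = list(a)
    while True:
        pivot = random.choice(a)               # uniform over the current range
        less = [x for x in a if x < pivot]
        if k < len(less):
            a = less
            continue
        equal = sum(1 for x in a if x == pivot)
        if k < len(less) + equal:
            return pivot
        k -= len(less) + equal
        a = [x for x in a if x > pivot]
```

On a sorted input a fixed first-element pivot would cost quadratic time; the uniform pivot makes every input behave like the average case in expectation.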
Example 4: Amplification With Verifier-Backed Retries¶
Assume one randomized subroutine returns a candidate witness that passes an exact verifier with probability at least \(p\).
Then one run fails with probability at most \(1 - p\).
Run it independently \(k\) times and accept the first candidate that passes the verifier.
The probability that all runs fail is at most \((1 - p)^k\).
For \(k = 5\) and, say, \(p = 1/2\), that is already \(2^{-5} = 1/32\).
This is the standard amplification move behind many randomized contest arguments with exact checking:
- make the base trial simple
- then buy confidence by repeating it a few times
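The move can be packaged as a generic helper; `amplify`, the toy divisor-finding usage, and the trial budget below are all illustrative:

```python
import random

def amplify(trial, verify, k):
    """Run up to k independent trials; return the first verified candidate.

    If one trial succeeds with probability at least p, all k trials fail
    with probability at most (1 - p) ** k, which shrinks exponentially.
    """
    for _ in range(k):
        candidate = trial()
        if verify(candidate):        # exact check: success is self-certifying
            return candidate
    return None                      # all trials failed (rare by design)

# Toy usage: find a nontrivial divisor of n = 91 by random sampling.
n = 91
divisor = amplify(
    trial=lambda: random.randrange(2, n),   # random experiment
    verify=lambda d: n % d == 0,            # exact verifier
    k=2000,                                 # generous trial budget
)
```

Because the verifier is exact, any returned candidate is correct; the trial budget only controls the chance of returning `None`.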
Algorithm And Pseudocode¶
There is no single randomized template, but there is a standard checklist.
RANDOMIZED_SOLVE(instance I):
define the random experiment R
define the bad event B
decide the guarantee type:
Monte Carlo -> bounded time, tiny error probability
Las Vegas -> always correct, random running time
if Monte Carlo:
budget the total failure probability
if many checks occur:
use a stronger fingerprint,
or amplification only when retries are verifier-backed or otherwise self-certifying
if a verifier exists:
consider filter + verify
this may convert a Monte Carlo core into a Las Vegas algorithm
implement one RNG policy:
one generator
one seed strategy
no ad hoc reseeding inside hot loops
document what is random, what can fail, and why the failure budget is acceptable
The central question is always:
- am I randomizing the hard part itself,
- or am I only using randomness as a debugging companion?
If it is the second one, the topic probably belongs under Algorithm Engineering, not here.
Implementation Notes¶
Seed Once, Not Everywhere¶
The common contest pattern is:
- create one RNG
- seed it once
- reuse it for all random choices
Repeatedly reseeding inside hot code often makes behavior less clear, not more random.
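One possible shape for that policy in Python; `LOCAL_SEED` is a hypothetical environment variable chosen for this sketch, not a standard one:

```python
import os
import random

# One generator, one seed decision, made exactly once at startup.
# LOCAL_SEED is a hypothetical env var for fixed-seed local debugging;
# when it is unset, the generator self-seeds as in normal contest mode.
_seed = os.environ.get("LOCAL_SEED")
rng = random.Random(int(_seed)) if _seed is not None else random.Random()

def random_priority():
    # Every random choice in the solver goes through the single rng object;
    # no ad hoc reseeding inside hot loops.
    return rng.random()
```

The same object serves treap priorities, pivots, and samples, which also makes a failing run reproducible once the seed is logged.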
Separate Local Reproducibility From Judge Behavior¶
In LOCAL mode, reproducibility is often more valuable than perfect unpredictability.
Useful habits:
- print or log the seed when chasing a failing run
- keep one easy switch between fixed-seed local debugging and normal contest mode
That is a companion habit, not the proof of the algorithm.
A Weak Checker Destroys A Strong Randomized Idea¶
If you rely on filter + verify, the verifier becomes part of correctness.
Check that it is:
- exact
- cheaper than full brute force in the intended regime
- applied at the right trust boundary
Budget Failure Probability Across The Whole Algorithm¶
Do not only ask:
- what is the chance that one comparison fails?
Also ask:
- how many such comparisons happen?
- do I need a union bound?
- is one hash enough or should I use two?
This is where many “probably fine” solutions become contest-safe.
Random Testing Is A Companion, Not A Randomized Solution¶
Generating random test cases and diffing against a brute-force checker is excellent practice.
But that belongs with Algorithm Engineering and the stress-testing workflow.
It does not mean the submitted algorithm itself is randomized.
Practice Archetypes¶
warm-up: probabilistic fingerprinting with a very clear trust boundary
- Finding Borders
- why it belongs here: equal hashes are treated as a probabilistic equality filter, not as a structural proof

advanced companion: randomized structure inside a heavier algorithmic stack
- VMWTREE
- why it belongs here: the implicit-treap layer is a genuine randomized data-structure component, but the note itself is still mainly a trees/HLD lesson

route hub: current internal navigation page for this area
- Randomized algorithms ladder
- why it belongs here: it is the current route entry for the topic, not yet a full solved-note progression
References And Repo Anchors¶
Course
- MIT OCW 6.856J: Randomized Algorithms
- Berkeley CS170: Randomization Notes
- Stanford CS174: Randomized Algorithms

Primary / classic
- Seidel and Aragon: Randomized Search Trees (dblp bibliographic entry)
- Karp papers page: Efficient randomized pattern-matching algorithms

Reference
- CP-Algorithms: Treap
- CP-Algorithms: String Hashing
- USACO Guide: Hashing

Repo anchors
- Randomized algorithms ladder
- String Hashing
- Finding Borders
- VMWTREE

Implementation companions
- Algorithm Engineering
- Stress Testing Workflow
- Local Judge Workflow
Related Topics¶
- Constructive: use it when the hard part is building one witness by exploiting a statement promise, not by randomization
- Algorithm Engineering: use it when randomness only appears in the test harness, benchmark plan, or reproducibility workflow
- String Hashing: use it when the real primitive is probabilistic substring equality or fingerprinting
- Complexity And Hardness: use it when the real question is what class of guarantees is even possible