DSU On Tree / Small-To-Large¶
This lane is about subtree queries on a rooted tree where each child subtree contributes a mergeable container:
- a set of colors
- a map of frequencies
- or another summary that can be merged into a bigger container
The key optimization is:
always merge the smaller container into the larger one
That is why the lane is often called:
small-to-large mergingDSU on tree- or
sack
This repo's exact first route is the container-merging version:
- one container per subtree
- merge child containers bottom-up
- swap so the bigger container stays alive
The heavier keep heavy child / clear light children sack pattern is a natural follow-up once the statistic is update-friendly and a single global frequency table is cleaner than one container per node.
At A Glance¶
- Use when: every node needs one subtree answer, each child contributes a mergeable container, and the merge cost dominates the tree traversal
- Core worldview: each subtree owns one container; keep the biggest child container and pour smaller ones into it
- Main tools: rooted tree, postorder traversal, container
swap, merge-small-into-large proof - Typical complexity: often
O(n log^2 n)with ordered maps/sets, or nearO(n log n)average with hash maps - Main trap: calling everything "DSU on tree" even when the easier route is plain tree DP, Euler tour + Fenwick, or Mo's Algorithm
Strong contest signals:
- "for every node, answer something about its subtree"
- the subtree answer is a distinct-count, frequency map, or mergeable statistic
- each child subtree can be summarized independently, but merging all children naively is too slow
- you see repeated
map / set / multisetmerges inside DFS
Strong anti-cues:
- the answer is just one scalar subtree DP state -> Tree DP
- subtree intervals plus one static array structure already solve it -> Euler Tour / Subtree Queries
- one active current range with symmetric add/remove is the real invariant -> Mo's Algorithm
- the timeline has edge additions and removals -> DSU Rollback / Offline Dynamic Connectivity
Prerequisites¶
Helpful neighboring topics:
Problem Model And Notation¶
We root the tree at node 1.
Each node u has:
- one parent
p - zero or more children
- a subtree
sub(u)
For every node, we want one subtree statistic such as:
- number of distinct colors in
sub(u) - max frequency inside
sub(u) - sum over keys that appear in
sub(u)
The exact first route stores one container:
for each subtree, then merges child containers upward.
From Naive Merging To Small-To-Large¶
Naive Idea¶
For each node u:
- start a fresh container containing
u - recursively solve every child
v - merge every element of
bag[v]intobag[u]
This is correct, but if we always merge child into parent without caring about size, one element may be recopied too many times.
The Small-To-Large Rule¶
Before merging two containers:
if size(a) < size(b), swap(a, b)
Then insert everything from b into a.
Why does this help?
If one item moves from a smaller container into a larger one, the size of its host container at least doubles.
So each item can only be moved:
times before it ends up inside a container of size at least n.
That is the whole amortization argument.
Why The Name Sounds Weird¶
This technique is called DSU on tree because it copies the same heuristic spirit as ordinary DSU:
- in DSU, union smaller component into larger one
- here, merge smaller subtree container into larger one
But the exact data structure is usually:
setmapunordered_map- or some custom bucket/frequency container
not a union-find forest.
Exact First Route In This Repo¶
The starter in this repo teaches:
- bottom-up rooted-tree processing
- one frequency map per current surviving subtree container
- merge-small-into-large
answer[u] = number of distinct keys in bag[u]
This is the cleanest first route for:
The exact starter is:
Worked Example: Distinct Colors¶
In Distinct Colors, every node has one color and we want:
The container for one subtree can be:
- a map
color -> count
Then:
- add
u's own color - merge every child map into the largest one
- after all child merges,
bag[u].size()is the number of distinct colors insub(u)
This is the canonical first rep because:
- the statistic is truly a merged subtree summary
- no path queries or online updates distract from the merging idea
The Heavier Sack Variant¶
There is a second common style:
- find heavy child
- keep the heavy child's contribution alive
- add light subtrees temporarily
- clear light contributions when needed
That route is often better when:
- updates to one global statistic are very cheap
- carrying one mutable frequency table is simpler than owning a container per node
This page treats that as a follow-up variant, not the first route.
Variant Chooser¶
Use Tree DP When¶
- every child only contributes one small fixed-size state
- there is no real container merge
Use Small-To-Large Merging When¶
- each subtree naturally owns a container
- the answer can be read directly from the merged container summary
- one container per subtree is still the clean mental model
Use Sack / Heavy-Child Keep When¶
- one global frequency structure plus subtree add/remove is cleaner
- the statistic has cheap incremental updates
- the container-per-node route becomes too memory- or constant-heavy
Use Euler Tour / Subtree Queries When¶
- one subtree is just one interval in DFS order
- and a Fenwick / segment tree / offline sweep already answers the query
Use Mo On Trees When¶
- you really need add/remove over an Euler-tour walk order
- and the maintained state is one active range, not one container per subtree
Implementation Notes¶
- root the tree first
- process nodes in postorder
- if a child bag is larger than the current bag, swap pointers
- merge the smaller bag into the larger bag
- delete or discard merged-away child bags so one logical container survives per processed subtree
Common Failure Modes¶
- forgetting that
std::swap(a, b)is only cheap if the container's swap itself is cheap - using ordered maps when an ordinary frequency array or hash map would be the real intended route
- trying to force this lane onto path queries
- keeping all child bags alive after merge and accidentally blowing memory
- calling the technique "online" even though the whole tree is solved in one rooted offline pass
Sources¶
- reference: USACO Guide - Small-To-Large Merging
- reference: OI Wiki - DSU on Tree
- compare point: SOI Wiki - Smaller to Larger
- practice statement: CSES - Distinct Colors
Repo Routes¶
- starter: small-to-large-subtree-merge.cpp
- hot sheet: DSU On Tree hot sheet
- flagship note: Distinct Colors
- compare point: Tree DP
- compare point: Euler Tour / Subtree Queries
- compare point: Mo's Algorithm