DSU¶
Disjoint Set Union is the standard contest structure for maintaining a partition of elements under merges only.
That last phrase is the real heart of the topic:
- sets can merge
- sets do not split
- connectivity or grouping evolves monotonically
If you keep that model in mind, DSU stops being "that tiny union-find snippet" and becomes a very reliable modeling tool.
This page is designed to make four things automatic:
- see a component partition instead of a graph traversal problem
- define the root-representative invariant cleanly
- understand why path compression and union by size make the structure fast
- know exactly when plain DSU is enough and when you need a stronger variant
At A Glance¶
Reach for DSU when:
- elements start in separate groups and groups only merge
- the main question is "same component?" or "merge these components"
- edges are added over time, but not removed
- Kruskal, connectivity, or equivalence classes are nearby
Strong contest triggers:
- "are
uandvconnected after these additions?" - "merge groups and query component size"
- "if this edge joins two existing connected vertices, it makes a cycle"
- "sort edges and take only those joining different components"
- "the statement defines an equivalence relation or merge process"
Strong anti-cues:
- edges or group relationships must be deleted online
- you need path queries, shortest paths, or subtree aggregates
- the real problem is not component membership but order, range, or geometry
What success looks like after studying this page:
- you can implement
find,unite,same, andcomponent_sizefrom memory - you can explain why metadata should live at roots
- you know when DSU beats BFS/DFS and when it does not
- you can recognize parity / rollback / offline dynamic connectivity as variants, not plain DSU
Prerequisites¶
- Foundations
- Graphs helps, but DSU is not only a graph tool
Helpful neighboring topics:
- Minimum Spanning Tree
- BFS And DFS
- DSU Rollback / Offline Dynamic Connectivity
- DSU On Tree / Small-To-Large
Problem Model And Notation¶
We maintain a partition of elements:
into pairwise disjoint components.
The classical DSU interface is:
make_set(x)or initialization: start a singleton setfind(x): return the representative of the set containingxunite(a, b): merge the two sets containingaandb
In contest code, each component is usually stored as a rooted tree.
The key arrays are:
parent[x]: the immediate parent ofxin the forestsize[x]orrank[x]: a heuristic summary stored at roots
If x is a root, then usually:
Two elements belong to the same component if and only if:
From Brute Force To The Right Idea¶
Brute Force¶
Suppose we repeatedly merge groups and ask whether two vertices are in the same component.
The naive options are:
- rebuild connectivity by BFS/DFS after every merge
- explicitly store each set as a list and scan or relabel too much
These are often too slow when there are many operations.
The Structural Observation¶
The only thing we really need is:
- one stable name for each current component
That is what the representative root provides.
If every component has one root, then:
find(x)tells us the name ofx's componentunite(a, b)only needs to connect the two roots
So DSU turns set merging into forest maintenance.
Why Trees Are Enough¶
Each set is represented by a rooted tree.
Every node points, directly or indirectly, to the root of its component.
Then:
- the root is the representative
- following parent pointers finds the component name
- merging two components is just making one root point to the other
That is the whole data structure before optimizations.
Why The Optimizations Matter¶
Without care, DSU trees can become long chains, and then find(x) may cost O(n).
The two standard fixes are:
- path compression
- whenever we run
find(x), we make visited nodes point closer to the root - union by size/rank
- when merging two roots, attach the smaller/shallower tree under the larger/taller one
These two heuristics are what make DSU feel "constant-time" in practice.
A Tiny Forest Trace¶
Start with five singleton sets:
0 1 2 3 4
After:
unite(0, 1)
unite(1, 2)
unite(3, 4)
one valid DSU forest might look like:
0
├── 1
└── 2
3
└── 4
Now find(2) returns root 0.
If the tree had become longer, path compression would rewrite the nodes on the search path so that future finds jump to the root much faster. That is the concrete meaning of "compression": the partition stays the same, but the forest becomes flatter.
Core Invariant And Why It Works¶
Representative Invariant¶
At every moment, each node belongs to exactly one rooted tree, and the root of that tree is the representative of its component.
So:
This is the invariant everything else must preserve.
Why find Is Correct¶
If parent[x] = x, then x is the root of its component and is therefore the representative.
Otherwise, parent[x] lies in the same component as x, and recursively following parents must eventually reach the unique root.
That root is exactly the representative of the component containing x.
Why Path Compression Is Safe¶
Suppose find(x) discovers that the true representative is r.
If we then set:
or more generally compress the whole traversed path toward r, we do not change the partition into components.
We only shorten the internal tree shape while preserving:
- every node still points into the same component
- every node still reaches the same representative
So path compression is a pure performance optimization, not a semantic change.
Why Union By Size/Rank Is Safe¶
If a and b are roots of two different components, then making one root point to the other merges the two components correctly no matter which root becomes parent.
Union by size/rank is therefore also a pure performance choice.
Its purpose is to keep trees shallow:
- attach the smaller tree under the larger one
- or attach the lower-rank tree under the higher-rank one
That prevents the forest from degenerating badly.
Why Metadata Belongs At Roots¶
Suppose we want component size, sum, minimum, or any other aggregate over the whole set.
The clean rule is:
- store component metadata only at representatives
Then after a merge, only the new root needs an update.
If you try to keep such metadata synchronized at every node, the structure becomes much harder to reason about and easier to corrupt.
This root-only rule is one of the most reusable DSU habits.
Why The Complexity Is "Almost Constant"¶
With path compression plus union by size/rank, the amortized complexity per operation is:
where \alpha is the inverse Ackermann function.
For contest purposes, this behaves like a tiny constant.
The exact proof is deep and not necessary for day-to-day CP use. The practical intuition is enough:
- union by size/rank stops trees from growing tall too quickly
- path compression aggressively flattens whatever depth remains
So repeated finds become cheaper and cheaper.
There is also a very useful intermediate intuition:
- with union by size alone, a node's depth can increase only when its component is attached under a strictly larger one
- every time that happens, the size of the component above that node at least doubles
So even before path compression, the depth is already bounded by:
Then path compression flattens the parts of the forest that are actually touched by queries, which is what brings the amortized bound down to O(\alpha(n)).
Complexity And Tradeoffs¶
For m operations on n elements, plain optimized DSU supports:
find: amortizedO(\alpha(n))unite: amortizedO(\alpha(n))- memory:
O(n)
Practical tradeoffs:
| Tool | Best when | Time | Main tradeoff |
|---|---|---|---|
| DSU | merge-only connectivity / grouping | amortized near-constant | cannot support arbitrary splits |
| BFS / DFS | one-shot connectivity in a fixed graph | O(n + m) per traversal |
inefficient if repeated after every merge |
| rollback DSU | offline dynamic connectivity with undo | usually O(\log n)-ish per op depending on framework |
cannot use full path compression in the usual way |
| HLD / segment structures | path or subtree aggregates | stronger query power | far more machinery than DSU |
Rule of thumb:
- if components only merge, DSU should be in your shortlist immediately
- if components also split, plain DSU is usually the wrong abstraction
Variant Chooser¶
| Variant | Use it when | Extra state | Main pitfall |
|---|---|---|---|
| plain DSU | connectivity / component size / simple merges | parent + size/rank | trying to answer richer graph queries than connectivity |
| DSU with root metadata | each component tracks size/sum/min/max | metadata at roots | updating metadata on non-roots |
| parity / weighted DSU | relation to root matters, not just membership | parity or distance to representative | forgetting that the stored relation is part of find |
| rollback DSU | offline dynamic connectivity with undo | change stack | standard path compression is not directly compatible |
| DSU-on-tree / small-to-large | subtree query merging | multiset/map per subtree | this is a different technique, despite the name similarity |
The main lesson:
- many useful variants exist
- but plain DSU is still specifically a merge-only partition structure
Worked Examples¶
Example 1: Connectivity Under Edge Additions¶
Start with n isolated vertices, so every vertex is its own component.
For each new undirected edge (u, v):
- call
unite(u, v)
Then a query:
are u and v connected?
becomes:
find(u) == find(v)
This is the purest DSU application.
Example 2: Cycle Detection¶
In an undirected graph, before adding edge (u, v):
- if
find(u) == find(v), thenuandvare already connected - adding the edge creates a cycle
Otherwise:
- unite the two components
This is the core local test inside Kruskal and many incremental graph problems.
Example 3: Kruskal's Algorithm¶
This is the cleanest bridge from DSU to a major graph algorithm.
Sort edges by weight, then scan them in order.
For each edge (u, v, w):
- if
find(u) != find(v), take it andunite(u, v) - otherwise skip it, because it would create a cycle
DSU answers exactly the question Kruskal needs:
- do these endpoints currently lie in different components?
That is why Road Reparation is such a strong cross-topic anchor.
Example 4: Component Metadata¶
Suppose we want each component's size.
Initialize:
- every singleton has size
1
When merging roots a and b, after choosing the new root:
- update only the root's size
For example, under union by size:
parent[b] = a
size[a] += size[b]
Then:
component_size(x) = size[find(x)]
This pattern generalizes to many other root-only component aggregates.
Algorithm And Pseudocode¶
The repo starter implementation is:
Initialization¶
for each x:
parent[x] = x
size[x] = 1
Find With Path Compression¶
find(x):
if parent[x] == x:
return x
parent[x] = find(parent[x])
return parent[x]
Union By Size¶
unite(a, b):
a = find(a)
b = find(b)
if a == b:
return false
if size[a] < size[b]:
swap(a, b)
parent[b] = a
size[a] += size[b]
return true
This is the contest-default version for most plain DSU tasks.
Implementation Notes¶
1. Initialize Every Node As Its Own Parent¶
If you forget:
parent[x] = x
the whole representative invariant collapses immediately.
This sounds basic, but it is one of the most common DSU bugs.
2. Roots Carry The Meaning¶
When you write DSU code, mentally separate:
- node-level data
- component-level data
Component-level data belongs to roots.
After every merge, only the new root's metadata is authoritative.
3. find Should Usually Be The Only Way To Ask Component Questions¶
Do not compare raw parent[u] and parent[v].
Always compare:
find(u) == find(v)
because non-root parents can change under path compression.
4. Union By Size And Union By Rank Are Both Fine¶
Union by size is often easier to extend because component size is already useful metadata.
Union by rank is also standard and theoretically clean.
Pick one convention and keep it consistent.
5. Path Compression Makes The Forest Opaque¶
After heavy compression, the parent forest no longer has a meaningful original shape.
That is fine for connectivity, but it means:
- you should not try to interpret the DSU forest as the original graph structure
- if you need reversible changes, plain path compression may stop being the right choice
This is exactly why rollback DSU is treated as a separate variant.
6. DSU Is Not A General Dynamic Graph Structure¶
Plain DSU handles:
- merges
- connectivity
- root-centered metadata
It does not handle:
- deletions
- shortest paths
- path aggregates
- arbitrary subtree queries
Knowing this boundary is just as important as knowing the code.
Practice Archetypes¶
The most common DSU-flavored problems are:
- connectivity under additions
- cycle detection
- Kruskal / MST
- component size or component aggregate
- offline grouping by equivalence
More advanced smells:
- parity constraints
- rollback / offline dynamic connectivity
- matroid-style or forest-independence modeling as in C11XU
The strongest contest smell is:
- the partition evolves by merging, never by splitting
References And Repo Anchors¶
Research sweep refreshed on 2026-04-24.
Course:
Reference:
Practice:
Repo anchors:
- practice ladder: DSU ladder
- advanced note: C11XU
- rollback compare lane: DSU Rollback / Offline Dynamic Connectivity
- cross-topic note: Road Reparation
- starter template: dsu-basic.cpp
- notebook refresher: Data structures cheatsheet