Virtual Tree / Auxiliary Tree¶
This lane is for the first time a tree problem says:
the query only touches a small marked subset of nodes,
but the real structure still lives on the union of paths between them.
The right move is often:
- keep only the marked nodes
- add the LCAs needed to preserve ancestry
- build the compressed tree induced by those vertices
- run the real DP or counting logic on that much smaller object
That compressed tree is usually called:
virtual treeauxiliary tree- or
compressed tree on marked nodes
This topic sits naturally after Trees and LCA:
- Trees gives the rooted-tree worldview
- LCA gives the path/ancestor primitive
- this page teaches how to shrink one marked-subset query down to only the vertices that still matter
At A Glance¶
- Use when: one static rooted tree is queried many times, each query marks only
kvertices, and only the union of paths between those marked vertices matters - Core worldview: marked nodes alone are not enough; you must also keep the LCAs that glue their paths together
- Main tools: DFS preorder
tin, ancestor checks, LCA of consecutive marked nodes in DFS order, monotone stack build - Typical complexity: preprocessing
O(n log n); one query buildO(k log n)orO(k)after sorting/LCA costs are accounted for - Main trap: treating the virtual tree as a general replacement for every tree query instead of a marked-subset compression trick
Strong contest signals:
- "each query gives a subset of important/special/colored vertices"
- the answer only depends on paths connecting those marked vertices
- the original tree is large, but the total marked vertices over all queries is manageable
- after you root the tree, the hard part becomes "compress the relevant branch points, then run one small DP"
Strong anti-cues:
- every node is active in every query, so there is nothing to compress
- the query is subtree-only, so Euler Tour / Subtree Queries already fits
- the real bottleneck is online path aggregates, so Heavy-Light Decomposition is a better exact route
- the tree topology changes online, so this page is too static; compare Link-Cut Tree
Prerequisites¶
Helpful neighboring topics:
Problem Model And Notation¶
Root the tree once at some node r.
For each node u, we use:
tin[u]: DFS entry timetout[u]: DFS exit time in the rooted treedepth[u]lca(u, v)
For one query, let S be the set of marked vertices.
The virtual tree of S is the smallest rooted tree that:
- contains every vertex in
S - contains every LCA needed to preserve the ancestry relations among the marked paths
- keeps the same ancestor-descendant order as the original tree
The virtual-tree edges are compressed edges of the original tree:
- one edge
u -> vin the virtual tree meansuis the nearest kept ancestor ofv - the original path from
utovcontains no other kept vertex
That is why one compressed edge often stores useful metadata such as:
depth[v] - depth[u]- distance in the original tree
- minimum edge or vertex cost on the represented path
From Brute Force To The Right Idea¶
Brute Force¶
Suppose one query marks k important vertices.
The naive instinct is often:
- walk every marked-to-marked path directly
- mark all vertices on the whole union of those paths
- then run a DP on that induced subgraph
That can be far too large across many queries.
The original tree might have n = 2e5 vertices even when:
this query only really cares about 12 marked nodes.
The First Right Observation¶
Most vertices on those connecting paths are boring.
They only matter if they are:
- originally marked
- or a branch point where marked paths meet
In a rooted tree, those branch points are exactly the relevant LCAs.
The Second Right Observation¶
You do not need LCAs of all pairs.
A standard fact is:
after sorting marked nodes by DFS preorder,
it is enough to add LCAs of consecutive nodes in that order.
That is the key size bound.
The Third Right Observation¶
Once the final kept vertices are sorted by tin, the compressed parent relation can be built by one stack:
- keep the current ancestor chain
- pop until the top is an ancestor of the next node
- connect that top to the next node
That turns the query from "huge tree again" into:
sort -> add needed LCAs -> sort/unique -> stack-build one tiny rooted tree
Core Invariants And Why They Work¶
1. Consecutive-LCA Invariant¶
Let the marked vertices be sorted by DFS preorder.
Then every LCA needed for the virtual tree appears among LCAs of consecutive vertices in that sorted order.
So you do not need:
- all
O(k^2)pair LCAs
You only need:
k - 1consecutive LCAs
This is the size-saving invariant that makes the whole technique practical.
2. Ancestor Order Is Preserved¶
The virtual tree does not invent new ancestry.
If u is an ancestor of v in the virtual tree, then u is also an ancestor of v in the original rooted tree.
That is why:
tin/toutancestor checks still work- rooted-tree DP logic can be reused on the compressed tree
3. The Virtual Tree Has Only O(k) Vertices¶
If the query marks k vertices, then after:
- adding consecutive LCAs
- sorting
- deduplicating
the number of kept vertices is still O(k).
So a linear DP on the virtual tree is really a linear DP in the query size, not in n.
4. Every Compressed Edge Represents One Clean Original Path Segment¶
If u is the virtual parent of v, then the original path from u to v contains no other kept vertex.
This means you can attach one summary to that edge, such as:
- its length
- whether there is at least one removable internal vertex
- the minimum cost along that path segment
Many query-specific DPs live entirely on that compressed metadata.
5. Stack Build Works Because DFS Order Linearizes Ancestors¶
After sorting the kept vertices by tin, ancestors always appear before descendants.
So when processing vertices in that order:
- the active stack is one ancestor chain
- the nearest kept ancestor of the next vertex is found by popping until ancestry holds again
That is why the build step is almost always a monotone-stack routine.
Exact First Route In This Repo¶
The repo's first exact route for this family is intentionally narrow:
- preprocess one rooted tree for
tin/tout,depth, and LCA - for one query set
S, build the virtual tree - expose:
- kept nodes
- parent relation in the compressed tree
- children lists
depth[parent -> child]difference in the original tree
The starter is:
The flagship in-repo rep is:
That rep is deliberately not "just build the tree".
It forces the full workflow:
- mark important nodes for one query
- build the virtual tree
- run a small DP on the compressed tree instead of the original one
Worked Mental Model¶
Suppose the marked set is:
S = {a, b, c}
and:
lca(a, b) = xlca(b, c) = y
Then the query does not care about every vertex on every path individually.
It only cares about:
- the marked endpoints
- the LCAs that merge those paths
- and some metadata on the compressed edges between them
That is the main habit to build:
first compress the tree-shaped query,
then solve the real combinatorics on the compressed object.
Variant Chooser¶
Use Plain LCA When¶
- you only need ancestor checks, distances, or one meeting point
- there is no marked-subset compression step
Use Virtual Tree When¶
- each query marks a relatively small subset of nodes
- only the union of marked paths matters
- a DP or counting pass on
O(k)vertices is the real target
Use Tree DP When¶
- the whole tree participates once
- there is no per-query marked subset to compress
Use HLD When¶
- the real operation is repeated path aggregation on the original static tree
- not "compress marked subset, then do one query-local DP"
Use DSU On Tree When¶
- each node needs one subtree answer built from merged child containers
- not one query-local compressed tree on arbitrary marked vertices
Common Pitfalls¶
- forgetting to sort by DFS preorder before adding consecutive LCAs
- adding LCAs of all pairs and destroying the
O(k)size bound - confusing the virtual tree with the original rooted tree
- not deduplicating after adding LCAs
- forgetting that compressed edges often need original-path metadata, not just parent indices
- using recursive DFS on the original tree blindly when the depth can be large and the repo prefers iterative preprocessing where practical
Practice Archetypes¶
The strongest virtual-tree archetypes are:
- marked-subset DP
- pairwise-separation or guarding problems
- sum/count over unions of marked paths
- query-local Steiner-tree style reductions
- query-local reroot or merge logic that would be too expensive on the full tree
The strongest smell is:
many queries, each with a small set of special nodes on one static tree
Reopen Paths¶
- Topic base -> LCA
- Whole-tree child-merge DP -> Tree DP
- Static path aggregates -> Heavy-Light Decomposition
- Subtree-only route -> Euler Tour / Subtree Queries
- Retrieval sheet -> Virtual Tree hot sheet
- Snippet chooser -> Template Library