Wavelet Tree¶

Wavelet Tree is the static range-query lane for questions like:

k-th smallest in a[l:r)
how many values in a[l:r) are <= x
how many values in a[l:r) are exactly x

The key promise is:

the array never changes
but the query depends on both
- position interval [l, r)
- value order inside that interval

That is where prefix sums, sparse table, and ordinary segment tree summaries start feeling awkward.

A Wavelet Tree answers these value-sensitive range queries by storing, at each value-partition node, a prefix map that tells us how the current subsequence splits into the left and right children.

At A Glance¶

Use when: one static array gets many subarray order-statistics or value-count queries
Core worldview: every node represents one value interval and one subsequence of the array, preserving original order inside that subsequence
Main tools: coordinate compression, value-range recursion, pref[i] = how many of the first i elements go left
Typical complexity: O(n log sigma) build, O(log sigma) query
Main trap: forgetting that as you descend, l and r stop meaning positions in the original array and become positions inside the current node subsequence

Strong contest signals:

"k-th smallest in subarray"
"count numbers <= x in subarray"
static array, many queries, no updates
persistent segment tree would work too, but the statement does not need versioning

Strong anti-cues:

updates happen between queries -> reopen Persistent Data Structures, Mo's Algorithm, or a segment-tree lane instead
the query is only static RMQ / min / gcd -> Sparse Table
one monotone sweep or right-endpoint offline pass already solves it -> Offline Tricks
the answer is not value-sensitive at all, only aggregate-sensitive

Prerequisites¶

Coordinate Compression
Segment Tree for the "recursive value partition" mental model
Persistent Data Structures as the best compare point for static range order statistics

Helpful neighbors:

Mo's Algorithm when the real invariant is "one current active interval" rather than value splitting
Offline Tricks when queries are reorderable enough that a lighter sweep wins

Problem Model And Notation¶

Let:

\[ a_0, a_1, \dots, a_{n-1} \]

be a static array.

We will use zero-based half-open interval notation:

\[ [l, r) \]

for the queried subarray.

The starter in this repo supports:

kth_smallest(l, r, k) with k zero-based
count_leq(l, r, x)
count_equal(l, r, x)

Internally:

coordinate-compress the values
build a binary recursion over the compressed value range
at each node, store a prefix-count vector:

pref[i] = among the first i elements of this node's subsequence,
          how many map to the left child

That one vector is the whole translation mechanism.

From Brute Force To The Right Idea¶

Brute Force¶

For one query k-th smallest on a[l:r), the obvious route is:

copy the subarray
sort it
read the answer

That costs:

\[ O((r-l)\log(r-l)) \]

per query, which is too slow for heavy query sets.

A Compare Point: Persistent Segment Tree¶

A persistent segment tree can answer the same order-statistics family by comparing two version roots:

prefix root up to r
prefix root up to l

That route is excellent when the teaching goal is:

versions
path copying
prefix-root subtraction

But if the array is just static and the job is only "query by position interval and value rank", the persistent route is not the lightest mental model.

The Structural Observation¶

Suppose we split the compressed value domain into:

\[ [lo, mid] \quad \text{and} \quad [mid+1, hi] \]

At one node, consider the subsequence of array elements whose values belong to [lo, hi].

For a range query [l, r) on that subsequence, the next question is:

how many of those queried elements belong to the left child?

If we know that count quickly, then:

k-th smallest knows whether to go left or right
count <= x knows whether to prune or recurse
count == x knows which unique value branch to follow

So the only real problem is:

fast translation of one current [l, r)
into child-local intervals

The Wavelet Tree Trick¶

Store, at every node:

pref[i] = count of left-child elements among the first i elements

Then:

left child interval becomes:

\[ [pref[l], pref[r)) \]

right child interval becomes:

\[ [l - pref[l],\ r - pref[r)) \]

That is why queries drop to O(log sigma): each level uses one O(1) translation.

Core Invariant And Why It Works¶

1. Node Meaning¶

Each node represents:

one compressed value interval [lo, hi]
one subsequence containing exactly the array elements whose values lie in that interval
original order preserved inside that subsequence

Preserving order is essential, because range queries are position-based.

2. Prefix-Left Meaning¶

At one node:

pref[i] = number of first i subsequence elements that go to the left child

So for any queried node-local interval [l, r):

pref[r] - pref[l] = how many queried elements continue into the left child
the rest continue into the right child

3. Why Range Translation Is Correct¶

The left child subsequence is exactly the stable filter:

keep only elements whose compressed value is in the left value interval

So among the first l elements of the current subsequence, exactly pref[l] survive into the left child.

That makes the left child range:

\[ [pref[l], pref[r)) \]

The right child range is what remains after removing those left-going elements.

4. Why `k-th smallest` Works¶

At a node, let:

\[ cnt_L = pref[r] - pref[l] \]

If k < cnt_L, then the answer lies in the left child.

Otherwise, the answer lies in the right child, and inside that child it becomes:

\[ k - cnt_L \]

So every level peels off exactly the left-side rank mass and continues.

5. Why `count_leq` Works¶

If the node value interval is fully above x, contribute 0.

If the node value interval is fully below or equal to x, contribute the whole range length:

\[ r - l \]

Otherwise recurse, using the same translated child ranges.

That is the same invariant as segment-tree pruning, except the dimension is:

value interval

while the query range remains:

position interval within the current subsequence

Worked Example¶

Take:

\[ a = [1, 5, 2, 6, 3, 7, 4] \]

and query:

Q(2, 5, 3)

meaning:

one-based inclusive interval [2, 5]
third smallest element

Convert to the starter convention:

interval [1, 5)
k = 2 zero-based

The queried subarray is:

\[ [5, 2, 6, 3] \]

whose sorted order is:

\[ [2, 3, 5, 6] \]

so the answer should be 5.

At the root, split the value ranks into:

left half = smaller values
right half = larger values

Inside [5, 2, 6, 3], exactly two values go left:

\[ [2, 3] \]

So:

cnt_L = 2
k = 2

Since k is not < 2, the answer is in the right child, and the new rank is:

\[ 2 - 2 = 0 \]

The right-child subsequence for this query is:

\[ [5, 6] \]

At the next split, only 5 goes left, so:

left count = 1
k = 0

Now k < 1, so descend left and reach the leaf for value 5.

That is the whole algorithm:

count how many queried elements go left
rewrite the interval
keep the correct child-local rank

Algorithm And Pseudocode¶

Build¶

build(node, values_in_this_subsequence, lo, hi):
    store node value interval [lo, hi]
    pref[0] = 0

    if lo == hi:
        fill pref as 0, 1, 2, ..., len
        return

    mid = (lo + hi) / 2
    left_values = []
    right_values = []

    for value in values_in_this_subsequence:
        if value <= mid:
            left_values.push(value)
            pref.push_back(pref.back() + 1)
        else:
            right_values.push(value)
            pref.push_back(pref.back())

    build(left child, left_values, lo, mid)
    build(right child, right_values, mid + 1, hi)

K-th Smallest¶

kth(node, l, r, k):
    if node is a leaf:
        return node value

    left_l = pref[l]
    left_r = pref[r]
    left_count = left_r - left_l

    if k < left_count:
        return kth(left child, left_l, left_r, k)

    right_l = l - left_l
    right_r = r - left_r
    return kth(right child, right_l, right_r, k - left_count)

Count `<= x`¶

count_leq(node, l, r, x_rank):
    if [node.lo, node.hi] is fully above x_rank:
        return 0
    if [node.lo, node.hi] is fully inside <= x_rank:
        return r - l

    translate [l, r) into children using pref
    recurse only where needed

Variant Chooser¶

Use The Repo Starter Exactly When¶

the array is static
queries are online or arbitrary-order, but updates never happen
you need order statistics or value-count queries on subarrays
coordinate compression is acceptable

This is the exact lane for:

MKTHNUM - K-th Number

Prefer Persistent Segment Tree When¶

the teaching goal is really prefix-root subtraction
you need the stronger compare point for versioned structures
the problem family naturally talks about snapshots or version roots

Persistent segment tree and wavelet tree often solve similar static order-statistics tasks. The question is:

do you want value partition with prefix-left maps
or version roots with path copying

Prefer Mo / Offline Tricks When¶

the answer is maintained by local add/remove of one active window
or one monotone sweep already kills the problem

Those lanes are about:

reordering queries

Wavelet tree is about:

prebuilding a value-sensitive query structure

Reopen The Topic Before Reusing The Starter When¶

updates appear
the task wants a wavelet matrix or compressed bitvector engineering for tighter memory/constants
the task becomes 2D or geometric
you need richer queries than the starter advertises

Implementation Notes¶

1. Compress Values First¶

The starter compresses values automatically.

That keeps the recursion depth tied to:

number of distinct values

rather than raw magnitude like 10^9.

2. Internal `[l, r)` Is Node-Local¶

This is the main source of bugs.

After one descent, [l, r) no longer indexes the original array. It indexes:

the subsequence stored at the current node

3. Decide `k` Convention Once¶

The starter uses:

zero-based k

Many judge statements use:

one-based k

Convert exactly once at the call site.

4. Static Means Static¶

This starter does not support:

point updates
swaps
dynamic insert / erase

If updates appear, reopen the lane choice.

5. Wavelet Tree vs Wavelet Matrix¶

This repo ships the classical recursive wavelet-tree starter because it is the cleanest first learning route.

If later you need:

better constants
packed bitvector engineering
stronger low-memory static queries

the next compare point is:

wavelet matrix

not "stretch this starter until it hurts."

Practice Archetypes¶

First Rep In This Repo¶

MKTHNUM - K-th Number

Best Compare Point In This Repo¶

Persistent Data Structures

Wavelet Tree¶

At A Glance¶

Prerequisites¶

Problem Model And Notation¶

From Brute Force To The Right Idea¶

Brute Force¶

A Compare Point: Persistent Segment Tree¶

The Structural Observation¶

The Wavelet Tree Trick¶

Core Invariant And Why It Works¶

1. Node Meaning¶

2. Prefix-Left Meaning¶

3. Why Range Translation Is Correct¶

4. Why `k-th smallest` Works¶

5. Why `count_leq` Works¶

Worked Example¶

Algorithm And Pseudocode¶

Build¶

K-th Smallest¶

Count `<= x`¶

Variant Chooser¶

Use The Repo Starter Exactly When¶

Prefer Persistent Segment Tree When¶

Prefer Mo / Offline Tricks When¶

Reopen The Topic Before Reusing The Starter When¶

Implementation Notes¶

1. Compress Values First¶

2. Internal `[l, r)` Is Node-Local¶

3. Decide `k` Convention Once¶

4. Static Means Static¶

5. Wavelet Tree vs Wavelet Matrix¶

Practice Archetypes¶

First Rep In This Repo¶

Best Compare Point In This Repo¶

Good External Follow-Ups¶

References¶

Repo Anchors¶

Wavelet Tree¶

At A Glance¶

Prerequisites¶

Problem Model And Notation¶

From Brute Force To The Right Idea¶

Brute Force¶

A Compare Point: Persistent Segment Tree¶

The Structural Observation¶

The Wavelet Tree Trick¶

Core Invariant And Why It Works¶

1. Node Meaning¶

2. Prefix-Left Meaning¶

3. Why Range Translation Is Correct¶

4. Why k-th smallest Works¶

5. Why count_leq Works¶

Worked Example¶

Algorithm And Pseudocode¶

Build¶

K-th Smallest¶

Count <= x¶

Variant Chooser¶

Use The Repo Starter Exactly When¶

Prefer Persistent Segment Tree When¶

Prefer Mo / Offline Tricks When¶

Reopen The Topic Before Reusing The Starter When¶

Implementation Notes¶

1. Compress Values First¶

2. Internal [l, r) Is Node-Local¶

3. Decide k Convention Once¶

4. Static Means Static¶

5. Wavelet Tree vs Wavelet Matrix¶

Practice Archetypes¶

First Rep In This Repo¶

Best Compare Point In This Repo¶

Good External Follow-Ups¶

References¶

Repo Anchors¶

4. Why `k-th smallest` Works¶

5. Why `count_leq` Works¶

Count `<= x`¶

2. Internal `[l, r)` Is Node-Local¶

3. Decide `k` Convention Once¶