PBDS / Order Statistics Tree¶

This lane is for the moment when an ordinary ordered set is almost enough, but the statement keeps asking for:

the k-th smallest alive element
the number of alive elements < x
rank queries after updates
one dynamic ordered set where brute-force indexing is too slow

The repo's exact first route for this family is:

starter -> pbds-ordered-set.cpp
flagship note -> ORDERSET - Order Statistic Set
compare point -> Heaps And Ordered Sets
compare point -> Fenwick Tree

This lane is intentionally about the contest-useful GNU PBDS route, not about re-teaching red-black trees from scratch.

At A Glance¶

Use when: one dynamic ordered set needs online rank or k-th queries
Core worldview: keep the set sorted and maintain subtree sizes so rank and k-th become logarithmic tree descents
Main tools: order_of_key, find_by_order, and a pair-key wrapper when duplicates matter
Typical complexity: O(log n) per insert / erase / rank / k-th operation
Main trap: forgetting that raw tree<T, ...> is a set, so duplicates need a wrapper such as (value, unique_id)

Strong contest signals:

"print the k-th remaining element"
"how many active values are strictly smaller than x?"
"maintain salaries / ratings / scores under updates and answer rank-count queries"
"ordered set is right, but predecessor alone is not enough"

Strong anti-cues:

you only need predecessor / successor / erase-one -> plain Heaps And Ordered Sets
the value universe is compressible and only prefix counts matter -> Fenwick Tree
the real task is static subarray order statistics -> Wavelet Tree
the structure is really a mutable sequence by position rather than a value-ordered set -> Treap / Implicit Treap

Prerequisites¶

C++ Language For Contests at the level of reading one GNU-only extension without fear
Heaps And Ordered Sets because you should already know when ordinary set / multiset is enough
Fenwick Tree as the best compare route when compression turns the same rank-count story into prefix sums

Helpful compare points:

Heaps And Ordered Sets: use this first when neighbor queries are enough
Fenwick Tree: use this when counts live on one compressed coordinate axis and prefix sums already solve the query
Treap / Implicit Treap: use this when split/merge or sequence surgery is the real need, not just order statistics
Wavelet Tree: use this when the array is static and the query is inside one subarray, not one evolving ordered set
Balanced BSTs For Contests: use this if you deliberately want the AVL / Red-Black / Scapegoat / SBT compare note and the broader textbook balancing map

Problem Model And Notation¶

Let S be one dynamic ordered set or multiset.

The two canonical order-statistics queries are:

\[ \operatorname{rank}(x) = |\{y \in S : y < x\}| \]

and

\[ \operatorname{kth}(k) \]

for the element with zero-based order k in sorted order.

The ordinary online operations are:

insert(x)
erase(x) or erase_one(x)
count_less(x)
count_leq(x)
kth(k)

The exact PBDS primitive names are:

order_of_key(x) -> number of keys strictly smaller than x
find_by_order(k) -> iterator to the zero-based k-th key

That is the whole contest contract.

From Brute Force To The Right Idea¶

Brute Force¶

Keep all active values in a vector and:

insert / erase in O(n)
sort when needed
answer k-th or rank queries by indexing after re-sort

This fails as soon as updates and queries interleave many times.

The First Shift¶

The statement is not about one array. It is about one changing sorted set.

If the real questions are:

how many active values are < x?
what is the k-th active value right now?

then the sorted order itself is the state that matters.

The Second Shift¶

An ordinary set already keeps sorted order, but it does not expose rank or k-th directly.

PBDS adds exactly that extra layer by attaching an order-statistics policy to the tree.

So the contest pivot is:

ordered set + subtree-size aware rank queries

not:

invent a new balanced BST from scratch

Core Invariant And Why It Works¶

The structural invariant is:

the tree stores all live keys in sorted order

The order-statistics invariant is:

every node knows the size of its subtree

That is enough for both core queries.

1. Why `order_of_key(x)` Works¶

To count how many keys are < x, the tree descends by comparisons:

if the current key is < x, then its whole left subtree also counts
otherwise only the left branch can still contribute

Subtree sizes make those whole-branch contributions O(1) at each step.

2. Why `find_by_order(k)` Works¶

If the left subtree has size L:

k < L -> go left
k = L -> current node is the answer
k > L -> go right with k -= L + 1

Again, the only extra ingredient is subtree size.

3. Why Duplicates Need A Wrapper¶

Raw PBDS ordered trees are sets, not multisets.

So if duplicates matter, the correct contest trick is:

\[ (value, unique\_id) \]

ordered lexicographically.

That turns equal values into distinct keys while preserving value order.

Then:

count_less(x) becomes order_of_key((x, -INF))
count_leq(x) becomes order_of_key((x, +INF))
erase_one(x) means erase the first pair whose first component is x

That is the exact bridge from pure set semantics to duplicate-safe multiset semantics.

Worked Examples¶

Example 1: ORDERSET¶

This repo's flagship note:

ORDERSET - Order Statistic Set

The whole problem is exactly the family:

insert if absent
delete if present
print the k-th smallest
count how many values are < x

That is the cleanest first benchmark because no extra modeling hides the PBDS lane.

Example 2: Salary Queries¶

Suppose salaries update online and each query asks:

\[ \#\{salary \in [a, b]\} \]

This is still one order-statistics problem.

With the duplicate-safe pair wrapper:

erase the old salary of employee i
insert the new salary
answer with:

\[ \operatorname{count\_leq}(b) - \operatorname{count\_less}(a) \]

This is the exact bridge from unique ordered sets to duplicate-safe ordered multisets.

Example 3: Josephus Problem II¶

Maintain the alive positions:

\[ \{1, 2, \dots, n\} \]

At each step:

compute the next removal index modulo current size
remove *find_by_order(idx)

This is the cleanest first find_by_order-heavy benchmark.

So the same family supports:

rank by value
k-th by sorted order
cyclic elimination on alive indices

Complexity And Tradeoffs¶

For the first-order PBDS lane:

insert: O(log n)
erase: O(log n)
order_of_key: O(log n)
find_by_order: O(log n)
memory: O(n)

Tradeoffs:

very clean for contest problems that truly want rank or k-th online
GNU-specific, so it is not portable ISO C++
not the best first choice when a compressed Fenwick tree already gives simpler prefix-count logic
not a replacement for split/merge BST families such as treap

Starter Fit¶

The starter in this repo is intentionally practical rather than maximal.

It exposes:

ordered_set<T> for the raw unique-key PBDS route
OrderedMultiset<T> for duplicate-safe contest use

The duplicate-safe wrapper gives:

insert(x)
erase_one(x)
count_less(x)
count_leq(x)
count_in_range(l, r)
kth_value(k, out)

It does not try to teach:

custom node updates
portable self-balanced BST implementations
split/merge BST design

If you need those, leave this lane and reopen:

Treap / Implicit Treap
Link-Cut Tree if the ordered set is only preparatory machinery and the real problem has already become dynamic-tree path logic

Retrieval Layer¶

quick sheet -> Order Statistics Tree hot sheet
exact starter -> pbds-ordered-set.cpp
flagship note -> ORDERSET - Order Statistic Set
compare route -> Heaps And Ordered Sets
compare route -> Fenwick Tree
broader chooser -> Data structures cheatsheet

Exit Criteria¶

You are ready to move on when:

you can explain the difference between predecessor queries and order-statistics queries
you know when order_of_key beats lower_bound-style reasoning
you remember that duplicates need the pair-key wrapper
you can recognize when compressed prefix counts are simpler than PBDS

References And Repo Anchors¶

Research sweep refreshed on 2026-04-25.

Primary / reference:

Practice:

Repo anchors:

starter template: pbds-ordered-set.cpp
ladder: PBDS / Order Statistics Tree ladder
notebook refresher: Order Statistics Tree hot sheet
flagship note: ORDERSET - Order Statistic Set