Skip to content

Distinct Values Queries

  • Title: Distinct Values Queries
  • Judge / source: CSES
  • Original URL: https://cses.fi/problemset/task/1734
  • Secondary topics: Offline sweep by right endpoint, Fenwick tree, Last occurrence
  • Difficulty: hard
  • Subtype: Count how many distinct values appear in many static subarrays
  • Status: solved
  • Solution file: distinctvaluesqueries.cpp

Why Practice This

This problem is great offline-trick training because the online-looking question:

how many distinct values are inside [l, r]?

becomes much simpler once we notice that all queries are known in advance.

Then we can sort queries by r and sweep once from left to right.

Recognition Cue

Reach for an offline right-endpoint sweep when:

  • all range queries are known up front
  • the answer for [l, r] becomes easy if the sweep is already at r
  • one monotone pass can keep enough state to answer every query ending here
  • trying to support the same query online feels heavier than the difficulty suggests

For this problem, the key structural smell is:

  • "static array, many range queries, distinct-count statistic"

That is a strong hint to sort queries and maintain last occurrences instead of answering online in input order.

Problem-Specific Transformation

The statement asks for the number of distinct values in many subarrays. The reusable rewrite is:

  • while sweeping left to right, keep exactly one active occurrence per value
  • specifically, keep only the latest occurrence seen so far

Then by the time we reach a query's right endpoint r, the answer on [l, r] is just:

  • how many active marks lie in that range

So the original distinct-values question becomes:

  • point updates when a latest occurrence changes
  • range-sum queries over active marks

That is why Fenwick + offline sweep is the clean formulation.

Core Idea

Process the array from left to right. At position i, imagine marking exactly one active occurrence for each value:

  • the most recent occurrence seen so far gets mark 1
  • every older occurrence of the same value gets mark 0

If we maintain that invariant, then for any query [l, r] answered when the sweep is at r, the number of distinct values inside the range is just:

sum(l, r)

over those active marks.

How do we maintain it?

When value a[i] appears:

  1. if it appeared before at last[a[i]], remove that old active mark
  2. add an active mark at i
  3. update last[a[i]] = i

A Fenwick tree stores these active marks and supports:

  • point updates
  • range-sum queries

Queries are sorted by right endpoint. As soon as the sweep reaches a query’s r, all information that should affect that query is already present in the Fenwick tree.

Complexity

  • sorting queries: O(q log q)
  • each array position causes at most two Fenwick updates: O(n log n)
  • each query is one Fenwick range sum: O(q log n)
  • total: O((n + q) log n)

Pitfalls / Judge Notes

  • Offline order is part of the algorithm. Always keep the original query index so answers can be restored.
  • The Fenwick tree stores “is this the latest occurrence so far?”, not raw frequencies.
  • Remove the previous active occurrence before activating the new one.
  • Mo’s algorithm also solves this problem, but the right-endpoint sweep is cleaner for a first offline-tricks note.

Reusable Pattern

Solutions