Skip to content

XOR Basis / Linear Basis

This lane is for the moment when a xor problem stops being about:

  • one pair of numbers
  • one stored partner maximizing xor
  • one prefix xor trick

and becomes about:

  • the whole span of subset xors
  • linear independence over GF(2)
  • whether one value is representable by xor-ing some chosen elements

In contest terms, the key shift is:

  • stop thinking of the input as "many unrelated numbers"
  • treat each number as one binary vector
  • keep only a maximal linearly independent set
  • answer subset-xor questions from that compressed basis

This is the exact lane for the repo's first xor basis route.

At A Glance

  • Use when: the task is about xor of some subset of numbers, not xor against one chosen stored partner
  • Core worldview: the input numbers generate one vector space over GF(2), and the basis is a maximal linearly independent set of generators
  • Main tools: pivot bits, greedy insertion by highest set bit, representability test, greedy maximum-subset-xor query
  • Typical complexity: O(B) per insertion or query for bit width B
  • Main trap: confusing subset-xor basis queries with binary-trie queries against one stored element

Strong contest signals:

  • "choose any subset"
  • "maximum xor obtainable from the given numbers"
  • "can we xor some chosen elements to get x?"
  • "keep only independent xor generators"
  • "the statement smells like Gaussian elimination, but over bits"

Strong anti-cues:

  • the query is "best stored number for this current x" -> Binary Trie / XOR Queries
  • the task is one static range xor or one path xor with no cycle freedom -> prefix xor or tree xor may already solve it
  • the task is ordinary modular linear algebra rather than xor/bitwise structure

The repo's exact first route for this family is:

Prerequisites

Helpful neighboring topics:

Problem Model And Notation

Let the input numbers be:

\[ a_1, a_2, \dots, a_n \]

Interpret each number as a binary vector over GF(2).

The reachable values are:

\[ \left\{ \bigoplus_{i \in S} a_i \mid S \subseteq \{1,2,\dots,n\} \right\} \]

So the real object is not the list of numbers itself.

It is the xor-span generated by those numbers.

An xor basis stores a maximal set of linearly independent vectors from that span, usually organized by pivot bit:

  • one basis vector per highest set bit

With the repo starter convention:

  • basis[b] is the stored vector whose highest set bit is b, or 0 if none exists there

From Brute Force To The Right Idea

Brute Force

If the task asks for:

  • maximum xor over all subsets

then the direct route is:

  • try all 2^n subsets

That becomes impossible immediately.

Structural Observation

Different subsets can produce the same xor value.

So the right object is not:

  • all subsets

but:

  • the vector space generated by the numbers under xor

Inside that space, many numbers are redundant because they can already be expressed as xor of earlier ones.

The exact question becomes:

which numbers are truly independent generators of the xor span?

That is a basis question.

Why Greedy Highest-Bit Insertion Works

Take one number x.

Look at its highest set bit.

  • if no existing basis vector has that pivot, keep x as the new pivot vector there
  • otherwise xor it with the existing pivot vector and continue downward

This is just Gaussian elimination over GF(2), specialized to bitmasks.

Each xor step removes the current highest set bit, so the process strictly progresses.

Core Invariant And Why It Works

1. One Pivot Per Highest Set Bit

Maintain:

basis[b] = one vector whose highest set bit is b

for each bit position b.

Then no two stored vectors share the same highest set bit, which means:

  • the stored vectors are linearly independent

2. Insertion Preserves The Span

When inserting x, repeatedly xor away pivot bits that already exist.

Two cases remain:

  • x becomes 0
  • x ends with a highest set bit b not used yet

If x becomes 0, then:

  • the new number was already representable by the old basis

If x survives with new pivot b, then:

  • adding it enlarges the span by one dimension

So insertion never changes the generated xor-space incorrectly:

  • it either keeps the same span or extends it by one independent direction

3. Why Representability Testing Works

To test whether some value x is representable:

  • run the same elimination process against the stored basis

If it reduces to 0, then:

  • x belongs to the span

If not, then:

  • some pivot bit remains unmatched, so x is outside the span

4. Why Greedy Maximum-Xor Query Works

Suppose we want the maximum xor value obtainable from any subset.

Start with ans = 0 and scan pivot bits from high to low.

If:

ans xor basis[b] > ans

then take it.

This is correct because a higher bit dominates all lower bits.

Once we decide whether bit b can be made 1, no lower-bit choice can compensate for getting that decision wrong.

So the query is a greedy walk over pivot bits, exactly like the standard maximum-xor logic in Gaussian form.

Worked Examples

Example 1: Maximum Subset Xor

This repo's flagship note:

Set:

1, 2, 4

These are already independent pivot vectors, so the maximum subset xor is:

1 xor 2 xor 4 = 7

This is the cleanest first benchmark because:

  • the only goal is maximum subset xor
  • the whole payoff comes directly from building the basis correctly

Example 2: Why Binary Trie Is A Different Family

Suppose the query is:

  • for current x, choose one stored number y maximizing x xor y

That is not an xor-basis question.

That is the lane of Binary Trie / XOR Queries, because the task is about one stored partner, not any subset.

The family split is:

  • binary trie -> maximize against one stored element
  • xor basis -> maximize over xor of any subset

Example 3: Graph Cycle-Space Stretch

In xor-path problems like:

  • find the minimum xor path from 1 to n

one DFS/tree xor gives one candidate path value, and every extra cycle contributes one xor vector.

Then the cycle-xor set is reduced to an xor basis, and that basis is used to improve the path value greedily.

That is a strong stretch use, but it is not the repo's exact first starter route.

Variant Chooser

Use XOR Basis When

  • the task is about xor of some subset
  • maximum subset xor or representability is the real query
  • a compact independent generator set is enough

That is the lane of this page.

Use Binary Trie When

  • the query is against one current external value x
  • you need one stored element maximizing xor with it
  • insert / erase / query happen online in a live multiset

Use Prefix-Xor Tricks When

  • the structure is about subarrays or tree paths with fixed endpoints and no extra subset freedom
  • one xor-difference identity already captures the whole task

Complexity And Tradeoffs

For fixed bit width B + 1:

  • insert: O(B)
  • representability test: O(B)
  • maximum subset xor: O(B)
  • memory: O(B)

Tradeoffs:

  • excellent when the whole span matters
  • much lighter than storing all numbers
  • not the right tool for one-partner online xor queries
  • advanced variants like kth-xor, merges, or rollback need extra structure and are outside the exact starter route here

Starter Fit

The starter in this repo is intentionally narrow:

  • long long xor basis
  • insert(x)
  • can_make(x)
  • max_xor()
  • dimension()

That exact route is enough for:

  • maximum subset xor
  • representability queries
  • many first linear-basis problems

If the task needs more, extend carefully only after the basic row-echelon worldview is stable:

  • reduced basis for kth-xor style queries
  • merging bases
  • graph cycle-basis reductions

Common Pitfalls

  • mixing up subset-xor with pair-xor
  • forgetting that duplicate or dependent numbers may add no new dimension
  • scanning bits in the wrong direction during greedy max query
  • using signed-bit assumptions carelessly when inputs are large
  • overclaiming that the starter already supports kth-xor or rollback/segment-tree merges

Repo Routes

References