FWHT / XOR Convolution / Subset Convolution¶
This family begins when a problem already lives on the full boolean cube
mask in [0, 2^bits)
and the bottleneck is no longer:
aggregate one array over all submasks / supersets
but instead:
combine two mask-indexed functions under xor or under an exact subset split
This repo teaches that family in one intentionally narrow first route:
- exact starter =
xor convolutionvia theWalsh-Hadamard transform subset convolutionstays inside the same family, but as the next stretch once SOS DP already feels stable
So this page is not "every transform on the boolean cube."
It is the first contest-ready doorway from:
into:
- xor convolution
- Walsh-Hadamard style diagonalization
- subset convolution by popcount layering
At A Glance¶
- Use when: two full mask arrays must be pair-combined under
xor, or one subset-split convolution is the real bottleneck - Core worldview: this is
convolution on the boolean cube, not ordinary prefix sweeps and not additive-index FFT - Main tools: FWHT butterfly, inverse normalization by
1 / 2^bits, popcount layers for subset convolution - Typical complexity:
O(bits * 2^bits)for xor convolution,O(bits^2 * 2^bits)for subset convolution - Main trap: confusing
xor convolution,subset zeta sweeps, and ordinaryFFT/NTTconvolution because they all look like "fast transforms"
Strong contest signals:
- the state space is already one full array of size
2^bits - the combine rule is:
i xor j = k, orT subseteq Swith the complement piece forced to beS \\ T- the modulus is one friendly prime and exact arithmetic matters
- the brute force is one
O(4^bits)pair loop over masks
Strong anti-cues:
- each mask only needs one aggregate over all submasks or supersets -> SOS DP
- the indexing rule is additive like
i + j = k-> FFT / NTT - the mask is just an evolving DP state instead of a whole function on the cube -> Bitmask DP
- the bit width is too large to even store arrays of length
2^bits
Prerequisites¶
Helpful nearby anchors:
Problem Model And Notation¶
Let:
and let:
be functions on the full boolean cube.
XOR Convolution¶
The xor convolution is:
This is the first exact route of this lane.
Subset Convolution¶
The subset convolution is:
This is a sibling in the same family, but not the first starter snippet.
The reason both belong together is structural:
- both live on the full boolean cube
- both combine two mask-indexed functions
- both need a transform / layering trick beyond plain SOS zeta sweeps
From Brute Force To The Right Idea¶
XOR Convolution Brute Force¶
If N = 2^bits, the direct implementation is:
for k in [0, N):
for i in [0, N):
h[k] += f[i] * g[i xor k]
That is:
which dies quickly once bits reaches the low twenties.
Why SOS DP Is Not Enough¶
SOS DP speeds up:
- all-submask aggregation
- all-supermask aggregation
But xor convolution is not asking for one side of the cube to fold into another.
It is asking for a pair-combine rule:
combine by xor of indices
So the right move is not another zeta sweep on one array. The right move is to find a basis where xor convolution becomes pointwise multiplication.
Why FFT / NTT Is Also The Wrong Family¶
FFT / NTT handles:
which is additive indexing.
Here the relation is:
That is a different group law, so we need a different transform.
Core Invariant For FWHT¶
The Walsh-Hadamard butterfly for one pair (x, y) is:
Apply that bit by bit across the whole cube.
After processing the first t bits:
- every transformed value already represents a signed combination over the
2^tmasks that only differ inside those processed dimensions - the signs are determined by parity on the processed bits
That is the boolean-cube analogue of "diagonalize the convolution operator."
Once both arrays are transformed:
- xor convolution turns into pointwise multiplication
Then the inverse transform is:
- the same butterfly shape again
- followed by multiplying every value by:
under the chosen modulus
That is the exact starter route in:
Where Subset Convolution Fits¶
Subset convolution looks close to SOS DP, but it is not just:
zeta(f) * zeta(g)
because each term must preserve the exact split:
with disjoint parts and exact total popcount.
The standard route is:
- split values by popcount layer
- run subset zeta transforms on every layer
- multiply compatible layers
- inverse-zeta back
That gives:
and is the right stretch after xor convolution feels routine.
So the family progression in this repo is:
- SOS DP: one-array subset/superset sweeps
FWHT / XOR Convolution: pair-combine by xorSubset Convolution: pair-combine by exact subset split plus popcount layers
Common Traps¶
1. Forgetting The Inverse Normalization¶
For xor convolution, the inverse transform is not just "run the same loops backward."
You must also divide every entry by:
Under a modulus, that means multiply by inv(N).
2. Using The Wrong Transform Family¶
If the indexing law is:
Mixing these up is the most common modeling failure.
3. Treating Subset Convolution As Plain SOS DP¶
Plain zeta sweeps do not preserve the exact size split between T and S \setminus T.
That is exactly why subset convolution needs the extra popcount-layer dimension.
4. Forgetting The Power-Of-Two Contract¶
The starter route assumes one full cube of size 2^bits.
If your array size is not a power of two, your model is probably not the full boolean cube yet.
Use This Lane When¶
- the full cube array already exists naturally
- the statement wants every xor-combined answer at once
- or two subset-indexed functions must combine into one third function on the same cube
Do Not Use This Lane When¶
- one transformed array is enough and only subset/superset aggregation is needed
- the real bottleneck is item-order DP, not transform algebra
- the statement is secretly one additive convolution problem
References And Repo Anchors¶
Research sweep refreshed on 2026-04-25.
Books:
Official benchmarks:
Repo anchors:
- practice ladder: FWHT / XOR Convolution / Subset Convolution ladder
- flagship note: Bitwise XOR Convolution
- starter template: fwht-xor-convolution.cpp
- notebook refresher: FWHT / XOR Convolution / Subset Convolution hot sheet