Bit-Parallelism / Bitset Optimization¶
This family begins when a DP or reachability problem still has the right high-level model,
boolean states on one large axis
but the ordinary implementation
for every item:
for every state:
is too slow by a large constant or by a whole factor of machine word width.
In this repo, the first exact route is intentionally narrow:
- one boolean reachability DP
- one packed bitset state
- one update of the form:
That is the first contest-ready doorway into bit-parallelism here.
This page does not try to teach every use of bitsets in algorithms.
It starts with the most reusable practical route:
- word-packed subset-sum / feasibility DP on one sum axis
and only then points toward:
- string bitset speedups
- packed transitive-closure style DP
- broader word-parallel boolean-state tricks
At A Glance¶
- Use when: the state is boolean or near-boolean, one axis is large, and the update can be phrased as word shifts / ORs instead of scalar transitions
- Core worldview: pack many boolean DP states into machine words and update many states at once
- Main tools: dynamic bitset, shift-OR reachability, word trimming, component-size compression
- Typical complexity: about
O(n * max_sum / word_bits)instead ofO(n * max_sum) - Main trap: using bitset packing for problems that really need max-value or counting DP, not pure feasibility / reachability
Strong contest signals:
- the DP state is boolean: reachable or not
- the transition is:
- "add one weight"
- "shift all currently reachable sums by
w" - "OR the result back"
max_sumis too large for a comfortable scalarO(nW)implementation, but still small enough to store one packed bitset- the bottleneck is plain state updates, not modeling
Strong anti-cues:
- the DP stores best value, min cost, or number of ways instead of a boolean state
Wis already tiny enough that ordinaryO(nW)is simpler- the state space is the full boolean cube and the real need is a transform family -> FWHT / XOR Convolution / Subset Convolution
- the state is a subset mask of items, not a sum axis or packed boolean frontier -> Bitmask DP
Prerequisites¶
Helpful nearby anchors:
- DP cheatsheet
- FWHT / XOR Convolution / Subset Convolution, when the packed bitset idea is not enough and the real lane is transform algebra
Problem Model And Notation¶
Assume we want the set of reachable sums:
after processing some multiset of nonnegative weights.
The ordinary boolean DP is:
For one 0/1 weight w, the scalar update is:
The packed view stores the whole boolean row as one bitset:
where bit x is set exactly when sum x is reachable.
Then the same update becomes:
That is the central invariant of the first route.
From Brute Force To The Right Idea¶
Scalar Boolean DP¶
For classic subset-sum reachability, the direct transition is:
for each weight w:
for sum from W down to w:
reachable[sum] |= reachable[sum - w]
This is already correct and often good enough.
But when:
Wis large- the state is only boolean
- and the transition is just one shift plus OR
then scalar updates are wasting machine-word parallelism.
The Packing Observation¶
Instead of storing:
reachable[0], reachable[1], ..., reachable[W]
as separate bytes or integers, pack them into 64-bit words.
Then one shifted OR updates roughly 64 states at once.
So the question changes from:
what is the recurrence?
to:
can the same recurrence be executed wordwise?
For boolean subset sum, the answer is yes.
Core Invariant And Why It Works¶
Let bit x of reachable mean:
Then shifting left by w produces a new bitset whose set bits are exactly:
So after:
bit y is set iff:
ywas already reachable, ory - wwas reachable before this item
which is exactly the 0/1 subset-sum transition.
The implementation detail is that a dynamic bitset shift must:
- move whole machine words
- then move the remaining bit offset
- and trim away bits beyond the allowed maximum index
That is the first exact route taught by:
Why The Flagship Problem Fits¶
The flagship rep in this repo is:
That problem is not just raw subset sum.
It first compresses the graph into connected-component sizes, then asks which group sizes are reachable.
So it is a perfect first practical route for the family:
- do one light modeling step
- reduce to boolean subset-sum reachability
- solve by packed shift-OR DP
Common Traps¶
1. Using This For Max-Value DP¶
Bitset packing is ideal when the state is boolean.
If the DP stores:
- maximum value
- minimum cost
- number of ways
then this starter is the wrong tool.
2. Reusing One Item More Than Once By Mistake¶
The route here is for 0/1-style reachability.
The packed shift:
is correct because the whole right-hand side conceptually uses the pre-update row.
If you change the model, re-check that contract carefully.
3. Forgetting To Trim High Bits¶
If the maximum useful sum is W, bits above W are garbage.
Dynamic bitset implementations must zero those tail bits after updates or before extraction.
4. Overclaiming The Family¶
This lane is broader than one subset-sum trick, but the first route is still narrow.
Do not jump straight from this starter to:
- weighted maximization DP
- full boolean-cube transforms
- every string bitset trick
without checking whether the state contract is still the same.
Use This Lane When¶
- reachability or feasibility is the real output
- one axis is large enough that word packing matters
- the update is mostly shifts, ORs, ANDs, or bitwise filters
Do Not Use This Lane When¶
- the value stored per state is not boolean
- the intended route is still ordinary knapsack and constants are irrelevant
- the task is really a transform problem on the full mask cube
References And Repo Anchors¶
Research sweep refreshed on 2026-04-25.
Books:
Official benchmarks:
Repo anchors:
- practice ladder: Bit-Parallelism / Bitset Optimization ladder
- flagship note: School Excursion
- starter template: bitset-reachability-shift.cpp
- notebook refresher: Bit-Parallelism / Bitset Optimization hot sheet