Bitmask DP¶
Bitmask DP is the standard exact technique when the real state is:
- which items are already used
- which vertices are already visited
- which colors / groups / jobs are already covered
and the number of such items is small enough that `2^n` states are still feasible.
This is one of the most important contest patterns for "small n, huge search tree" problems. The brute force is usually n! or "all partitions / all permutations", but once you realize that many histories share the same remaining future, those histories collapse into the same subset state.
At A Glance¶
- Use when: `n` is small, typically around `15` to `22`, and the state naturally is a subset
- Most common states: `dp[mask]`, `dp[mask][last]`
- Canonical examples: assignment DP, TSP / Hamiltonian DP, partition/subset-cover DP
- Typical complexity: `O(2^n)`, `O(n 2^n)`, or `O(n^2 2^n)`
- Main trap: storing more history than the future actually needs
- Do not use when: `n` is large enough that `2^n` is already dead on arrival, or the state really wants a graph/BFS/modeling view instead
Problem Model And Notation¶
Assume the items are indexed `0` to `n - 1`.
A mask is an integer whose binary expansion says which items are currently in the chosen set:
- bit `i` is `1` if item `i` is present
- bit `i` is `0` otherwise
So a mask is an integer in the range `0` to `(1 << n) - 1`.

The full set is usually written as `(1 << n) - 1`.

Common operations are shown in the sketch below.
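A minimal C++ reference for the usual operations; the helper names are ours, since contest code normally inlines these:

```cpp
// Everyday mask operations; helper names are illustrative.
bool has_item(int mask, int i)    { return (mask >> i) & 1; }   // is item i present?
int  add_item(int mask, int i)    { return mask | (1 << i); }   // put item i in
int  remove_item(int mask, int i) { return mask & ~(1 << i); }  // take item i out
int  lowest_item(int mask)        { return mask & -mask; }      // isolate lowest set bit
int  full_set(int n)              { return (1 << n) - 1; }      // all n items
int  subset_size(int mask)        { return __builtin_popcount(mask); } // item count
```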
The three most common meanings of a mask are:
- used / assigned items
- visited vertices
- already covered categories / colors / constraints
The whole topic is about choosing the smallest state that still tells the future everything it needs.
From Brute Force To The Right Idea¶
Brute Force: Enumerate All Orders Or All Assignments¶
A typical subset-style problem begins with a search space like:
- all permutations of `n` items
- all ways to assign jobs to workers
- all ways to visit all vertices
- all ways to split items into groups
That often means an `n!`-sized search space, or some comparably explosive search tree.
For example, in TSP:
- brute force tries every visiting order
- complexity is factorial
That is far too large even for n = 15, since `15! ≈ 1.3 * 10^12`.
Why Many Histories Are Equivalent¶
Suppose two partial solutions:
- have visited the same subset of cities
- and currently end at the same city
Then for every future continuation, the only thing that matters is:
- which cities are already done
- where you are now
The exact internal order that got you there is no longer relevant, except through the DP value you already computed.
That is the dynamic-programming collapse:
- many exponential histories
- one compact state
The Two Canonical State Shapes¶
dp[mask]¶
Use this when the subset alone already determines what happens next.
Typical signal:
- the next step depends only on which items are already used
- not on the order they were chosen
- and not on a current endpoint or position
Examples:
- assignment by "how many jobs are already taken"
- set-cover style DP
- some partition DPs
dp[mask][last]¶
Use this when the future depends on:
- the chosen subset
- plus one distinguished last / current item
Typical signal:
- path or ordering structure matters
- transition cost depends on where you currently stand
Examples:
- TSP / Hamiltonian path
- shortest path visiting all special nodes
- sequence-building with adjacency costs
This is the most important fork to recognize quickly.
Core Invariant And Why It Works¶
The DP Principle¶
Bitmask DP works only if the state remembers exactly what the future needs.
That means:
- if two histories produce the same state
- then every legal future continuation from those histories has the same set of options and the same future scoring rule
When that is true, keeping only the best value for each state is safe.
Why dp[mask][last] Is Enough For TSP-Style Problems¶
Let `dp[mask][v]` be the minimum cost of visiting exactly the vertices in `mask` and ending at vertex `v`.
Why is this enough?
Because once you know:
- which vertices remain unvisited
- and the current endpoint `v`
every future extension cost depends only on those two pieces of information.
The earlier internal route matters only through its optimal total cost, which dp[mask][v] already stores.
That gives the standard recurrence: for every `w` not in `mask`,

`dp[mask | (1 << w)][w] = min(dp[mask | (1 << w)][w], dp[mask][v] + cost[v][w])`
This is the classical Bellman / Held–Karp viewpoint.
Why dp[mask] Is Enough For Assignment DP¶
In assignment DP, suppose:
- there are `n` workers
- there are `n` jobs
- the first `k = popcount(mask)` workers have already been assigned
- `mask` says which jobs are taken
Then the next worker index is determined automatically by k.
So you do not need to store it as a separate dimension:
- the mask already implies where you are in the process
That is the signal that dp[mask] is enough.
State Minimality Matters¶
The most common conceptual bug is:
- storing extra data "just in case"
For example, if `mask` already determines the step number, then storing both `mask` and `used_count` is redundant.

Redundant state often turns:

`dp[mask]`

into:

`dp[mask][used_count]`

for no gain, multiplying the state count by `n`.
Complexity And Feasibility¶
This topic lives and dies by constraint reading.
Common Sizes¶
- `2^20 ≈ 10^6`
- `20 * 2^20 ≈ 2 * 10^7`
- `20^2 * 2^20 ≈ 4 * 10^8`
These rough numbers are the mental calculator you need in contest.
Typical Regimes¶
- `O(2^n)` is comfortable for `n <= 22` or so
- `O(n 2^n)` is often comfortable for `n <= 20`
- `O(n^2 2^n)` is usually for `n <= 18..20`, depending on constants and number of test cases
So the strongest signal is not just "subset exists." It is:
- subset exists
- and the constraints scream that only exponential-in-`n` with small `n` can fit
Time-Memory Tradeoff¶
A table of size `n * 2^n` (about `2 * 10^7` entries for `n = 20`) can already be large.
That means you should think early about:
- whether to store `int`, `long long`, or `bool`
- whether rolling arrays are possible
- whether the state graph can be traversed in another way
Variant Chooser¶
Choose dp[mask] When¶
- the subset alone determines the progress
- the next position is implicit from `popcount(mask)`
- or the answer is "best way to cover / assign exactly this set"
Canonical examples:
- assignment DP
- minimum cost to cover colors
- some partition DPs over submasks
Choose dp[mask][last] When¶
- there is a current endpoint or current item
- transition cost depends on where you are
- order matters, but only through the last position
Canonical examples:
- TSP
- Hamiltonian path counting / optimization
- visit-all-special-nodes path problems
Think About BFS-On-Subsets Instead When¶
- the objective is minimum number of steps / moves
- each transition has uniform cost
- and the state graph is naturally unweighted
Then the right tool may be BFS over `(mask, node)` instead of DP.
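A sketch of that wrapper, assuming an unweighted adjacency list and `n_special` marked nodes; every name here (`bfs_visit_all`, `special_id`, ...) is illustrative:

```cpp
#include <queue>
#include <utility>
#include <vector>

// Fewest unweighted moves to visit every special node, starting from `start`.
// State = (node, mask of special nodes already visited); plain BFS, no DP
// fill order needed, because every transition costs exactly 1.
int bfs_visit_all(const std::vector<std::vector<int>>& adj,
                  const std::vector<int>& special_id,  // -1 if node is not special
                  int n_special, int start) {
    int n = adj.size();
    int full = (1 << n_special) - 1;
    std::vector<std::vector<int>> dist(n, std::vector<int>(full + 1, -1));
    int m0 = special_id[start] >= 0 ? 1 << special_id[start] : 0;
    dist[start][m0] = 0;
    std::queue<std::pair<int, int>> q;
    q.push({start, m0});
    while (!q.empty()) {
        auto [u, mask] = q.front();
        q.pop();
        if (mask == full) return dist[u][mask];  // first time popped is optimal
        for (int v : adj[u]) {
            int nmask = mask | (special_id[v] >= 0 ? 1 << special_id[v] : 0);
            if (dist[v][nmask] == -1) {
                dist[v][nmask] = dist[u][mask] + 1;
                q.push({v, nmask});
            }
        }
    }
    return -1;  // some special node is unreachable
}
```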
Think About Meet-In-The-Middle Instead When¶
- the subset state is the real structure
- but `2^n` for the full `n` is too large
This often happens around:
n ≈ 35..45
Then splitting into two halves may be much better than a full subset DP.
Think About SOS DP And FWHT / XOR Convolution / Subset Convolution Only After Basic Bitmask DP¶
Some heavier techniques iterate over:
- all submasks of every mask
- zeta / mobius transforms
- SOS DP
Those belong to the next layer. First master:
- `dp[mask]`
- `dp[mask][last]`
- ordinary submask iteration
Worked Examples¶
Example 1: Assignment DP With dp[mask]¶
Suppose worker `k` must be assigned one unused job.

Let `dp[mask]` be the minimum total cost after the first `popcount(mask)` workers have been assigned exactly the jobs in `mask`.

Then:

- if `mask` already contains `j`, job `j` is unavailable
- otherwise assign the next worker to job `j`

So the transition is:

`dp[mask | (1 << j)] = min(dp[mask | (1 << j)], dp[mask] + cost[k][j])`

where `k = popcount(mask)`.
This is the key bitmask-DP lesson:
- the step number is not stored explicitly
- it is recovered from the subset size
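Putting the example together, a minimal C++ sketch of this recurrence, assuming a square `cost[k][j]` matrix (worker `k`, job `j`); the function name is ours:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Minimum total cost to assign n workers to n jobs, one job each.
// dp[mask] = best cost when the jobs in `mask` are taken by the
// first popcount(mask) workers.
long long min_assignment_cost(const std::vector<std::vector<long long>>& cost) {
    int n = cost.size();
    const long long INF = LLONG_MAX / 2;
    std::vector<long long> dp(1 << n, INF);
    dp[0] = 0;
    for (int mask = 0; mask < (1 << n); ++mask) {
        if (dp[mask] == INF) continue;
        int k = __builtin_popcount(mask);   // next worker is implied by the mask
        if (k == n) continue;
        for (int j = 0; j < n; ++j) {
            if (mask & (1 << j)) continue;  // job j already taken
            int next = mask | (1 << j);
            dp[next] = std::min(dp[next], dp[mask] + cost[k][j]);
        }
    }
    return dp[(1 << n) - 1];
}
```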
Example 2: TSP / Held–Karp With dp[mask][last]¶
Let `dp[mask][v]` be the minimum cost of starting from a fixed source, visiting exactly the set `mask`, and ending at `v`.

Then the final edge into `v` must come from some `u` in `mask \ {v}`.

So:

`dp[mask][v] = min over u in mask \ {v} of dp[mask ^ (1 << v)][u] + cost[u][v]`
This is the classical subset-DP recurrence. It replaces factorial search over visiting orders by one DP state per (subset, endpoint).
The important modeling lesson is:
- if you remove `last`, the future transition cost is no longer determined
- so `dp[mask]` alone is too weak here
Example 3: Cover-Colors Or Keeper-State DP¶
In problems like VMMARBLE, the mask means:
- which colors already have at least one keeper
That is not a path problem. There is no meaningful last.
So the correct state is a pure coverage DP:

`dp[mask]` = best value once every color in `mask` is covered by at least one keeper
This is a good reminder that bitmask DP is a family of subset-state designs, not only TSP.
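For concreteness, a minimal coverage-DP sketch of this state shape (not the VMMARBLE solution itself); the assumed inputs are a `cost[i]` and a covered color set `covers[i]` per choice, both names ours:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// dp[mask] = minimum cost so that every color in `mask` is covered.
// `covers[i]` is the bitmask of colors that choice i covers; costs
// are assumed nonnegative.
long long min_cover_cost(int num_colors,
                         const std::vector<long long>& cost,
                         const std::vector<int>& covers) {
    const long long INF = LLONG_MAX / 2;
    int full = (1 << num_colors) - 1;
    std::vector<long long> dp(full + 1, INF);
    dp[0] = 0;
    for (int mask = 0; mask <= full; ++mask) {
        if (dp[mask] == INF) continue;
        for (std::size_t i = 0; i < covers.size(); ++i) {
            int next = mask | covers[i];  // covering extra colors never hurts
            dp[next] = std::min(dp[next], dp[mask] + cost[i]);
        }
    }
    return dp[full];
}
```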
Example 4: Submask Partition DP¶
Sometimes you want to split a mask into:
- one chosen submask
- the rest
Then the canonical loop is:
```
for (sub = mask; sub; sub = (sub - 1) & mask)
```
This enumerates all nonempty submasks of mask.
It is powerful, but dangerous:
- inside an outer loop over all masks, it may cost `O(3^n)` overall, because each bit independently ends up in the submask, in the rest of the mask, or outside the mask
So use it deliberately, not automatically.
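As an illustration, a submask partition DP that splits a set into the minimum number of valid groups; `valid` is an assumed caller-supplied predicate:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// dp[mask] = minimum number of valid groups that exactly partition `mask`.
// The nested submask loop is the O(3^n) pattern discussed above.
int min_partition(int n, const std::function<bool(int)>& valid) {
    int full = (1 << n) - 1;
    const int INF = 1 << 29;
    std::vector<int> dp(full + 1, INF);
    dp[0] = 0;
    for (int mask = 1; mask <= full; ++mask) {
        // Enumerate all nonempty submasks of mask.
        for (int sub = mask; sub; sub = (sub - 1) & mask) {
            if (valid(sub) && dp[mask ^ sub] != INF)
                dp[mask] = std::min(dp[mask], dp[mask ^ sub] + 1);
        }
    }
    return dp[full];
}
```

A common constant-factor trick is to force `sub` to contain the lowest set bit of `mask`, since some group must hold that item; the sketch above omits it for clarity.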
Algorithm And Pseudocode¶
Repo starter template: `bitmask-subset-iterate.cpp`.
Generic Forward dp[mask]¶
```
dp[*] = INF
dp[0] = base
for mask from 0 to (1 << n) - 1:
    if dp[mask] is unreachable:
        continue
    determine what the next choice means for this mask
    for each item bit not in mask:
        next_mask = mask | (1 << bit)
        relax dp[next_mask] from dp[mask]
```
This increasing-mask order is valid in many forward subset DPs because every transition adds bits, so dependencies always come from smaller masks.
Generic Held–Karp Style dp[mask][last]¶
```
dp[*][*] = INF
dp[1 << start][start] = 0
for mask from 0 to (1 << n) - 1:
    for last from 0 to n - 1:
        if last not in mask:
            continue
        if dp[mask][last] is unreachable:
            continue
        for nxt from 0 to n - 1:
            if nxt already in mask:
                continue
            next_mask = mask | (1 << nxt)
            relax dp[next_mask][nxt] from dp[mask][last] + cost(last, nxt)
```
For Hamiltonian path problems, the answer is usually:

`min over v of dp[FULL][v]` where `FULL = (1 << n) - 1`

For Hamiltonian cycle / tour problems such as classical TSP, you still need to close the loop:

`min over v of dp[FULL][v] + cost[v][start]`
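Putting the pseudocode together, a compact C++ sketch of Held–Karp, assuming a complete `cost` matrix and a tour that starts and ends at vertex `0` (the signature is illustrative, not the repo template):

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Held-Karp: minimum tour cost over all n vertices, starting and ending at 0.
// dp[mask][v] = cheapest way to visit exactly `mask` (which contains 0 and v)
// and currently stand at v.
long long tsp(const std::vector<std::vector<long long>>& cost) {
    int n = cost.size();
    if (n == 1) return 0;
    const long long INF = LLONG_MAX / 2;
    std::vector<std::vector<long long>> dp(1 << n,
                                           std::vector<long long>(n, INF));
    dp[1][0] = 0;  // only vertex 0 visited, standing at 0
    for (int mask = 1; mask < (1 << n); ++mask) {
        for (int last = 0; last < n; ++last) {
            if (!(mask & (1 << last)) || dp[mask][last] == INF) continue;
            for (int nxt = 0; nxt < n; ++nxt) {
                if (mask & (1 << nxt)) continue;
                int nm = mask | (1 << nxt);
                dp[nm][nxt] = std::min(dp[nm][nxt],
                                       dp[mask][last] + cost[last][nxt]);
            }
        }
    }
    long long best = INF;
    for (int v = 1; v < n; ++v)  // close the loop back to the start
        best = std::min(best, dp[(1 << n) - 1][v] + cost[v][0]);
    return best;
}
```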
Enumerating Submasks¶
```
sub = mask
while sub > 0:
    use sub
    sub = (sub - 1) & mask
```
This visits every nonzero submask exactly once.
Implementation Notes¶
1. 1 << n Needs Type Discipline¶
If n can approach 31, then 1 << n on signed int is a trap.
Contest-safe habits:
- use `1 << n` only when `n` is small and clearly safe
- otherwise use `1LL << n`
For typical bitmask DP limits around n <= 20, int masks are usually fine, but it is still worth being explicit.
Also remember the semantic distinction:
- `1 << bit` means the singleton subset containing `bit`
- `(1 << n) - 1` means the full subset of all `n` items
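A tiny illustration of the safe habit; the helper names and the `n <= 62` bound are our assumptions:

```cpp
#include <cstdint>

// `bit` < n and n <= 62 are assumed in this illustration.
std::int64_t singleton(int bit) { return 1LL << bit; }      // the subset {bit}
std::int64_t full_mask(int n)   { return (1LL << n) - 1; }  // the full subset
// Trap: 1 << 31 overflows a signed 32-bit int; 1LL << 31 is fine.
```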
2. mask Meaning Must Be A Full Sentence¶
Before writing code, you should be able to say:
- "
maskmeans these jobs are already assigned" - "
maskmeans these cities are already visited" - "
maskmeans these colors are already covered"
If you cannot say that cleanly, your transition logic will drift.
3. Decide Whether You Add A Bit Or Remove A Bit¶
Both directions are valid:
- grow from smaller masks to larger masks
- or define a recurrence from `mask \ {bit}` into `mask`
The important thing is consistency.
Forward transitions are usually easier to debug in contest.
4. popcount(mask) Is Often A Hidden State Variable¶
In assignment-style DP, the next row / worker / position is simply:

`k = popcount(mask)`
That is one of the most valuable bitmask-DP recognitions:
- the subset size may already encode the stage of the process
This is one reason increasing-mask iteration is so common:
- every transition goes from a state with `k` chosen items to one with `k + 1`
- so no future state is read before it is written
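In C++ that hidden stage variable is one builtin away (C++20 also offers `std::popcount`); a tiny illustrative helper:

```cpp
// The subset size is the stage: with k items chosen, we are at step k.
int stage(int mask) { return __builtin_popcount(mask); }
```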
5. Watch Total Complexity, Not Just State Count¶
Many people estimate only the number of states: `2^n`.

But the real cost is often states times transitions: `n * 2^n` or even `n^2 * 2^n`.
This is why n = 22 may be fine for one pattern and dead for another.
The same warning applies to submask enumeration:
- one loop over all submasks of one fixed mask costs `O(2^popcount(mask))`
- doing that inside a loop over all masks often costs `O(3^n)` overall
6. Memory Compression Sometimes Works¶
If transitions only go from:
- smaller masks to larger masks
and you only need the previous layer by popcount, layer compression may be possible.
But do not force it too early. Clarity is usually worth more than shaving memory unless the constraints demand it.
7. Bitmask DP Versus Subset Sum Bitsets¶
Do not confuse these:
- bitmask DP: state is a subset of items
- bitset DP: state is usually a reachable value/set of sums packed into machine words
They both use bits, but they solve different shapes of problems.
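To make the contrast concrete, a bitset-DP sketch where the bits index reachable sums rather than items; `MAXSUM` is an assumed bound:

```cpp
#include <bitset>
#include <vector>

const int MAXSUM = 100000;  // assumed upper bound on the total sum

// Which sums are reachable by picking a subset of `weights`?
// Bit s of the result is 1 iff some subset sums to s - values, not items.
std::bitset<MAXSUM + 1> reachable_sums(const std::vector<int>& weights) {
    std::bitset<MAXSUM + 1> reach;
    reach[0] = 1;                               // the empty subset sums to 0
    for (int w : weights) reach |= reach << w;  // one word-parallel relaxation per item
    return reach;
}
```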
8. Dense State Graphs May Want Another View¶
Sometimes (mask, node) is better understood as a graph state than as a table state.
If transitions are:
- unweighted
- uniform cost
- naturally graph-like
then BFS or shortest-path thinking may be the right wrapper around the same subset state.
9. Reverse Recurrences Still Need A Safe Fill Order¶
If your recurrence removes a bit, such as

`dp[mask] = min over bits b in mask of dp[mask ^ (1 << b)] + cost of adding b`

then a plain increasing loop over mask is still often valid, because every proper submask is numerically smaller than mask.
That is a useful implementation fact, but do not rely on it blindly. Always ask:
- are all dependencies strict submasks?
- or does my recurrence also depend on another dimension whose order I still need to manage?
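A sketch of the remove-a-bit style with the plain increasing fill order; `add_cost` is an assumed caller-supplied cost for inserting one item into an existing set:

```cpp
#include <algorithm>
#include <climits>
#include <functional>
#include <vector>

// dp[mask] = cheapest way to build `mask` one item at a time, where adding
// item b to an already-built set `sub` costs add_cost(sub, b).
// dp[mask] depends only on strict submasks, so the plain increasing loop
// over mask is a valid fill order.
std::vector<long long> build_costs(
        int n, const std::function<long long(int, int)>& add_cost) {
    const long long INF = LLONG_MAX / 2;
    std::vector<long long> dp(1 << n, INF);
    dp[0] = 0;
    for (int mask = 1; mask < (1 << n); ++mask) {
        for (int b = 0; b < n; ++b) {
            if (!(mask & (1 << b))) continue;
            int prev = mask ^ (1 << b);  // strictly smaller, so already final
            dp[mask] = std::min(dp[mask], dp[prev] + add_cost(prev, b));
        }
    }
    return dp;
}
```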
Beyond Basic Bitmask DP¶
The core contest layer is:
- `dp[mask]`
- `dp[mask][last]`
- adding one bit
- iterating submasks
Important next-layer directions include:
- SOS DP
- FWHT / XOR Convolution / Subset Convolution
- profile DP on grids
- meet-in-the-middle
- bitmask DP combined with shortest paths or BFS on state graphs
The right study order is:
- learn to define the mask in plain language
- choose between `dp[mask]` and `dp[mask][last]`
- internalize the feasibility of `n 2^n` and `n^2 2^n`
- only then move to heavier subset transforms
Practice Archetypes¶
The most common bitmask-DP-shaped tasks are:
- assignment / matching by used set
- visit-all-nodes path or tour
- cover all colors / categories / features
- partition a small set into valid groups
- small-state exponential search where many orders collapse to the same subset
The strongest contest smell is:
- `n` is tiny
- brute force over orders or subsets is almost feasible
- and the future depends only on a chosen subset plus maybe one small extra parameter
References And Repo Anchors¶
Research sweep refreshed on 2026-04-24.
Primary / classical landmark:
- Bellman–Held–Karp style subset DP for TSP, referenced in Held–Karp algorithm
Course:
- Stanford CS97SI: Dynamic Programming
- UCF Programming Team Lecture: DP Algorithm for Traveling Salesman Problem
- Illinois CS491 CAP: Travelling Salesperson
Reference:
Essay / notes:
Practice:
Repo anchors:
- practice ladder: Bitmask DP ladder
- practice note: VMMARBLE
- starter template: bitmask-subset-iterate.cpp
- notebook refresher: DP cheatsheet