Skip to content

Huffman / Data Compression Hot Sheet

Use this page when the real question is "build one optimal binary prefix-code tree by repeatedly merging the two lightest subtrees."

Do Not Use When

  • the task is one scan with prefix feasibility or repair -> Prefix Constraints
  • the point is a production compression format, bit packing, or container headers
  • the task only needs a priority queue, not a proof of optimal merge structure

Choose By Signal

  • one optimal merge / prefix-code tree from positive frequencies -> Huffman
  • one heap-backed feasibility repair over ordered input -> Prefix Constraints
  • one plain priority-queue process without coding-tree semantics -> reopen Heaps And Ordered Sets

Core Invariants

  • the heap stores the current forest of subtree weights
  • every merge increases total weighted path length by exactly a + b
  • total optimum equals the sum of all merge costs
  • one-symbol input needs an explicit empty-code convention

Main Traps

  • confusing code assignment tie policy with the actual optimality invariant
  • trying to teach LZW / run-length / file-format details in the first Huffman lane
  • forgetting that the objective is weighted depth, not lexicographic prettiness of the codewords

Exact Starter Route

Reopen Paths