Huffman Coding Benchmark

Title: Huffman Coding Benchmark
Judge / source: Canonical greedy prefix-code benchmark
Original URL: https://algs4.cs.princeton.edu/code/javadoc/edu/princeton/cs/algs4/Huffman.html
Secondary topics: Optimal merge, Prefix codes, Weighted path length
Difficulty: medium
Subtype: Binary Huffman coding with deterministic tie-breaks
Status: solved
Solution file: huffmancoding.cpp

Why Practice This

This is the cleanest first in-repo flagship problem for the Huffman / Data Compression topic.

The benchmark is intentionally narrow:

So the hard part is exactly the lane itself:

Reach for the Huffman worldview when:

The strongest smell is:

That is exactly this lane.

This benchmark does not want:

The clean route is:

create one leaf per symbol
keep the forest in a min-heap ordered by (weight, minimum original symbol index)
repeatedly merge the two smallest trees
accumulate weighted path length through merge sums
DFS the final tree to recover one deterministic code per symbol

That is exactly the first Huffman route.
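The five steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the repo's huffmancoding.cpp: the names `buildHuffman` and `HuffmanResult` are hypothetical, and two conventions are assumptions made here, namely that the first-popped (smaller) tree becomes the `0` child and that a lone symbol gets the code `0` with cost 0.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch: optimal weighted path length
// plus one code per input symbol, indexed like the input weights.
struct HuffmanResult {
    long long cost = 0;
    std::vector<std::string> codes;
};

HuffmanResult buildHuffman(const std::vector<long long>& weights) {
    const int n = static_cast<int>(weights.size());
    HuffmanResult res;
    res.codes.assign(n, "");
    if (n == 0) return res;
    if (n == 1) {  // assumption: a lone symbol gets code "0" and cost 0
        res.codes[0] = "0";
        return res;
    }

    // Leaves are node ids 0..n-1; internal nodes are appended after them.
    std::vector<int> left(n, -1), right(n, -1);

    // Heap entry: (weight, minimum original symbol index in the tree, node id).
    // Subtrees are disjoint, so (weight, min index) never ties between entries.
    using Entry = std::tuple<long long, int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (int i = 0; i < n; ++i) heap.emplace(weights[i], i, i);

    while (heap.size() > 1) {
        auto [wa, ma, a] = heap.top();
        heap.pop();
        auto [wb, mb, b] = heap.top();
        heap.pop();
        const int id = static_cast<int>(left.size());
        left.push_back(a);    // first-popped (smaller) tree becomes the '0' child
        right.push_back(b);   // second-popped tree becomes the '1' child
        res.cost += wa + wb;  // each merge adds exactly a + b to the answer
        heap.emplace(wa + wb, std::min(ma, mb), id);
    }

    // Iterative DFS from the root, appending '0' for left and '1' for right.
    const int root = std::get<2>(heap.top());
    std::vector<std::pair<int, std::string>> todo;
    todo.emplace_back(root, std::string());
    while (!todo.empty()) {
        auto [node, prefix] = todo.back();
        todo.pop_back();
        if (node < n) {  // leaf: record its code
            res.codes[node] = prefix;
            continue;
        }
        todo.emplace_back(left[node], prefix + "0");
        todo.emplace_back(right[node], prefix + "1");
    }
    return res;
}
```

With weights {1, 1, 2, 2} this sketch reports cost 12 and codes `100`, `101`, `11`, `0`. The node id in the heap tuple is only there to locate the tree; because the minimum original index is unique per subtree, it never decides an ordering, which is what keeps the output deterministic.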

The key monotone fact is:

every merge of subtrees with weights a and b increases the final answer by exactly a + b, because every leaf in both subtrees ends up one level deeper under the new root
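One way to check the per-merge increment is to write the objective as weighted path length. The notation here (W for weighted path length, d_T(i) for the depth of leaf i in tree T, T' for the tree formed by hanging T_a and T_b under a new root) is introduced for illustration only:

```latex
\begin{aligned}
W(T) &= \sum_{i \in \mathrm{leaves}(T)} w_i \, d_T(i), \qquad a = \sum_{i \in T_a} w_i, \quad b = \sum_{i \in T_b} w_i \\
W(T') &= \sum_{i \in T_a} w_i \bigl(d_{T_a}(i) + 1\bigr) + \sum_{i \in T_b} w_i \bigl(d_{T_b}(i) + 1\bigr) \\
      &= W(T_a) + W(T_b) + a + b
\end{aligned}
```

Since a single leaf has W = 0, applying this identity at every merge leaves only the a + b terms.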

So the entire optimum can be tracked as:

sum of all merge weights

That lets you compute the objective without tracking every leaf depth at each step.
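If only the optimal cost is needed, the sum-of-merges view means no tree has to be built at all. A minimal sketch, where `huffmanCost` is a hypothetical name and not part of the repo's solution:

```cpp
#include <functional>
#include <queue>
#include <vector>

// Optimal weighted path length using only the "each merge adds a + b" fact.
long long huffmanCost(const std::vector<long long>& weights) {
    // Min-heap of current tree weights, seeded with the leaf weights.
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        heap(weights.begin(), weights.end());
    long long cost = 0;
    while (heap.size() > 1) {
        long long a = heap.top();
        heap.pop();
        long long b = heap.top();
        heap.pop();
        cost += a + b;  // every leaf under a and b moves one level deeper
        heap.push(a + b);
    }
    return cost;  // 0 for zero or one symbol
}
```

For weights {1, 2, 3} the merges are 1 + 2 = 3 and 3 + 3 = 6, so the cost is 9, which matches 1·2 + 2·2 + 3·1 computed from leaf depths.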

With n symbols:

The point of this benchmark is not to mimic a file compressor. The point is:

This repo's canonical benchmark uses:

The solution prints:

Tie policy in this repo:

The real invariant is the minimum weighted cost, not the prettiness of one code assignment.
Different valid tie policies can produce different optimal codes.
This repo fixes one deterministic tie-break so outputs stay stable.
The lane is about the coding tree, not about bitstream serialization or decompression headers.
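To see why the cost, not the code, is the invariant, the sketch below runs the same greedy merge under two tie policies on weights {1, 1, 2, 2}. The first policy (smallest minimum original index wins ties) mirrors the one described above; the second (largest minimum index wins) is a hypothetical alternative chosen only for contrast. `codeLengths` is an illustrative name, and it tracks depths by copying leaf lists, which is quadratic but fine for a demonstration.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Returns the code length (leaf depth) of each symbol under one tie policy.
// preferSmallIndex = true:  ties go to the smallest minimum symbol index
// preferSmallIndex = false: ties go to the largest one (hypothetical policy)
std::vector<int> codeLengths(const std::vector<long long>& w, bool preferSmallIndex) {
    const int n = static_cast<int>(w.size());
    std::vector<int> depth(n, 0);
    std::vector<std::vector<int>> members;  // leaves under each tree id

    // (weight, tie key, tree id); the key is the min index, negated for the alternative.
    using Entry = std::tuple<long long, int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (int i = 0; i < n; ++i) {
        members.push_back({i});
        heap.emplace(w[i], preferSmallIndex ? i : -i, i);
    }

    while (heap.size() > 1) {
        auto [wa, ka, a] = heap.top();
        heap.pop();
        auto [wb, kb, b] = heap.top();
        heap.pop();
        std::vector<int> merged = members[a];
        merged.insert(merged.end(), members[b].begin(), members[b].end());
        for (int s : merged) ++depth[s];  // every leaf moves one level deeper
        const int minIdx = *std::min_element(merged.begin(), merged.end());
        members.push_back(merged);
        const int id = static_cast<int>(members.size()) - 1;
        heap.emplace(wa + wb, preferSmallIndex ? minIdx : -minIdx, id);
    }
    return depth;
}
```

Under the first policy the lengths come out as {3, 3, 2, 1}; under the alternative they are {2, 2, 2, 2}. Both give a weighted cost of 12, so both codes are optimal even though they differ.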