Channel Coding, Capacity, and Converse Proofs
channel coding, capacity, converse, Fano’s inequality, reliable communication
1 Role
This is the fourth page of the Information Theory module.
Its job is to explain how information theory turns noisy communication into a precise limit statement:
- what it means to communicate reliably
- what the channel capacity is
- why rates below capacity are achievable
- why rates above capacity are impossible
This is the page where mutual information becomes an operational limit.
2 First-Pass Promise
Read this page after Typicality, Source Coding, and Compression Intuition.
If you stop here, you should still understand:
- what a coding rate is
- what “reliable communication” means
- why capacity is defined as a supremum over achievable rates
- why the channel coding theorem has a direct part and a converse part
3 Why It Matters
Compression asked:
how many bits are needed to describe a source?
Channel coding asks:
how many bits can survive a noisy channel reliably?
That question matters far beyond classical communication:
- sensing and telemetry
- storage with errors
- control with noisy observations
- lower-bound arguments in statistics and learning
- representation bottlenecks in modern ML
At a first pass:
- a channel corrupts transmitted symbols
- coding spreads information across many channel uses
- reliable communication is possible up to a certain rate
- that maximum rate is the capacity
- converse arguments say the limit is real, not just a weakness of current code constructions
This is the canonical example of an upper bound and a matching constructive lower bound meeting exactly.
4 Prerequisite Recall
- mutual information I(X;Y) measures how much output Y reveals about input X
- typicality gives a way to reason about high-probability sequence structure
- entropy controlled the compression rate in source coding
- data processing says that noise and post-processing cannot increase information
5 Intuition
5.1 Noise Destroys Symbol-Level Reliability
If a channel flips, erases, or perturbs symbols, then sending one raw symbol at a time is fragile.
The core trick of coding is:
do not protect each symbol separately; spread the message across a long block
That way, the receiver can use the whole output sequence to infer the original message.
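Repetition coding is exactly the symbol-by-symbol strategy this section advises against, but even it shows the basic currency: extra channel uses can be traded for reliability. Here is a minimal simulation sketch, assuming a BSC(p) model; the helper names bsc and simulate are illustrative, not from any library:

```python
import random

def bsc(bit, p):
    """Flip a bit with probability p (binary symmetric channel)."""
    return bit ^ (random.random() < p)

def simulate(p, trials=100_000):
    raw_errors = 0
    rep3_errors = 0
    for _ in range(trials):
        # Uncoded: one channel use per bit, error probability p.
        raw_errors += bsc(0, p) != 0
        # 3-repetition: three channel uses per bit, majority decoding,
        # which errs only when at least two of the three copies flip.
        received = [bsc(0, p) for _ in range(3)]
        rep3_errors += sum(received) >= 2
    return raw_errors / trials, rep3_errors / trials

random.seed(0)
raw, rep3 = simulate(p=0.1)
print(f"uncoded bit error   ~ {raw:.3f}")   # near p = 0.1
print(f"3-repetition error  ~ {rep3:.3f}")  # near 3p^2 - 2p^3 = 0.028
```

The price of the improvement is a rate of 1/3 bit per channel use instead of 1; capacity makes precise how good that rate-reliability tradeoff can possibly get.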
5.2 Rate Measures Information Per Channel Use
If a code transmits one of M possible messages using n channel uses, its rate is
\[ R=\frac{1}{n}\log M, \]
in bits per channel use when the logarithm is base 2.
So the rate asks:
how much information do we communicate per use of the noisy channel?
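As a quick check on the definition (a tiny sketch; the helper rate is mine, and Hamming(7,4) is just a standard code used as a concrete example):

```python
from math import log2

def rate(M, n):
    """Rate in bits per channel use: one of M messages over n uses."""
    return log2(M) / n

# The classical Hamming(7,4) code sends one of 2^4 = 16 messages
# in n = 7 channel uses.
print(rate(16, 7))   # 4/7 ~ 0.571 bits per channel use
```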
5.3 Capacity Is The Reliable-Communication Threshold
Some rates are small enough that coding can overcome the noise.
Some rates are too ambitious: the channel simply does not carry enough information.
Capacity is the boundary between those two regimes.
For a memoryless channel, the remarkable fact is that this boundary can be written using mutual information:
\[ C = \max_{P_X} I(X;Y). \]
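A numerical way to see this formula in action is to compute I(X;Y) for a fixed channel matrix and brute-force the maximization over input distributions. A minimal NumPy sketch, assuming a BSC with crossover probability 0.1; the names mutual_information and W are mine, not from any library:

```python
import numpy as np

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix
    W[x, y] = P(Y = y | X = x)."""
    py = px @ W                          # output distribution
    # I(X;Y) = sum_{x,y} px(x) W(y|x) log2( W(y|x) / py(y) );
    # the guards absorb 0*log 0 terms for channels with zero entries.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = px[:, None] * W * np.log2(W / py[None, :])
    return np.nansum(terms)

p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])

# Brute-force the maximization over Bernoulli(q) input distributions.
qs = np.linspace(0.001, 0.999, 999)
best = max(mutual_information(np.array([1 - q, q]), W) for q in qs)
print(f"max_q I(X;Y) ~ {best:.4f}")   # ~ 0.5310 bits per use
```

For the BSC the maximizing input is uniform, so the brute force recovers the closed form 1 - H2(p) used in the worked example below.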
5.4 Direct And Converse Are Different Kinds Of Truth
The channel coding theorem has two halves:
- direct part: rates below capacity can be achieved
- converse part: rates above capacity cannot be achieved
The direct part is constructive in spirit.
The converse part is impossibility in spirit.
You need both to know that the capacity formula is the actual limit.
6 Formal Core
For this first pass, think of a discrete memoryless channel with transition law P_{Y|X}.
Definition 1 (Definition Idea: Achievable Rate) A rate R is achievable if for every small target error level, there exist sufficiently long block codes whose rate is at least R and whose decoding error probability goes to 0 as block length grows.
This definition is operational: it is about what can actually be communicated reliably.
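In symbols, following the standard textbook formalization: R is achievable if there is a sequence of codes with at least 2^{nR} messages at block length n whose decoding error probability satisfies
\[ P_e^{(n)} \to 0 \quad \text{as } n \to \infty. \]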
Definition 2 (Definition: Channel Capacity) The channel capacity C is the supremum of all achievable rates.
So capacity is the largest rate at which reliable communication is possible.
Theorem 1 (Theorem Idea: Capacity Formula For A Memoryless Channel) For a discrete memoryless channel,
\[ C=\max_{P_X} I(X;Y). \]
This is the central operational meaning of mutual information.
Theorem 3 (Theorem Idea: Direct Part) If R<C, then there exist coding schemes whose decoding error probability goes to 0 as the block length n grows.
At a first pass, the proof idea is:
- draw a random codebook
- use joint typicality or related decoding
- show the error probability vanishes when the rate is below the mutual-information threshold
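A toy Monte Carlo version of this argument can be run directly. Everything here is a simplified sketch: the names are mine, and nearest-codeword decoding stands in for joint-typicality decoding (for a BSC with p < 1/2 it is exact maximum likelihood):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_code_error(n, R, p, trials=500):
    """Monte Carlo error estimate for freshly drawn random codebooks
    on BSC(p), decoding to the nearest codeword in Hamming distance."""
    M = max(2, int(round(2 ** (n * R))))   # number of messages
    errors = 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(M, n))    # random codebook
        x = codebook[0]                               # send message 0
        y = x ^ (rng.random(n) < p).astype(np.int64)  # channel flips
        dist = (codebook != y).sum(axis=1)
        # argmin breaks ties toward index 0 (the sent message),
        # a small optimism that is fine for a toy demo.
        errors += int(dist.argmin() != 0)
    return errors / trials

p, n = 0.1, 16   # capacity C = 1 - H2(0.1) ~ 0.531 bits per use
print("R = 0.25 (below C):", random_code_error(n, 0.25, p))
print("R = 0.75 (above C):", random_code_error(n, 0.75, p))
```

In runs of this sketch, the rate below capacity should decode nearly every trial, while the rate above capacity should fail on a large fraction; the direct part sharpens this contrast into a vanishing-error guarantee for every R < C as n grows.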
Theorem 3 (Theorem Idea: Converse Part) If R>C, then no sequence of codes can make the decoding error probability go to 0.
At a first pass, the proof idea is:
- if the receiver decodes well, the output must retain enough information about the message
- Fano-style inequalities convert small decoding error into an information lower bound
- data processing and single-letter bounds show the channel cannot support more than capacity
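Written out, the standard weak-converse chain for a code with 2^{nR} equally likely messages W, channel input X^n, output Y^n, decoder estimate \hat{W}, and error probability P_e is
\[ nR = H(W) = I(W;\hat{W}) + H(W\mid\hat{W}) \le I(X^n;Y^n) + 1 + nR\,P_e \le nC + 1 + nR\,P_e, \]
using data processing on I(W;\hat{W}), Fano's inequality (stated next) on H(W|\hat{W}), and memorylessness to single-letterize I(X^n;Y^n). Rearranging gives R \le (C + 1/n)/(1 - P_e), so P_e cannot vanish when R > C.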
Theorem 4 (Theorem Idea: Fano’s Inequality As A Converse Tool) Fano’s inequality relates decoding error probability to conditional entropy of the message given the observation.
At first pass, keep only the role:
Fano turns probability of error into an information inequality
That is why it appears so often in converse proofs and statistical lower bounds.
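For reference, the inequality itself: if W is the message, \hat{W} the decoder's estimate, P_e = P(\hat{W} \ne W), and the message set has size M, then
\[ H(W \mid \hat{W}) \le H_2(P_e) + P_e \log (M - 1), \]
which is often weakened to H(W|\hat{W}) \le 1 + P_e \log M, the form used in the converse chain above.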
7 Worked Example
Consider a binary symmetric channel BSC(p):
- input alphabet {0,1}
- output alphabet {0,1}
- each transmitted bit flips with probability p
If you send uncoded bits, the channel flips them independently, so reliability is poor unless p is tiny.
But with long block codes, reliable communication is possible up to capacity
\[ C = 1 - H_2(p), \]
where H_2 is the binary entropy function.
What matters at first pass is the shape:
- if p = 0, the channel is noiseless and C = 1
- if p = 1/2, the output is pure noise independent of the input and C = 0
- for intermediate p, capacity lies strictly between 0 and 1
So capacity matches the intuitive quality of the channel, but gives an exact rate threshold.
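A few lines of Python make the shape concrete (H2 and bsc_capacity are my own helper names):

```python
from math import log2

def H2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of BSC(p): C = 1 - H2(p) bits per channel use."""
    return 1 - H2(p)

for p in (0.0, 0.01, 0.1, 0.25, 0.5):
    print(f"p = {p:<5} C = {bsc_capacity(p):.4f}")
```

The printed values trace exactly the shape above: 1 at p = 0, 0 at p = 1/2, and strictly in between otherwise.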
8 Computation Lens
When a paper mentions capacity or a converse proof, ask:
- what is the channel or observation model?
- what is the input distribution, and is the key quantity I(X;Y) optimized over it?
- is the result an achievability statement, a converse statement, or both?
- where does Fano’s inequality or data processing enter?
- is the theorem single-letter, or is extra work needed to reduce a block problem to one-letter quantities?
Those questions usually reveal whether the proof is constructive, impossibility-based, or a mix of both.
9 Application Lens
9.1 Communication Systems
This is the classical home of the theorem: coding over noisy channels with an exact reliability threshold.
9.2 Statistics And Learning Lower Bounds
Fano-style converses and information constraints reappear when proving that no estimator or learner can perform too well under limited data or noisy observations.
9.3 Control, Sensing, And Feedback
Whenever a controller or estimator receives information through a noisy pipeline, capacity-like reasoning becomes a natural upper bound on what the system can possibly recover or stabilize.
10 Stop Here For First Pass
If you stop here, retain these five ideas:
- coding rate measures information per channel use
- capacity is the maximum reliable communication rate
- for memoryless channels, capacity is characterized by maximizing mutual information
- the direct part proves rates below capacity are achievable
- the converse part proves rates above capacity are impossible
That is enough to read most first-pass channel-coding statements without getting lost in proof detail.
11 Go Deeper
The next natural step in this module is the optional deeper reading in the following section.
12 Optional Deeper Reading After First Pass
If you want a stronger second pass on the same ideas, use:
- MIT 6.441 Chapter 14: Channel Coding for the cleanest official MIT entry into channel coding. Checked 2026-04-25.
- Stanford EE376A course outline to see how capacity, converse, and achievability are staged in the course. Checked 2026-04-25.
- Stanford EE376A lecture notes for the full official course treatment. Checked 2026-04-25.
- Stanford EE376A lecture 8 for capacity and examples. Checked 2026-04-25.
- Stanford EE376A lecture 11 for the converse viewpoint. Checked 2026-04-25.
13 Sources and Further Reading
- MIT 6.441: Information Theory - First pass - official course page for the overall coding-theorem structure of the field. Checked 2026-04-25.
- MIT 6.441 Chapter 14: Channel Coding - First pass - official MIT chapter specifically for the channel-coding theorem. Checked 2026-04-25.
- Stanford EE376A course outline - First pass - official outline showing the capacity, converse, and achievability sequence. Checked 2026-04-25.
- Stanford EE376A lecture notes - Second pass - official full notes containing the channel-coding theorem and examples. Checked 2026-04-25.
- Stanford EE376A lecture 8 - Second pass - official notes focused on channel capacity and examples. Checked 2026-04-25.
- Stanford EE376A lecture 11 - Second pass - official notes focused on the converse part. Checked 2026-04-25.