Joint, Conditional, and Bayes
joint distribution, marginal distribution, conditional distribution, Bayes rule, independence
1 Role
This page is the bridge from single random quantities to relationships between random quantities.
It explains how to model two variables together, how to look at one variable after learning the other, and how Bayes’ rule turns evidence into posterior belief.
2 First-Pass Promise
Read this page after Expectation, Variance, Covariance.
If you stop here, you should still understand:
- what a joint distribution is
- how marginals and conditionals are extracted from it
- what independence means in distribution language
- why Bayes’ rule updates beliefs after observing evidence
3 Why It Matters
Modern probability and statistics almost never study one random object in isolation.
We care about questions like:
- what is the label given the feature?
- what is the disease state given the test result?
- what is the next state given the current one?
- what is the posterior belief after data arrives?
Those are all joint-and-conditional questions.
So this page is where probability starts to look like inference rather than just counting.
4 Prerequisite Recall
- a random variable maps outcomes to numbers
- a probability model can assign probabilities to events or values of random variables
- conditional probability already means
restrict the world and renormalize
5 Intuition
If one random variable tells part of the story, two random variables tell how two parts of the story interact.
The joint distribution is the full table of possibilities.
From that full table, you can:
- ignore one variable and keep only the other: this gives a marginal distribution
- freeze one variable and ask about the other: this gives a conditional distribution
Bayes’ rule is the most famous reversal move in this setting.
It lets you start from
how likely is the evidence if the hypothesis were true?
and turn that into
how likely is the hypothesis after seeing the evidence?
That reversal is one of the main engines of statistical inference.
6 Formal Core
Definition 1 (Joint Distribution) If \(X\) and \(Y\) are discrete random variables, the joint distribution is
\[ p_{X,Y}(x,y) = P(X=x, Y=y). \]
It records the probability of each pair of values occurring together.
Definition 2 (Marginal Distribution) The marginal distributions are obtained by summing out the other variable:
\[ p_X(x) = \sum_y p_{X,Y}(x,y), \qquad p_Y(y) = \sum_x p_{X,Y}(x,y). \]
So a marginal keeps one variable and ignores the other.
Definition 3 (Conditional Distribution) If \(p_Y(y) > 0\), the conditional distribution of \(X\) given \(Y=y\) is
\[ p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}. \]
This is the random-variable version of conditional probability.
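As a quick computational companion to Definitions 1 through 3, here is a minimal Python sketch (using NumPy, with a small made-up joint table as the assumed input) showing that marginals come from summing out a variable and that a conditional comes from taking one slice of the joint and renormalizing.

```python
import numpy as np

# Hypothetical joint table p_{X,Y}(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([
    [0.10, 0.20, 0.10],   # P(X=0, Y=y) for y = 0, 1, 2
    [0.30, 0.20, 0.10],   # P(X=1, Y=y) for y = 0, 1, 2
])
assert np.isclose(joint.sum(), 1.0)   # a joint distribution must sum to 1

# Marginals: sum out the other variable.
p_x = joint.sum(axis=1)   # p_X(x) = sum_y p_{X,Y}(x, y)  -> [0.4, 0.6]
p_y = joint.sum(axis=0)   # p_Y(y) = sum_x p_{X,Y}(x, y)  -> [0.4, 0.4, 0.2]

# Conditional of X given Y = 1: take that column of the joint and renormalize.
y = 1
p_x_given_y = joint[:, y] / p_y[y]    # -> [0.5, 0.5]

print(p_x, p_y, p_x_given_y)
```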
Proposition 1 (Key Statement) If \(p_Y(y) > 0\), then
\[ p_{X \mid Y}(x \mid y) = \frac{p_{Y \mid X}(y \mid x)\,p_X(x)}{p_Y(y)}. \]
This is Bayes’ rule.
It combines:
- a prior term \(p_X(x)\)
- a likelihood term \(p_{Y \mid X}(y \mid x)\)
- an evidence term \(p_Y(y)\)
to produce a posterior term \(p_{X \mid Y}(x \mid y)\).
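The identity is easy to sanity-check numerically. The sketch below uses made-up prior and likelihood numbers (purely illustrative, not from any real setting) and computes the evidence term by the law of total probability, so the posterior automatically sums to one.

```python
# Hypothetical binary hypothesis X in {0, 1} and one observed piece of evidence y.
prior = {0: 0.7, 1: 0.3}        # p_X(x): made-up prior beliefs
likelihood = {0: 0.2, 1: 0.9}   # p_{Y|X}(y | x): made-up probability of this evidence under each x

# Evidence term p_Y(y) via the law of total probability.
evidence = sum(likelihood[x] * prior[x] for x in prior)   # 0.2*0.7 + 0.9*0.3 = 0.41

# Posterior p_{X|Y}(x | y) from Bayes' rule; the division by the evidence normalizes it.
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}
print(posterior)   # {0: 0.341..., 1: 0.658...}
```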
Proposition 2 (Independence) The variables \(X\) and \(Y\) are independent if
\[ p_{X,Y}(x,y) = p_X(x)\,p_Y(y) \]
for all \(x,y\).
So independence means the joint distribution factorizes into separate pieces.
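Numerically, that factorization can be checked by comparing the joint table with the outer product of its marginals. A small sketch, assuming two toy tables (one built to factorize, one not):

```python
import numpy as np

def is_independent(joint, tol=1e-9):
    """Return True if p_{X,Y}(x, y) = p_X(x) * p_Y(y) for every cell of the table."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return np.allclose(joint, np.outer(p_x, p_y), atol=tol)

# Toy table built to factorize: the joint is exactly the outer product of its marginals.
indep = np.outer([0.4, 0.6], [0.5, 0.5])
print(is_independent(indep))   # True

# Toy table with the same marginals but no factorization.
dep = np.array([[0.3, 0.1],
                [0.2, 0.4]])
print(is_independent(dep))     # False
```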
7 Worked Example
Let \(D\) be a disease-status variable and \(T\) be a test-result variable:
- \(D=1\) means diseased, \(D=0\) means healthy
- \(T=1\) means test positive, \(T=0\) means test negative
Suppose the joint distribution is:
\[ \begin{array}{c|cc|c} & T=1 & T=0 & \text{row total} \\ \hline D=1 & 18/1000 & 2/1000 & 20/1000 \\ D=0 & 98/1000 & 882/1000 & 980/1000 \\ \hline \text{col total} & 116/1000 & 884/1000 & 1 \end{array} \]
This one table already contains several useful objects.
First, the marginal distribution of \(D\) is:
\[ P(D=1)=20/1000=0.02, \qquad P(D=0)=0.98. \]
So the disease prevalence is \(2\%\).
Next, the marginal distribution of \(T\) is:
\[ P(T=1)=116/1000=0.116, \qquad P(T=0)=0.884. \]
Now compute a conditional probability:
\[ P(T=1 \mid D=1) = \frac{P(T=1,D=1)}{P(D=1)} = \frac{18/1000}{20/1000} = 0.9. \]
So the test is positive with probability \(0.9\) among diseased individuals.
Bayes’ rule reverses the direction:
\[ P(D=1 \mid T=1) = \frac{P(T=1 \mid D=1)\,P(D=1)}{P(T=1)} = \frac{0.9 \cdot 0.02}{0.116} = \frac{18}{116} \approx 0.155. \]
So even after a positive test, the posterior probability of disease is only about 15.5%.
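For readers who like to verify by machine, the following short Python check reproduces every number above directly from the joint table; nothing is assumed beyond the counts per 1000 already given.

```python
# Joint table from the worked example, in counts per 1000.
joint = {
    ("D=1", "T=1"): 18, ("D=1", "T=0"): 2,
    ("D=0", "T=1"): 98, ("D=0", "T=0"): 882,
}
total = sum(joint.values())   # 1000

# Marginals of D and T.
p_d1 = (joint[("D=1", "T=1")] + joint[("D=1", "T=0")]) / total   # 0.02
p_t1 = (joint[("D=1", "T=1")] + joint[("D=0", "T=1")]) / total   # 0.116

# Sensitivity P(T=1 | D=1).
p_t1_given_d1 = (joint[("D=1", "T=1")] / total) / p_d1   # 0.9

# Posterior P(D=1 | T=1), once directly from the table and once via Bayes' rule.
direct = (joint[("D=1", "T=1")] / total) / p_t1
via_bayes = p_t1_given_d1 * p_d1 / p_t1
print(round(direct, 4), round(via_bayes, 4))   # 0.1552 0.1552
```

Computing the posterior both ways is a useful habit: if the direct table calculation and the Bayes' rule calculation disagree, something has been copied or normalized incorrectly.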
This example teaches the main point:
- the joint table is the full probabilistic object
- marginals summarize one variable at a time
- conditionals ask about one variable after learning the other
- Bayes’ rule can reverse the direction of conditioning
It also shows why base rates matter. A rare condition can still have a low posterior probability after a positive test if false positives are common enough.
8 Computation Lens
When working with two random variables, the main moves are:
- start from the joint distribution
- sum or integrate to get marginals
- divide by the relevant marginal to get conditionals
- factorize the joint when independence holds
- use Bayes’ rule when you need to reverse conditioning
That means a lot of probability work is really about moving between four views of the same object:
- joint
- marginal
- conditional
- posterior
Once that translation becomes automatic, many theorem statements become much easier to parse.
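To make that translation concrete, here is a sketch that packages the four moves as small helper functions. The helper names are just illustrative, NumPy arrays are assumed for the joint table, and the example numbers reuse the disease-test table from the worked example.

```python
import numpy as np

def marginal_x(joint):
    return joint.sum(axis=1)    # joint -> marginal of X (sum out Y)

def marginal_y(joint):
    return joint.sum(axis=0)    # joint -> marginal of Y (sum out X)

def conditional_x_given_y(joint, y):
    return joint[:, y] / marginal_y(joint)[y]    # joint -> conditional (renormalize one column)

def posterior_from_bayes(prior, likelihood_col):
    # prior p_X and likelihood p_{Y|X}(y | x) for a fixed y -> posterior p_{X|Y}(x | y)
    unnormalized = likelihood_col * prior
    return unnormalized / unnormalized.sum()

# Joint table from the disease-test numbers: rows are D in {1, 0}, columns are T in {1, 0}.
joint = np.array([[0.018, 0.002],
                  [0.098, 0.882]])
prior = marginal_x(joint)              # [0.02, 0.98]
likelihood_col = joint[:, 0] / prior   # P(T=1 | D) for each row: [0.9, 0.1]

print(conditional_x_given_y(joint, 0))            # [0.155..., 0.844...] straight from the joint
print(posterior_from_bayes(prior, likelihood_col))  # the same posterior, via prior and likelihood
```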
9 Application Lens
This topic sits under many practical workflows:
- in diagnostics: posterior disease probability given a test
- in classification: class probability given features
- in probabilistic modeling: latent variable inference given observations
- in Bayesian learning: prior to posterior updating after data arrives
So Bayes’ rule is not just a special identity. It is the mathematical shape of evidence-based updating.
10 Stop Here For First Pass
If you can now explain:
- what a joint distribution stores
- how to get marginals and conditional distributions from it
- what independence means as a factorization statement
- why Bayes’ rule turns likelihood and prior into posterior
then this page has done its main job.
11 Go Deeper
The next conceptual step in the planned probability spine is law-of-large-numbers-and-clt. Until that page exists, the best available bridge is Concentration and Common Inequalities, where probability starts to say how stable empirical quantities are.
12 Optional Paper Bridge
- MIT RES.6-012 conditioning and Bayes lecture materials (second pass): official MIT notes that frame Bayes' rule as systematic evidence updating. Checked 2026-04-24.
- Statistics 110 About (paper bridge): a compact official overview showing how basics, Bayes, and multivariate distributions fit into one coherent probability course. Checked 2026-04-24.
13 Optional After First Pass
If you want more practice before moving on:
- build a small joint table and compute all marginals and conditionals
- test whether two variables are independent from their joint table
- compare \(P(A \mid B)\) with \(P(B \mid A)\) in a real example
14 Common Mistakes
- confusing the joint distribution with either marginal distribution
- forgetting to normalize when forming a conditional distribution
- treating \(P(A \mid B)\) and \(P(B \mid A)\) as interchangeable
- assuming independence because two probabilities look numerically small
- ignoring the prior/base rate in Bayes calculations
15 Exercises
A joint distribution for two binary variables \(X\) and \(Y\) is given by
\[ P(X=1,Y=1)=0.2,\; P(X=1,Y=0)=0.3,\; P(X=0,Y=1)=0.1,\; P(X=0,Y=0)=0.4. \]
Find the marginal distributions of \(X\) and \(Y\).
Using the same table, compute \(P(X=1 \mid Y=1)\) and \(P(Y=1 \mid X=1)\).
Explain in words why a positive test does not automatically imply a high posterior probability of disease.
16 Sources and Further Reading
- Harvard Stat 110 (first pass): strong official course hub covering Bayes, multivariate distributions, and probability as a modeling language. Checked 2026-04-24.
- Penn State STAT 414 (first pass): current official notes with dedicated lessons on conditional probability, Bayes' theorem, and conditional distributions. Checked 2026-04-24.
- CMU OLI Probability and Statistics (second pass): structured practice on event probability and probability rules from another teaching angle. Checked 2026-04-24.
- MIT RES.6-012 conditioning and Bayes lecture materials (second pass): official MIT reinforcement for conditioning, evidence, and Bayes' rule. Checked 2026-04-24.
- Penn State STAT 414: Conditional Distributions (paper bridge): a good transition from event-based conditioning to conditional distributions of random variables. Checked 2026-04-24.