Joint, Conditional, and Bayes
joint distribution, marginal distribution, conditional distribution, Bayes rule, independence
1 Role
This page is the bridge from single random quantities to relationships between random quantities.
It explains how to model two variables together, how to look at one variable after learning the other, and how Bayes’ rule turns evidence into posterior belief.
2 First-Pass Promise
Read this page after Expectation, Variance, Covariance.
If you stop here, you should still understand:
- what a joint distribution is
- how marginals and conditionals are extracted from it
- what independence means in distribution language
- why Bayes’ rule updates beliefs after observing evidence
3 Why It Matters
Modern probability and statistics almost never study one random object in isolation.
We care about questions like:
- what is the label given the feature?
- what is the disease state given the test result?
- what is the next state given the current one?
- what is the posterior belief after data arrives?
Those are all joint-and-conditional questions.
So this page is where probability starts to look like inference rather than just counting.
4 Prerequisite Recall
- a random variable maps outcomes to numbers
- a probability model can assign probabilities to events or values of random variables
- conditional probability already means
restrict the world and renormalize
5 Intuition
If one random variable tells part of the story, two random variables tell how two parts of the story interact.
The joint distribution is the full table of possibilities.
From that full table, you can:
- ignore one variable and keep only the other: this gives a marginal distribution
- freeze one variable and ask about the other: this gives a conditional distribution
Bayes’ rule is the most famous reversal move in this setting.
It lets you start from
how likely is the evidence if the hypothesis were true?
and turn that into
how likely is the hypothesis after seeing the evidence?
That reversal is one of the main engines of statistical inference.
6 Formal Core
Definition 1 (Joint Distribution) If \(X\) and \(Y\) are discrete random variables, the joint distribution is
\[ p_{X,Y}(x,y) = P(X=x, Y=y). \]
It records the probability of each pair of values occurring together.
Definition 2 (Marginal Distribution) The marginal distributions are obtained by summing out the other variable:
\[ p_X(x) = \sum_y p_{X,Y}(x,y), \qquad p_Y(y) = \sum_x p_{X,Y}(x,y). \]
So a marginal keeps one variable and ignores the other.
Definition 3 (Conditional Distribution) If \(p_Y(y) > 0\), the conditional distribution of \(X\) given \(Y=y\) is
\[ p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}. \]
This is the random-variable version of conditional probability.
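As a quick computational companion to Definitions 1 through 3, here is a minimal Python sketch (using NumPy, with a small made-up joint table as the assumed input) showing that marginals come from summing out a variable and that a conditional comes from taking one slice of the joint and renormalizing.

```python
import numpy as np

# Hypothetical joint table p_{X,Y}(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([
    [0.10, 0.20, 0.10],   # P(X=0, Y=y) for y = 0, 1, 2
    [0.30, 0.20, 0.10],   # P(X=1, Y=y) for y = 0, 1, 2
])
assert np.isclose(joint.sum(), 1.0)   # a joint distribution must sum to 1

# Marginals: sum out the other variable.
p_x = joint.sum(axis=1)   # p_X(x) = sum_y p_{X,Y}(x, y)  -> [0.4, 0.6]
p_y = joint.sum(axis=0)   # p_Y(y) = sum_x p_{X,Y}(x, y)  -> [0.4, 0.4, 0.2]

# Conditional of X given Y = 1: take that column of the joint and renormalize.
y = 1
p_x_given_y = joint[:, y] / p_y[y]    # -> [0.5, 0.5]

print(p_x, p_y, p_x_given_y)
```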
Proposition 1 (Key Statement) If \(p_Y(y) > 0\), then
\[ p_{X \mid Y}(x \mid y) = \frac{p_{Y \mid X}(y \mid x)\,p_X(x)}{p_Y(y)}. \]
This is Bayes’ rule.
It combines:
- a prior term \(p_X(x)\)
- a likelihood term \(p_{Y \mid X}(y \mid x)\)
- an evidence term \(p_Y(y)\)
to produce a posterior term \(p_{X \mid Y}(x \mid y)\).
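The identity is easy to sanity-check numerically. The sketch below uses made-up prior and likelihood numbers (purely illustrative, not from any real setting) and computes the evidence term by the law of total probability, so the posterior automatically sums to one.

```python
# Hypothetical binary hypothesis X in {0, 1} and one observed piece of evidence y.
prior = {0: 0.7, 1: 0.3}        # p_X(x): made-up prior beliefs
likelihood = {0: 0.2, 1: 0.9}   # p_{Y|X}(y | x): made-up probability of this evidence under each x

# Evidence term p_Y(y) via the law of total probability.
evidence = sum(likelihood[x] * prior[x] for x in prior)   # 0.2*0.7 + 0.9*0.3 = 0.41

# Posterior p_{X|Y}(x | y) from Bayes' rule; the division by the evidence normalizes it.
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}
print(posterior)   # {0: 0.341..., 1: 0.658...}
```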
Proposition 2 (Independence) The variables \(X\) and \(Y\) are independent if
\[ p_{X,Y}(x,y) = p_X(x)\,p_Y(y) \]
for all \(x,y\).
So independence means the joint distribution factorizes into separate pieces.
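Numerically, that factorization can be checked by comparing the joint table with the outer product of its marginals. A small sketch, assuming two toy tables (one built to factorize, one not):

```python
import numpy as np

def is_independent(joint, tol=1e-9):
    """Return True if p_{X,Y}(x, y) = p_X(x) * p_Y(y) for every cell of the table."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return np.allclose(joint, np.outer(p_x, p_y), atol=tol)

# Toy table built to factorize: the joint is exactly the outer product of its marginals.
indep = np.outer([0.4, 0.6], [0.5, 0.5])
print(is_independent(indep))   # True

# Toy table with the same marginals but no factorization.
dep = np.array([[0.3, 0.1],
                [0.2, 0.4]])
print(is_independent(dep))     # False
```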
7 Worked Example
Let \(D\) be a disease-status variable and \(T\) be a test-result variable:
- \(D=1\) means diseased, \(D=0\) means healthy
- \(T=1\) means test positive, \(T=0\) means test negative
Suppose the joint distribution is:
\[ \begin{array}{c|cc|c} & T=1 & T=0 & \text{row total} \\ \hline D=1 & 18/1000 & 2/1000 & 20/1000 \\ D=0 & 98/1000 & 882/1000 & 980/1000 \\ \hline \text{col total} & 116/1000 & 884/1000 & 1 \end{array} \]
This one table already contains several useful objects.
First, the marginal distribution of \(D\) is:
\[ P(D=1)=20/1000=0.02, \qquad P(D=0)=0.98. \]
So the disease prevalence is \(2\%\).
Next, the marginal distribution of \(T\) is:
\[ P(T=1)=116/1000=0.116, \qquad P(T=0)=0.884. \]
Now compute a conditional probability:
\[ P(T=1 \mid D=1) = \frac{P(T=1,D=1)}{P(D=1)} = \frac{18/1000}{20/1000} = 0.9. \]
So the test is positive with probability \(0.9\) among diseased individuals.
Bayes’ rule reverses the direction:
\[ P(D=1 \mid T=1) = \frac{P(T=1 \mid D=1)\,P(D=1)}{P(T=1)} = \frac{0.9 \cdot 0.02}{0.116} = \frac{18}{116} \approx 0.155. \]
So even after a positive test, the posterior probability of disease is only about 15.5%.
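For readers who like to verify by machine, the following short Python check reproduces every number above directly from the joint table; nothing is assumed beyond the counts per 1000 already given.

```python
# Joint table from the worked example, in counts per 1000.
joint = {
    ("D=1", "T=1"): 18, ("D=1", "T=0"): 2,
    ("D=0", "T=1"): 98, ("D=0", "T=0"): 882,
}
total = sum(joint.values())   # 1000

# Marginals of D and T.
p_d1 = (joint[("D=1", "T=1")] + joint[("D=1", "T=0")]) / total   # 0.02
p_t1 = (joint[("D=1", "T=1")] + joint[("D=0", "T=1")]) / total   # 0.116

# Sensitivity P(T=1 | D=1).
p_t1_given_d1 = (joint[("D=1", "T=1")] / total) / p_d1   # 0.9

# Posterior P(D=1 | T=1), once directly from the table and once via Bayes' rule.
direct = (joint[("D=1", "T=1")] / total) / p_t1
via_bayes = p_t1_given_d1 * p_d1 / p_t1
print(round(direct, 4), round(via_bayes, 4))   # 0.1552 0.1552
```

Computing the posterior both ways is a useful habit: if the direct table calculation and the Bayes' rule calculation disagree, something has been copied or normalized incorrectly.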
This example teaches the main point:
- the joint table is the full probabilistic object
- marginals summarize one variable at a time
- conditionals ask about one variable after learning the other
- Bayes’ rule can reverse the direction of conditioning
It also shows why base rates matter. A rare condition can still have a low posterior probability after a positive test if false positives are common enough.
8 Computation Lens
When working with two random variables, the main moves are:
- start from the joint distribution
- sum or integrate to get marginals
- divide by the relevant marginal to get conditionals
- factorize the joint when independence holds
- use Bayes’ rule when you need to reverse conditioning
That means a lot of probability work is really about moving between four views of the same object:
- joint
- marginal
- conditional
- posterior
Once that translation becomes automatic, many theorem statements become much easier to parse.
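To make that translation concrete, here is a sketch that packages the four moves as small helper functions. The helper names are just illustrative, NumPy arrays are assumed for the joint table, and the example numbers reuse the disease-test table from the worked example.

```python
import numpy as np

def marginal_x(joint):
    return joint.sum(axis=1)    # joint -> marginal of X (sum out Y)

def marginal_y(joint):
    return joint.sum(axis=0)    # joint -> marginal of Y (sum out X)

def conditional_x_given_y(joint, y):
    return joint[:, y] / marginal_y(joint)[y]    # joint -> conditional (renormalize one column)

def posterior_from_bayes(prior, likelihood_col):
    # prior p_X and likelihood p_{Y|X}(y | x) for a fixed y -> posterior p_{X|Y}(x | y)
    unnormalized = likelihood_col * prior
    return unnormalized / unnormalized.sum()

# Joint table from the disease-test numbers: rows are D in {1, 0}, columns are T in {1, 0}.
joint = np.array([[0.018, 0.002],
                  [0.098, 0.882]])
prior = marginal_x(joint)              # [0.02, 0.98]
likelihood_col = joint[:, 0] / prior   # P(T=1 | D) for each row: [0.9, 0.1]

print(conditional_x_given_y(joint, 0))            # [0.155..., 0.844...] straight from the joint
print(posterior_from_bayes(prior, likelihood_col))  # the same posterior, via prior and likelihood
```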
9 Application Lens
This topic sits under many practical workflows:
- in diagnostics: posterior disease probability given a test
- in classification: class probability given features
- in probabilistic modeling: latent variable inference given observations
- in Bayesian learning: prior to posterior updating after data arrives
So Bayes’ rule is not just a special identity. It is the mathematical shape of evidence-based updating.
10 Stop Here For First Pass
If you can now explain:
- what a joint distribution stores
- how to get marginals and conditional distributions from it
- what independence means as a factorization statement
- why Bayes’ rule turns likelihood and prior into posterior
then this page has done its main job.
11 Go Deeper
The next conceptual step in the planned probability spine is law-of-large-numbers-and-clt. Until that page exists, the best available bridge is Concentration and Common Inequalities, where probability starts to say how stable empirical quantities are.
12 Optional Paper Bridge
- MIT RES.6-012 conditioning and Bayes lecture materials (second pass): official MIT notes that frame Bayes' rule as systematic evidence updating. Checked 2026-04-24.
- Statistics 110 About (paper bridge): a compact official overview showing how basics, Bayes, and multivariate distributions fit into one coherent probability course. Checked 2026-04-24.
13 Optional After First Pass
If you want more practice before moving on:
- build a small joint table and compute all marginals and conditionals
- test whether two variables are independent from their joint table
- compare \(P(A \mid B)\) with \(P(B \mid A)\) in a real example
14 Common Mistakes
- confusing the joint distribution with either marginal distribution
- forgetting to normalize when forming a conditional distribution
- treating \(P(A \mid B)\) and \(P(B \mid A)\) as interchangeable
- assuming independence because two probabilities look numerically small
- ignoring the prior/base rate in Bayes calculations
15 Exercises
A joint distribution for two binary variables \(X\) and \(Y\) is given by
\[ P(X=1,Y=1)=0.2,\; P(X=1,Y=0)=0.3,\; P(X=0,Y=1)=0.1,\; P(X=0,Y=0)=0.4. \]
Find the marginal distributions of \(X\) and \(Y\).
Using the same table, compute \(P(X=1 \mid Y=1)\) and \(P(Y=1 \mid X=1)\).
Explain in words why a positive test does not automatically imply a high posterior probability of disease.
16 Sources and Further Reading
- Harvard Stat 110 (first pass): strong official course hub covering Bayes, multivariate distributions, and probability as a modeling language. Checked 2026-04-24.
- Penn State STAT 414 (first pass): current official notes with dedicated lessons on conditional probability, Bayes' theorem, and conditional distributions. Checked 2026-04-24.
- CMU OLI Probability and Statistics (second pass): structured practice on event probability and probability rules from another teaching angle. Checked 2026-04-24.
- MIT RES.6-012 conditioning and Bayes lecture materials (second pass): official MIT reinforcement for conditioning, evidence, and Bayes' rule. Checked 2026-04-24.
- Penn State STAT 414: Conditional Distributions (paper bridge): a good transition from event-based conditioning to conditional distributions of random variables. Checked 2026-04-24.