Supervised Learning, Losses, and Empirical Risk
1 Application Snapshot
Most of modern machine learning can be described in one compact sentence:
choose a model, measure its prediction error with a loss, and try to make the average loss small on the data.
That average loss is the empirical risk.
This page is the shortest bridge from the site’s current math foundations into the language used in ML courses, papers, and benchmarks.
2 Problem Setting
In supervised learning, we observe examples
\[ (x_1,y_1), \dots, (x_n,y_n), \]
where:
- \(x_i\) is an input
- \(y_i\) is the target or label
We choose a predictor \(f\) from some model class and measure how well it predicts \(y_i\) from \(x_i\).
A loss function
\[ \ell(f(x),y) \]
assigns a numerical penalty to a prediction-target pair.
The empirical risk of \(f\) on the sample is
\[ \hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \ell(f(x_i), y_i). \]
The training problem is then:
\[ \min_f \hat{R}_n(f) \]
possibly with regularization or constraints.
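The definitions above translate almost directly into code. The following is a minimal sketch (the function names, toy data, and the constant predictor are invented for illustration) of computing the empirical risk as a mean of per-example losses:

```python
import numpy as np

def squared_loss(pred, target):
    """Per-example penalty ell(f(x), y); squared error is one common choice."""
    return (pred - target) ** 2

def empirical_risk(f, xs, ys, loss):
    """Empirical risk: (1/n) * sum of loss(f(x_i), y_i) over the sample."""
    return np.mean([loss(f(x), y) for x, y in zip(xs, ys)])

# Toy sample: the constant predictor f(x) = 0 against targets 1, 2, 3.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 2.0, 3.0])
risk = empirical_risk(lambda x: 0.0, xs, ys, squared_loss)
print(risk)  # (1 + 4 + 9) / 3, i.e. about 4.667
```

The training problem is then a search over predictors `f` to make this quantity small, by whatever optimization method fits the model class.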
3 Why This Math Appears
This language reuses several math layers already on the site:
- Probability: data are modeled as random draws from some distribution
- Statistics: training loss is a sample-based quantity, so estimation and validation matter
- Linear Algebra: many predictors are linear maps, projections, or low-rank approximations
- Proofs + Logic: theory papers formalize guarantees using quantified assumptions and implications
So empirical risk is not a separate “ML trick.” It is the meeting point where modeling, optimization, probability, and statistics all touch the same object.
4 Math Objects In Use
- a sample of input-target pairs \((x_i,y_i)\)
- a predictor \(f\)
- a loss function \(\ell\)
- the empirical risk \(\hat{R}_n(f)\)
- sometimes a regularized objective such as
\[ \hat{R}_n(f) + \lambda \Omega(f) \]
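For the common ridge choice \(\Omega(\beta) = \lVert\beta\rVert^2\), the regularized objective is a one-line change on top of the empirical risk. A hedged sketch (the toy data and \(\lambda\) values here are placeholders, not a recommendation):

```python
import numpy as np

def regularized_risk(beta, X, y, lam):
    """Squared-error empirical risk plus an L2 penalty: R_hat(beta) + lam * ||beta||^2."""
    residuals = X @ beta - y
    return np.mean(residuals ** 2) + lam * np.sum(beta ** 2)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, -1.0])
beta = np.array([2.0, -1.0])               # interpolates this toy data exactly
print(regularized_risk(beta, X, y, 0.0))   # 0.0: zero residuals, no penalty
print(regularized_risk(beta, X, y, 0.1))   # 0.5: the penalty 0.1 * (4 + 1) now shows up
```

The point of the penalty is exactly this trade-off: a predictor that fits the sample perfectly can still pay a price for its size.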
5 A Small Worked Walkthrough
Suppose we fit a linear predictor
\[ f_\beta(x) = x^\top \beta \]
for a regression task and use squared loss:
\[ \ell(f_\beta(x_i), y_i) = (x_i^\top \beta - y_i)^2. \]
Then the empirical risk becomes
\[ \hat{R}_n(\beta) = \frac{1}{n}\sum_{i=1}^n (x_i^\top \beta - y_i)^2. \]
Up to the scaling factor \(1/n\), this is the same least-squares objective you already saw in Linear Regression Through Projection.
That is why least squares is such a good first ML bridge:
- it is already empirical risk minimization
- the loss is explicit
- the math object is familiar
- later models mostly change the predictor class, the loss, or the regularizer
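The walkthrough above can be checked numerically: for squared loss and linear predictors, minimizing \(\hat{R}_n(\beta)\) is exactly least squares, so the \(\beta\) returned by a least-squares solver should (up to numerical error) have the lowest empirical risk of any candidate. A sketch on synthetic data (the true coefficients and noise level are invented for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = x^T beta_true + noise.
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def risk(beta):
    """Empirical risk with squared loss: (1/n) * sum (x_i^T beta - y_i)^2."""
    return np.mean((X @ beta - y) ** 2)

# Least squares = empirical risk minimization for this loss and model class.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The objective is convex, so no perturbation of beta_hat can lower the risk.
perturbed = beta_hat + 0.05 * rng.normal(size=d)
assert risk(beta_hat) <= risk(perturbed)
print(risk(beta_hat), risk(perturbed))
```

Because the objective is convex in \(\beta\), the solver's output is a global minimizer, which is why the assertion holds for any perturbation, not just this one.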
6 Implementation or Computation Note
Empirical risk is the starting object, not the end of the story.
In practice, once the objective is written down, the workflow immediately branches into three operational questions:
- Optimization: How do we actually minimize the objective?
- Validation: How do we choose models and hyperparameters without fooling ourselves?
- Generalization: Why should low sample loss say anything about future data?
That is why this page is the first bridge into the rest of the ML section rather than a self-contained stopping point.
7 Failure Modes
- treating empirical risk as if it were the true population objective
- forgetting that the loss function is a modeling choice, not a law of nature
- thinking lower training loss automatically means better generalization
- confusing the model class with the optimization method used to fit it
- talking about accuracy when the task is actually regression with a different loss
8 Paper Bridge
- CS229: Machine Learning - First pass - official Stanford entry point where supervised learning and loss-based objectives are the default language from the beginning. Checked 2026-04-24.
- CS229T / Statistical Learning Theory - Paper bridge - a direct continuation once you want to see empirical risk and ERM turned into formal generalization questions. Checked 2026-04-24.
9 Sources and Further Reading
- CS229: Machine Learning - First pass - official current ML course hub where supervised learning and loss-based training are central. Checked 2026-04-24.
- CS 189 Syllabus - First pass - official Berkeley course framing the full ML pipeline from problem setup to model design and optimization. Checked 2026-04-24.
- Mathematics for Machine Learning - Second pass - useful bridge for readers translating familiar math objects into ML notation. Checked 2026-04-24.
- CS229T / Statistical Learning Theory - Paper bridge - points from empirical risk language toward formal learning-theory analysis. Checked 2026-04-24.