Regression and Classification Basics

How supervised statistical modeling splits into regression for quantitative responses and classification for categorical responses, and how both turn predictors into predictions using different loss functions and evaluation ideas.
Modified: April 26, 2026

Keywords

regression, classification, supervised learning, least squares, logistic regression

1 Role

This page is the first modeling page in the statistics module.

Its job is to show how statistical inference grows into supervised prediction: when the response is quantitative, we do regression; when the response is categorical, we do classification.

2 First-Pass Promise

Read this page after Confidence Intervals and Hypothesis Testing.

If you stop here, you should still understand:

  • the structural difference between regression and classification
  • what predictors, responses, and fitted functions are doing
  • why least squares is natural for regression
  • why classification is about estimated class probabilities and decision rules, not just labels

3 Why It Matters

A huge amount of modern applied statistics and ML is just this distinction repeated in richer settings.

Typical questions look like:

  • Regression: predict a real-valued quantity like latency, energy use, temperature, score, or cost
  • Classification: predict a category like spam/not spam, disease/no disease, fail/pass, or class label

Once you see that response-type split clearly, many methods become easier to place:

  • linear regression, ridge, and least squares live on the regression side
  • logistic regression, discriminant analysis, and nearest-neighbor classification live on the classification side

This page is meant to make that landscape legible before the methods get more elaborate.

4 Prerequisite Recall

  • a dataset has observational units, predictors, and sometimes a response variable
  • confidence intervals and tests quantify uncertainty about parameters or effects
  • likelihood and Bayesian methods are two ways to fit or reason about statistical models

5 Intuition

Supervised modeling starts with paired information:

\[ (x_i, y_i) \quad \text{or} \quad (x_i, g_i). \]

The predictor vector \(x_i\) contains the observed features or covariates.

The response can take two different forms:

  • quantitative: a real-valued outcome such as time, price, or score
  • categorical: a label such as yes/no or one of several classes

That one distinction drives the rest:

  • in regression, we care about how close the numerical predictions are
  • in classification, we care about assigning the right label or at least high probability to the right class

So the modeling question is not “which algorithm do I like?” It is first:

what type of response am I trying to predict?

6 Formal Core

Definition 1 (Supervised Learning Setup) In supervised learning, we observe training data with predictors and responses: \[ \{(x_i, y_i)\}_{i=1}^n \qquad \text{or} \qquad \{(x_i, g_i)\}_{i=1}^n. \]

The goal is to build a fitted rule that predicts the response from the predictors.

Definition 2 (Regression Versus Classification)  

  • Regression: the response \(Y\) is quantitative, and we model or predict a numerical value
  • Classification: the response \(G\) is categorical, and we predict a class label or class probabilities

This is the main first-pass distinction in supervised statistical modeling.

Proposition 1 (Basic Modeling Rules) Two first-pass models to remember are:

  • linear regression for a quantitative response: \[ Y \approx \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon \]
  • logistic-style classification for a binary response: \[ \Pr(G=1 \mid X=x) = \frac{1}{1+\exp\big(-(\beta_0+\beta_1x_1+\cdots+\beta_p x_p)\big)} \]

In regression, least squares is a natural fitting criterion.

In classification, estimated class probabilities are turned into labels by a decision rule such as choosing the larger posterior probability.
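The decision rule in the last sentence is simple enough to sketch directly. A minimal version in Python, using made-up posterior probabilities for one observation (the class names and numbers are purely illustrative):

```python
# Turn estimated class probabilities into a hard label by choosing
# the class with the largest posterior probability (the argmax rule).
def classify(posteriors):
    """posteriors: dict mapping class label -> estimated probability."""
    return max(posteriors, key=posteriors.get)

# Hypothetical posterior estimates for a single observation.
probs = {"spam": 0.15, "ham": 0.80, "unsure": 0.05}
print(classify(probs))  # ham
```

The same rule works for any number of classes; in the binary case with a 0.5 cutoff it reduces to "predict class 1 when its estimated probability exceeds one half."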

7 Worked Example

Suppose a team collects data on app sessions.

For each session they record:

  • x: session length in minutes
  • y: number of dollars spent in the session
  • g: whether the user made a purchase (1) or not (0)

The same predictors can support two different tasks.

7.1 Regression View

If the goal is to predict dollars spent, the response is quantitative, so this is a regression problem.

A simple linear model might be \[ \widehat{y} = \beta_0 + \beta_1 x. \]

If the fitted line is \[ \widehat{y} = -1 + 0.8x, \] then for a 10-minute session the predicted spend is \[ -1 + 0.8(10) = 7. \]

The residual for a session is \[ e_i = y_i - \widehat{y}_i, \] which tells us how far the observed value is from the prediction.

Least squares chooses the coefficients to make the total squared residual error \[ \sum_{i=1}^n (y_i - \widehat{y}_i)^2 \] as small as possible.
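For one predictor, the least-squares coefficients have a closed form, so the fit above can be checked in a few lines. The session data here is synthetic, chosen to lie exactly on the example's line \(\widehat{y} = -1 + 0.8x\), so the residuals are all zero:

```python
# Simple linear regression by least squares: closed-form slope and intercept.
def least_squares(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    return b0, b1

# Synthetic sessions lying exactly on y = -1 + 0.8 x.
xs = [5, 10, 15, 20]                  # session length in minutes
ys = [-1 + 0.8 * x for x in xs]       # dollars spent
b0, b1 = least_squares(xs, ys)
print(b0, b1)         # -1.0 0.8
print(b0 + b1 * 10)   # predicted spend for a 10-minute session: 7.0
```

With noisy real data the recovered coefficients would not match the generating line exactly; least squares would still return the slope and intercept that minimize the total squared residual error.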

7.2 Classification View

If the goal is instead to predict whether a purchase happens, the response is categorical, so this is a classification problem.

A logistic-style model might estimate \[ \Pr(G=1 \mid x) = \frac{1}{1+\exp(-(-2 + 0.3x))}. \]

At \(x=10\) minutes, this gives \[ \Pr(G=1 \mid x=10) = \frac{1}{1+\exp(-1)} \approx 0.731. \]

If we use the usual 0.5 cutoff, the prediction is class 1: purchase.
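The probability calculation above is short enough to verify directly, using the example's made-up coefficients \(-2\) and \(0.3\):

```python
import math

def logistic_prob(x, b0=-2.0, b1=0.3):
    """Estimated P(G=1 | x) under the example's logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

p = logistic_prob(10)   # linear predictor is -2 + 0.3*10 = 1
print(round(p, 3))      # 0.731
print("purchase" if p >= 0.5 else "no purchase")  # 0.5 cutoff -> class 1
```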

So the same predictor variable supports two different statistical tasks:

  • predict how much -> regression
  • predict which class -> classification

That is the core structural lesson.

8 Computation Lens

A practical first-pass workflow is:

  1. identify whether the response is quantitative or categorical
  2. separate predictors from the response
  3. choose a model family matched to the response type
  4. fit the model
  5. examine an appropriate error notion

Common first-pass error summaries are:

  • regression: residuals, squared error, mean squared error
  • classification: misclassification rate, confusion matrix, estimated class probabilities
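The first two classification summaries in the list can be computed from labels alone. A minimal sketch with hypothetical true labels and predictions:

```python
# Misclassification rate and a 2x2 confusion matrix for binary 0/1 labels.
def confusion_matrix(actual, predicted):
    """Return counts as a dict mapping (actual, predicted) -> count."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def misclassification_rate(actual, predicted):
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))
print(misclassification_rate(y_true, y_pred))  # 2 errors out of 8 -> 0.25
```

The confusion matrix keeps more information than the misclassification rate: it distinguishes the two kinds of error (predicting 1 when the truth is 0, and vice versa).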

This is also where train/test splitting or validation starts to matter, because a good fit on training data does not automatically mean good predictive performance.
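The train/test point can be illustrated with a minimal split on synthetic regression data (a fixed seed keeps the sketch reproducible; the fitting and error functions repeat the simple one-predictor least-squares formulas from the worked example). Training error and test error use the same mean-squared-error formula, just on different rows:

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

def mse(xs, ys, b0, b1):
    """Mean squared error of the line (b0, b1) on the given rows."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
# Synthetic data: the true line y = -1 + 0.8x plus Gaussian noise.
data = [(x, -1 + 0.8 * x + random.gauss(0, 1)) for x in range(1, 41)]
random.shuffle(data)
train, test = data[:30], data[30:]   # fit on 30 rows, evaluate on 10 held out

b0, b1 = fit_line(*zip(*train))
print("fitted line:", round(b0, 2), round(b1, 2))
print("train MSE:", round(mse(*zip(*train), b0, b1), 3))
print("test MSE: ", round(mse(*zip(*test), b0, b1), 3))
```

The model never sees the held-out rows during fitting, so the test MSE is a more honest estimate of predictive performance than the training MSE.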

9 Application Lens

This page is a direct bridge into ML:

  • regression appears in forecasting, score prediction, calibration, and control-oriented modeling
  • classification appears in detection, triage, recommendation, diagnosis, moderation, and document labeling
  • logistic regression is both a classical statistics tool and a very standard ML baseline
  • least-squares regression is both a statistics method and a building block for optimization and linear modeling

So this is one of the places where the line between “statistics” and “machine learning” becomes very thin.

10 Stop Here For First Pass

If you can now explain:

  • how regression differs from classification
  • why the type of response variable determines the task
  • why least squares is natural in regression
  • why classification should often be thought of through probabilities and decision rules

then this page has done its main job.

11 Go Deeper

The most useful next steps after this page are:

  1. Experimental Design and Model Evaluation, to learn how to assess predictive performance responsibly
  2. Confidence Intervals and Hypothesis Testing if you want to connect fitted model parameters to interval/test reasoning
  3. Maximum Likelihood and Bayesian Basics if you want to see logistic regression and probabilistic modeling through a likelihood lens

13 Optional After First Pass

If you want more practice before moving on:

  • take one dataset idea and ask whether the response makes it a regression or classification task
  • write down one possible loss or error notion for each task
  • compare predicting a probability with predicting a final hard class label

14 Common Mistakes

  • choosing the method before identifying the response type
  • treating classification labels as if they were ordinary numeric outcomes
  • using only training error to judge model quality
  • ignoring predicted probabilities in classification and looking only at hard labels
  • forgetting that regression assumptions and classification decision rules are different objects

15 Exercises

  1. A model predicts house price from square footage and neighborhood features. Is this regression or classification? Why?
  2. A model predicts whether an email is spam or not. Is this regression or classification? What is the natural response type?
  3. In one sentence, explain why least squares makes sense for regression but not for a yes/no class label as the final task.

16 Sources and Further Reading

Sources checked online on 2026-04-24:

  • Penn State STAT 508 Lesson 1
  • MIT 18.05 Linear Regression
  • Penn State STAT 508 Lesson 10
  • Penn State STAT 501 course notes