Matrices and Linear Maps

Why a matrix is best understood as the coordinate form of a linear map, and why matrix multiplication means composition of transformations.
Modified: April 26, 2026

Keywords

matrices, linear maps, linear transformations, matrix multiplication, composition, standard basis

1 Role

This page is where linear algebra shifts from asking what can be built out of vectors to asking what rules transform vectors.

It explains why matrices are not just rectangular arrays of numbers. A matrix is the coordinate description of a linear map, and matrix multiplication is the algebra of composing those maps.

2 First-Pass Promise

Read this page after Vectors and Linear Combinations.

If you stop here, you should still understand:

  • what a linear map is
  • why every matrix gives a linear map
  • how the columns of a matrix record what happens to basis vectors
  • why multiplying matrices means composing linear transformations

3 Why It Matters

This topic matters because many later ideas are really statements about operators, not just arrays:

  • least squares studies the map \(x \mapsto Ax\) and asks which outputs are reachable or closest
  • eigenvalues describe directions that a linear map stretches without turning
  • SVD analyzes a linear map into orthogonal input and output directions
  • numerical methods care about how a map amplifies error
  • machine learning layers repeatedly apply learned matrix maps to features and embeddings

So this page gives the first real operator viewpoint of the module.

4 Prerequisite Recall

  • a vector can be written in coordinates relative to a basis
  • if \(A\) has columns \(a_1,\dots,a_n\), then \(Ax = x_1 a_1 + \cdots + x_n a_n\) (see the sketch after this list)
  • linear combinations preserve scaling and addition
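As a quick refresher on the second bullet, here is a minimal NumPy sketch computing \(Ax\) two ways: as a matrix-vector product and as a weighted combination of the columns of \(A\). The matrix and vector are arbitrary illustrative choices.

```python
import numpy as np

# The product Ax, computed two ways: as a matrix-vector product and as
# the combination of A's columns weighted by the entries of x.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])   # a 3x2 matrix with columns a1 and a2
x = np.array([2.0, -1.0])

via_product = A @ x
via_columns = x[0] * A[:, 0] + x[1] * A[:, 1]   # x1*a1 + x2*a2

assert np.allclose(via_product, via_columns)
print(via_product)            # [ 0. -1.  7.]
```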

5 Intuition

The easiest way to misuse matrices is to treat them as static data tables.

The better viewpoint is dynamic: a matrix tells you how an input vector is transformed into an output vector.

Some maps stretch space. Some shear it. Some rotate it. Some project onto a line or plane. Some send features into a new representation space. The matrix is how we compute that action once coordinates have been chosen.

This is why the columns matter so much. If you know what a linear map does to the basis vectors, then linearity tells you what it does to every vector built from them.

That is the main idea of the page:

a matrix is determined by where it sends the basis vectors

and

matrix multiplication is what happens when you do one linear map after another.

6 Formal Core

Definition 1 (Linear Map) Let \(T : \mathbb{R}^n \to \mathbb{R}^m\).

We say \(T\) is a linear map if for all vectors \(u,v \in \mathbb{R}^n\) and all scalars \(c \in \mathbb{R}\),

\[ T(u+v) = T(u) + T(v) \qquad \text{and} \qquad T(cu) = cT(u). \]

So a linear map preserves vector addition and scalar multiplication.
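The two properties are easy to spot-check numerically. Here is a minimal sketch in NumPy; the map below is an illustrative choice, and a few random checks are evidence of linearity, not a proof.

```python
import numpy as np

# A numerical spot-check of the two defining properties. The map below
# is an illustrative choice; random checks are evidence, not a proof.
rng = np.random.default_rng(0)

def T(v):
    x, y = v
    return np.array([3 * x + y, x - 2 * y])

u = rng.normal(size=2)
v = rng.normal(size=2)
c = rng.normal()

assert np.allclose(T(u + v), T(u) + T(v))   # preserves addition
assert np.allclose(T(c * u), c * T(u))      # preserves scaling
print("both properties hold on this sample")
```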

Proposition 1 (Key Statement) If \(A\) is an \(m \times n\) matrix, then the rule

\[ T(x) = Ax \]

defines a linear map from \(\mathbb{R}^n\) to \(\mathbb{R}^m\).

Conversely, once a basis \(B\) is chosen for the domain and a basis \(C\) is chosen for the codomain, every linear map \(T : \mathbb{R}^n \to \mathbb{R}^m\) can be represented by a matrix \(A\) such that

\[ [T(x)]_C = A[x]_B. \]

In the standard bases, this becomes the familiar formula \(T(x)=Ax\).

Proposition 2 (Columns Record Basis Images) Let \(e_1,\dots,e_n\) be the standard basis vectors in \(\mathbb{R}^n\).

If \(T(x) = Ax\), then the \(j\)th column of \(A\) is exactly \(T(e_j)\).

So

\[ A = \begin{bmatrix} \vert & & \vert \\ T(e_1) & \cdots & T(e_n) \\ \vert & & \vert \end{bmatrix}. \]

This is why knowing the image of the basis vectors determines the whole map.
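To see Proposition 2 in action, here is a short NumPy sketch that recovers a map's matrix by sending each standard basis vector through the map and stacking the images as columns. The 90-degree rotation used here is an illustrative choice, not from the page.

```python
import numpy as np

# Recover the matrix of a linear map from its basis images, as in
# Proposition 2. The 90-degree rotation is an illustrative choice.
def rotate90(v):
    x, y = v
    return np.array([-y, x])   # counterclockwise quarter turn

E = np.eye(2)                  # columns are e1 and e2
A = np.column_stack([rotate90(E[:, j]) for j in range(2)])
print(A)
# [[ 0. -1.]
#  [ 1.  0.]]
```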

Proposition 3 (Composition Becomes Matrix Multiplication) If \(S(x) = Bx\) and \(T(y) = Ay\), then applying \(S\) first and then \(T\) gives

\[ (T \circ S)(x) = A(Bx) = (AB)x. \]

So matrix multiplication is the coordinate form of composition of linear maps.
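A minimal NumPy check of this, with illustrative choices of \(A\), \(B\), and \(x\): applying the two maps one after the other matches a single multiplication by \(AB\).

```python
import numpy as np

# A sketch of Proposition 3: doing S (matrix B) and then T (matrix A)
# step by step agrees with one multiplication by AB. The matrices are
# illustrative choices.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # a shear, playing the role of T
B = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # a 90-degree rotation, playing the role of S
x = np.array([3.0, -1.0])

two_steps = A @ (B @ x)      # apply S first, then T
one_step = (A @ B) @ x       # one combined map

assert np.allclose(two_steps, one_step)
print(two_steps)             # [7. 3.]
```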

7 Worked Example

Define a map \(T : \mathbb{R}^2 \to \mathbb{R}^2\) by

\[ T(x,y) = (x + 2y,\; y). \]

This is a shear: it keeps the second coordinate fixed and tilts the first coordinate by adding twice the second.

First check linearity:

\[ T\big((x_1,y_1) + (x_2,y_2)\big) = T(x_1+x_2,\; y_1+y_2) = (x_1+x_2 + 2(y_1+y_2),\; y_1+y_2), \]

which is the same as

\[ T(x_1,y_1) + T(x_2,y_2). \]

Scaling works the same way: \(T\big(c(x,y)\big) = (cx + 2cy,\; cy) = c\,(x + 2y,\; y) = c\,T(x,y)\). So \(T\) is linear.

Now look at the standard basis vectors:

\[ e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]

Then

\[ T(e_1) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad T(e_2) = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \]

So the matrix of \(T\) in the standard basis is

\[ A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}. \]

Now take the vector

\[ x = \begin{bmatrix} 3 \\ -1 \end{bmatrix}. \]

Because \(x = 3e_1 - e_2\), linearity says

\[ T(x) = 3T(e_1) - T(e_2) = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]

And matrix multiplication gives the same answer:

\[ Ax = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]
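The whole example is easy to replay in NumPy, assuming the shear defined above: build \(A\) from the basis images, apply it to \(x\), and the hand calculation is reproduced.

```python
import numpy as np

# Replaying the worked example: build A from the basis images of the
# shear, then apply it to x = (3, -1).
def T(v):
    x, y = v
    return np.array([x + 2 * y, y])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
A = np.column_stack([T(e1), T(e2)])   # [[1, 2], [0, 1]]
x = np.array([3.0, -1.0])

print(A @ x)                          # [ 1. -1.], matching T(x) above
```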

This example teaches the main story in one place:

  1. the map is defined geometrically
  2. the columns come from what happens to basis vectors
  3. the matrix computes the same transformation in coordinates

8 Computation Lens

Once a basis is fixed, computation with linear maps becomes matrix computation.

That is why so many later tasks reduce to matrix algebra:

  • applying a linear operator means multiplying by a matrix
  • composing operators means multiplying matrices
  • solving for an input means solving a system involving the matrix
  • asking whether a map is invertible becomes a question about the matrix (see the sketch after this list)
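As a small illustration of the last two bullets, here is a sketch using NumPy's standard linear-algebra routines; the matrix is the shear from the worked example.

```python
import numpy as np

# A sketch of the last two bullets, using NumPy's standard routines.
# The matrix is the shear from the worked example.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
y = np.array([1.0, -1.0])             # a known output

x = np.linalg.solve(A, y)             # recover the input: solve Ax = y
print(x)                              # [ 3. -1.]
print(np.linalg.det(A) != 0.0)        # True: this map is invertible
```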

This is also why the order in matrix multiplication matters.

If you apply \(S\) and then \(T\), the combined map is \(T \circ S\), which becomes \(AB\), not \(BA\). The order of matrices follows the order of function composition from right to left.
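One way to make the order concrete, with illustrative matrices: a rotation followed by a stretch is not the same map as the stretch followed by the rotation.

```python
import numpy as np

# Order matters: rotating then stretching is a different map from
# stretching then rotating. Both matrices are illustrative choices.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotate 90 degrees counterclockwise
D = np.array([[2.0, 0.0],
              [0.0, 1.0]])    # stretch the first coordinate by 2

print(D @ R)                  # rotate first, then stretch
print(R @ D)                  # stretch first, then rotate
print(np.allclose(D @ R, R @ D))   # False
```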

9 Application Lens

A learned dense layer in machine learning applies a matrix map to a feature vector.

If the layer uses weights \(W\), then the linear core is

\[ x \mapsto Wx. \]

In practice many models add a bias term and then a nonlinearity, but the matrix part is still the operator that mixes coordinates, projects to a new feature space, or changes representation dimension.
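As a minimal sketch of that structure, assuming NumPy, a bias term, and a ReLU nonlinearity (the shapes and names here are illustrative, not tied to any particular framework):

```python
import numpy as np

# A minimal sketch of a dense layer: a matrix map, then a bias and a
# ReLU. The weights, bias, and input here are random placeholders.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # maps 3-dim features to 4-dim features
b = rng.normal(size=4)
x = rng.normal(size=3)        # one input feature vector

linear_part = W @ x           # the matrix map studied on this page
layer_output = np.maximum(0.0, linear_part + b)   # affine map + ReLU
print(layer_output.shape)     # (4,)
```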

So when you study matrices as linear maps, you are learning the mathematical core behind feature mixing, embeddings, and many layers used throughout ML systems.

10 Stop Here For First Pass

If you can now explain:

  • what makes a map linear
  • why the columns of a matrix are the images of basis vectors
  • why \(T(x)=Ax\) is more than notation
  • why matrix multiplication means composition

then this page has done its main job.

11 Optional After First Pass

If you want more practice before moving on:

  • compute the matrix of a map by sending each basis vector through it
  • check whether a given rule really preserves addition and scaling
  • compose two simple transformations and compare the result with matrix multiplication
  • continue to Subspaces, Basis, and Dimension once the operator viewpoint feels stable

12 Common Mistakes

  • treating a matrix as only a table of numbers rather than a rule acting on vectors
  • forgetting that linear maps must preserve both addition and scalar multiplication
  • thinking the matrix columns are arbitrary instead of images of basis vectors
  • reading \(AB\) left to right as “do \(A\) first, then \(B\)”
  • confusing linear maps with affine maps that include translations or bias terms

13 Exercises

  1. Decide whether the map \(T(x,y) = (x+y,\; 2y)\) is linear. Check both defining properties.

  2. Find the matrix of the map \(T(x,y) = (2x-y,\; x+y)\) by computing \(T(e_1)\) and \(T(e_2)\).

  3. Let

    \[ A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]

    Compute \(AB\) and \(BA\), and explain in words why they represent different compositions.

14 Sources and Further Reading

Sources checked online on 2026-04-24:

  • MIT 18.06SC Linear Transformations and their Matrices
  • Stanford Math 51 schedule
  • Hefferon, Linear Algebra
  • Springer Numerical Algorithms survey on linear algebra perspectives in deep learning
  • NeurIPS proceedings page for Attention is All You Need