Matrices and Linear Maps
matrices, linear maps, linear transformations, matrix multiplication, composition, standard basis
1 Role
This page is where linear algebra shifts from asking which vectors can be built to asking which rules transform vectors.
It explains why matrices are not just rectangular arrays of numbers. A matrix is the coordinate description of a linear map, and matrix multiplication is the algebra of composing those maps.
2 First-Pass Promise
Read this page after Vectors and Linear Combinations.
If you stop here, you should still understand:
- what a linear map is
- why every matrix gives a linear map
- how the columns of a matrix record what happens to basis vectors
- why multiplying matrices means composing linear transformations
3 Why It Matters
This topic matters because many later ideas are really statements about operators, not just arrays:
- least squares studies the map \(x \mapsto Ax\) and asks which outputs are reachable or closest
- eigenvalues describe directions that a linear map stretches without turning
- SVD analyzes a linear map into orthogonal input and output directions
- numerical methods care about how a map amplifies error
- machine learning layers repeatedly apply learned matrix maps to features and embeddings
So this page gives the first real operator viewpoint of the module.
4 Prerequisite Recall
- a vector can be written in coordinates relative to a basis
- if \(A\) has columns \(a_1,\dots,a_n\), then \(Ax = x_1 a_1 + \cdots + x_n a_n\)
- linear combinations preserve scaling and addition
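The column-combination identity in the recall list can be checked directly. This is a minimal NumPy sketch with hypothetical matrix and vector values, comparing `A @ x` against the explicit combination of columns:

```python
import numpy as np

# A small matrix whose columns play the role of a1, a2 (hypothetical values).
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
x = np.array([2.0, 5.0])

# Matrix-vector product computed directly.
direct = A @ x

# The same product written as the linear combination x1*a1 + x2*a2 of the columns.
as_combination = x[0] * A[:, 0] + x[1] * A[:, 1]

print(np.allclose(direct, as_combination))  # True
```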
5 Intuition
The easiest way to misuse matrices is to treat them as static data tables.
The better viewpoint is dynamic: a matrix tells you how an input vector is transformed into an output vector.
Some maps stretch space. Some shear it. Some rotate it. Some project onto a line or plane. Some send features into a new representation space. The matrix is how we compute that action once coordinates have been chosen.
This is why the columns matter so much. If you know what a linear map does to the basis vectors, then linearity tells you what it does to every vector built from them.
That is the main idea of the page:
a matrix is determined by where it sends the basis vectors
and
matrix multiplication is what happens when you do one linear map after another.
6 Formal Core
Definition 1 (Linear Map) Let \(T : \mathbb{R}^n \to \mathbb{R}^m\).
We say \(T\) is a linear map if for all vectors \(u,v \in \mathbb{R}^n\) and all scalars \(c \in \mathbb{R}\),
\[ T(u+v) = T(u) + T(v) \qquad \text{and} \qquad T(cu) = cT(u). \]
So a linear map preserves vector addition and scalar multiplication.
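The two defining properties can be probed numerically. This sketch, assuming an arbitrary random matrix as the map, checks additivity and homogeneity for \(T(x)=Ax\); a numerical check is evidence rather than a proof, but it is a good habit when testing whether a rule is linear:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))  # an arbitrary 3x2 matrix (hypothetical values)
T = lambda v: A @ v              # the map T(x) = Ax

u = rng.standard_normal(2)
v = rng.standard_normal(2)
c = 2.5

print(np.allclose(T(u + v), T(u) + T(v)))  # additivity: True
print(np.allclose(T(c * u), c * T(u)))     # homogeneity: True
```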
Proposition 1 (Key Statement) If \(A\) is an \(m \times n\) matrix, then the rule
\[ T(x) = Ax \]
defines a linear map from \(\mathbb{R}^n\) to \(\mathbb{R}^m\).
Conversely, once a basis \(B\) is chosen for the domain and a basis \(C\) is chosen for the codomain, every linear map \(T : \mathbb{R}^n \to \mathbb{R}^m\) can be represented by a matrix \(A\) such that
\[ [T(x)]_C = A[x]_B. \]
In the standard bases, this becomes the familiar formula \(T(x)=Ax\).
Proposition 2 (Columns Record Basis Images) Let \(e_1,\dots,e_n\) be the standard basis vectors in \(\mathbb{R}^n\).
If \(T(x) = Ax\), then the \(j\)th column of \(A\) is exactly \(T(e_j)\).
So
\[ A = \begin{bmatrix} \vert & & \vert \\ T(e_1) & \cdots & T(e_n) \\ \vert & & \vert \end{bmatrix}. \]
This is why knowing the image of the basis vectors determines the whole map.
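Proposition 2 doubles as a recipe: given a linear map as a rule, send each standard basis vector through it and stack the results as columns. A minimal sketch, using a hypothetical map \(T(x,y) = (x - y,\; 3y)\) chosen only for illustration:

```python
import numpy as np

# A hypothetical linear map given as a Python function.
def T(v):
    x, y = v
    return np.array([x - y, 3.0 * y])

# Column j of the matrix is T(e_j).
n = 2
E = np.eye(n)
A = np.column_stack([T(E[:, j]) for j in range(n)])
print(A)  # columns are T(e1) = (1, 0) and T(e2) = (-1, 3)

# Sanity check: A @ x reproduces T(x) on another vector.
x = np.array([2.0, 1.0])
print(np.allclose(A @ x, T(x)))  # True
```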
Proposition 3 (Composition Becomes Matrix Multiplication) If \(S(x) = Bx\) and \(T(y) = Ay\), then applying \(S\) first and then \(T\) gives
\[ (T \circ S)(x) = A(Bx) = (AB)x. \]
So matrix multiplication is the coordinate form of composition of linear maps.
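Proposition 3 can be verified numerically: applying the maps one after another and applying the single product matrix give the same output. A sketch with random matrices of compatible (hypothetical) sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))  # T : R^3 -> R^2
B = rng.standard_normal((3, 4))  # S : R^4 -> R^3
x = rng.standard_normal(4)

# Apply S first, then T ...
step_by_step = A @ (B @ x)
# ... versus forming the product matrix once and applying it.
composed = (A @ B) @ x

print(np.allclose(step_by_step, composed))  # True
```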
7 Worked Example
Define a map \(T : \mathbb{R}^2 \to \mathbb{R}^2\) by
\[ T(x,y) = (x + 2y,\; y). \]
This is a shear: it keeps the second coordinate fixed and tilts the first coordinate by adding twice the second.
First check linearity:
\[ T\big((x_1,y_1) + (x_2,y_2)\big) = T(x_1+x_2,\; y_1+y_2) = (x_1+x_2 + 2(y_1+y_2),\; y_1+y_2), \]
which is the same as
\[ T(x_1,y_1) + T(x_2,y_2). \]
Scaling works the same way, so \(T\) is linear.
Now look at the standard basis vectors:
\[ e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
Then
\[ T(e_1) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad T(e_2) = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \]
So the matrix of \(T\) in the standard basis is
\[ A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}. \]
Now take the vector
\[ x = \begin{bmatrix} 3 \\ -1 \end{bmatrix}. \]
Because \(x = 3e_1 - e_2\), linearity says
\[ T(x) = 3T(e_1) - T(e_2) = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]
And matrix multiplication gives the same answer:
\[ Ax = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]
This example teaches the main story in one place:
- the map is defined geometrically
- the columns come from what happens to basis vectors
- the matrix computes the same transformation in coordinates
8 Computation Lens
Once a basis is fixed, computation with linear maps becomes matrix computation.
That is why so many later tasks reduce to matrix algebra:
- applying a linear operator means multiplying by a matrix
- composing operators means multiplying matrices
- solving for an input means solving a system involving the matrix
- asking whether a map is invertible becomes a question about the matrix
This is also why the order in matrix multiplication matters.
If you apply \(S\) and then \(T\), the combined map is \(T \circ S\), which becomes \(AB\), not \(BA\). The order of matrices follows the order of function composition from right to left.
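The order sensitivity is easy to see with two concrete maps. This sketch composes a shear and a 90-degree rotation (hypothetical example matrices) in both orders:

```python
import numpy as np

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])   # horizontal shear
rot = np.array([[0.0, -1.0],
                [1.0,  0.0]])    # rotation by 90 degrees counterclockwise

x = np.array([1.0, 0.0])

# "Shear first, then rotate" is rot @ shear: composition reads right to left.
print(rot @ shear @ x)    # rotate(shear(x))
print(shear @ rot @ x)    # shear(rotate(x)) -- a different point in general
print(np.allclose(rot @ shear, shear @ rot))  # False
```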
9 Application Lens
A learned dense layer in machine learning applies a matrix map to a feature vector.
If the layer uses weights \(W\), then the linear core is
\[ x \mapsto Wx. \]
In practice many models add a bias term and then a nonlinearity, but the matrix part is still the operator that mixes coordinates, projects to a new feature space, or changes representation dimension.
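The structure described above can be sketched as follows. All sizes and values are hypothetical stand-ins (real layers have trained weights), but the shape of the computation is the same: a matrix map, a bias that makes the full map affine, then a nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical layer: 4 input features mapped to 3 output features.
W = rng.standard_normal((3, 4))   # weight matrix (random stand-in for learned weights)
b = rng.standard_normal(3)        # bias term -- this makes the full map affine, not linear
x = rng.standard_normal(4)        # a feature vector

linear_core = W @ x                            # the matrix map x -> Wx
layer_out = np.maximum(0.0, linear_core + b)   # add bias, then apply a ReLU nonlinearity

print(linear_core.shape, layer_out.shape)  # (3,) (3,)
```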
So when you study matrices as linear maps, you are learning the mathematical core behind feature mixing, embeddings, and many layers used throughout ML systems.
10 Stop Here For First Pass
If you can now explain:
- what makes a map linear
- why the columns of a matrix are the images of basis vectors
- why \(T(x)=Ax\) is more than notation
- why matrix multiplication means composition
then this page has done its main job.
11 Go Deeper
If you want more after the main page:
- Proof: Basis Images Determine a Linear Map
- Application: Learned Linear Projections in Transformers
- Visual intuition: Computation Lab: Matrix Composition and Basis Action
- Practice: Exercises: Matrices and Linear Maps
12 Optional Paper Bridge
13 Optional After First Pass
If you want more practice before moving on:
- compute the matrix of a map by sending each basis vector through it
- check whether a given rule really preserves addition and scaling
- compose two simple transformations and compare the result with matrix multiplication
- continue to Subspaces, Basis, and Dimension once the operator viewpoint feels stable
14 Common Mistakes
- treating a matrix as only a table of numbers rather than a rule acting on vectors
- forgetting that linear maps must preserve both addition and scalar multiplication
- thinking the matrix columns are arbitrary instead of images of basis vectors
- reading \(AB\) left to right as “do \(A\) first, then \(B\)”
- confusing linear maps with affine maps that include translations or bias terms
15 Exercises
Decide whether the map \(T(x,y) = (x+y,\; 2y)\) is linear. Check both defining properties.
Find the matrix of the map \(T(x,y) = (2x-y,\; x+y)\) by computing \(T(e_1)\) and \(T(e_2)\).
Let
\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]
Compute \(AB\) and \(BA\), and explain in words why they represent different compositions.
16 Sources and Further Reading
- MIT 18.06SC: Linear Transformations and their Matrices (first pass): official explanation of the operator viewpoint that motivated this page. Checked 2026-04-24.
- Stanford Math 51 schedule (first pass): current course sequence showing linear transformations and matrix multiplication as part of the core linear algebra spine. Checked 2026-04-24.
- Hefferon, Linear Algebra (second pass): strong self-study text with explicit treatment of linear maps and matrices plus many exercises. Checked 2026-04-24.
- Deep learning, transformers and graph neural networks: a linear algebra perspective (second pass): modern bridge from basic operator language to current AI systems. Checked 2026-04-24.
- Attention is All You Need (paper bridge): later reference for spotting learned linear maps inside a major architecture. Checked 2026-04-24.
Sources checked online on 2026-04-24:
- MIT 18.06SC Linear Transformations and their Matrices
- Stanford Math 51 schedule
- Hefferon, Linear Algebra
- Springer Numerical Algorithms survey on linear algebra perspectives in deep learning
- NeurIPS proceedings page for Attention is All You Need