Paper Lab: Attention as Weighted Vector Mixture

A guided reading page for seeing the linear-combination heart of transformer attention inside the original paper.
Modified: April 26, 2026

Keywords: paper reading, attention, vectors, embeddings

1 Why This Paper

Use this paper lab when you want your first research-facing example of a famous ML paper whose core linear-algebra move is still simple:

build one vector by weighting and summing other vectors.

The anchor paper is "Attention Is All You Need" (Vaswani et al., 2017), the original transformer paper.

2 What To Know First

  • what a linear combination is
  • how a matrix can store a collection of vectors
  • why changing weights changes the output without changing the underlying span

3 First Pass

On a first pass, ignore most architecture details and track only one equation:

\[ \sum_i \alpha_i v_i. \]

That is the attention output for one query: a weighted sum of value vectors.

The main story, sketched in code right after this list, is:

  1. query-key similarity scores, pushed through a softmax, produce the weights \(\alpha_i\)
  2. the weights choose how much each value vector matters
  3. the output representation lies in the span of the value vectors
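A minimal numeric sketch of those three steps, assuming NumPy; the specific vectors and the three-dimensional size are made-up illustrative choices, not values from the paper:

```python
import numpy as np

# Made-up query, keys, and values in R^3; one key/value per row.
q = np.array([1.0, 0.0, 1.0])
K = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
V = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [1.0, 1.0, 1.0]])

# 1. similarity scores: scaled dot products of the query with each key
scores = K @ q / np.sqrt(q.size)

# 2. weights: a softmax turns the scores into nonnegative weights summing to 1
alpha = np.exp(scores) / np.exp(scores).sum()

# 3. output: the weighted sum sum_i alpha_i v_i, a point in span{v_1, v_2, v_3}
output = alpha @ V
print(alpha, output)
```

Changing the weights moves the output around inside the same span, which is exactly what the third step of the story claims.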

4 Second Pass

The real mathematical objects to track are:

  • query vector \(q\)
  • key vectors \(k_i\)
  • value vectors \(v_i\)
  • weights \(\alpha_i\)
  • output vector \(\sum_i \alpha_i v_i\)

At this pass, separate two roles:

  • linear map role: learned projections create queries, keys, and values
  • vector-mixture role: the final output is a weighted combination of value vectors

That separation helps you see which part belongs to the Vectors topic and which part belongs more naturally to Matrices and Linear Maps.
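A sketch that keeps the two roles visibly separate, assuming NumPy; the names `W_q`, `W_k`, `W_v` and the sizes are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_head = 5, 8, 4              # illustrative sizes
X = rng.normal(size=(n, d_model))         # n token embeddings

# Linear-map role: learned projections produce queries, keys, and values.
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Vector-mixture role: each output row is a weighted combination of the rows of V.
scores = Q @ K.T / np.sqrt(d_head)
alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
out = alpha @ V                            # out[i] = sum_j alpha[i, j] * V[j]
```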

5 Math Dependency Map

Read this page after the Vectors topic, in particular linear combinations, span, and how a matrix stores a collection of vectors; the projection step also draws on Matrices and Linear Maps.

6 Key Claims and Evidence

The paper’s main architectural claim is not itself a linear-algebra theorem.

But the linear-algebra object inside the architecture is very clean:

  • the output is a weighted sum of value vectors
  • the learned projections are matrix maps
  • multi-head attention repeats this pattern in parallel subspaces

The evidence in the paper is mainly experimental and architectural, not theorem-based.
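A sketch of the third bullet's parallel-subspace pattern, assuming NumPy; the head count, the dimensions, and the shortcut of skipping the per-head projections are illustrative assumptions, not the paper's exact layout:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_heads, d_head = 6, 2, 3              # illustrative sizes

# Pretend per-head projections have already been applied: (heads, tokens, d_head).
Q = rng.normal(size=(n_heads, n, d_head))
K = rng.normal(size=(n_heads, n, d_head))
V = rng.normal(size=(n_heads, n, d_head))

heads = []
for h in range(n_heads):
    # Each head repeats the same weighted-mixture pattern in its own subspace.
    scores = Q[h] @ K[h].T / np.sqrt(d_head)
    alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    heads.append(alpha @ V[h])

# Head outputs are concatenated along the feature axis (the paper then re-projects them).
out = np.concatenate(heads, axis=-1)       # shape (n, n_heads * d_head)
```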

7 What To Reproduce

A good reproduction target is tiny:

  1. choose three value vectors in \(\mathbb{R}^d\)
  2. choose one query and three keys
  3. compute attention weights
  4. form the weighted sum explicitly
  5. verify that the output lies in the span of the values

That reproduction target is small, but it teaches the main bridge idea.
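A minimal sketch of that reproduction, assuming NumPy; the specific vectors and \(d = 4\) are made up, and the span check in step 5 uses a least-squares fit, which is one convenient choice rather than anything prescribed by the paper:

```python
import numpy as np

d = 4
# Steps 1-2: three value vectors in R^4, one query, three keys (made-up numbers).
V = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 2.0]])
q = np.array([1.0, 1.0, 0.0, 0.0])
K = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

# Step 3: attention weights from scaled dot products and a softmax.
scores = K @ q / np.sqrt(d)
alpha = np.exp(scores) / np.exp(scores).sum()

# Step 4: form the weighted sum explicitly.
output = sum(a * v for a, v in zip(alpha, V))

# Step 5: the output lies in span{v_1, v_2, v_3}, so a least-squares fit
# of the output onto the value vectors reconstructs it exactly.
coeffs, *_ = np.linalg.lstsq(V.T, output, rcond=None)
assert np.allclose(V.T @ coeffs, output)
print(alpha, output, coeffs)
```

Because the three value vectors here are linearly independent, the recovered coefficients match the attention weights, which makes the bridge from weights to output concrete.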

8 What Has Changed Since Publication

Since the original paper, attention has branched into:

  • larger transformer architectures
  • efficient or sparse attention variants
  • graph and multimodal attention mechanisms

But the weighted-vector-mixture view still survives as a first reading lens.

9 Sources and Further Reading
