Representation Learning and Geometry of Embeddings

A bridge page showing how learned embeddings turn objects into vectors, why similarity geometry matters, and how representation quality is reflected in neighborhoods, directions, and linear probes.
Modified: April 26, 2026

Keywords

embeddings, representation learning, cosine similarity, geometry, vector spaces

1 Application Snapshot

Much of modern ML can be described as:

learn a vector representation in which useful similarities and directions become easy to use

That vector representation is an embedding.

Once an object has an embedding, many downstream operations reduce to geometry:

  • nearest neighbors
  • cosine similarity
  • clustering
  • linear classification
  • weighted mixtures and retrieval
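Each of these operations can be sketched directly as vector geometry. The following is a minimal numpy sketch, not a production retrieval system; the embedding matrix and object names are illustrative toy values.

```python
import numpy as np

# Toy embedding matrix: one row per object (names and values are illustrative).
names = ["cat", "dog", "car", "truck"]
E = np.array([
    [1.0, 0.9],
    [0.9, 1.0],
    [-1.0, 0.2],
    [-0.9, 0.3],
])

def cosine_matrix(E):
    """Pairwise cosine similarities between all rows of E."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    return U @ U.T                                    # cosine = dot of unit vectors

def nearest_neighbor(query_idx, E):
    """Index of the most cosine-similar row, excluding the query itself."""
    sims = cosine_matrix(E)[query_idx].copy()
    sims[query_idx] = -np.inf                         # exclude self-match
    return int(np.argmax(sims))

print(names[nearest_neighbor(0, E)])                  # nearest neighbor of "cat"
```

The same cosine matrix also drives retrieval and clustering: once embeddings are unit-normalized, all of these reduce to comparing dot products.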

2 Problem Setting

Suppose a model maps each object \(x\) to a vector

\[ e(x) \in \mathbb{R}^d. \]

The goal is not just to compress the object into \(d\) numbers. It is to place related objects in a geometry that helps downstream tasks.

For example, we might want:

  • similar words to have nearby vectors
  • related images to form local neighborhoods
  • classes to become easier to separate with a simple linear rule

So representation learning is often a search for a useful geometry, not only a search for a useful feature list.

3 Why This Math Appears

This page builds on several math pages already on the site, so the recurring math objects are not special to NLP or vision. They are:

  • vector spaces
  • angles and norms
  • neighborhoods
  • projections and subspaces

4 Math Objects In Use

  • embedding map \(e(x)\)

  • cosine similarity

    \[ \cos(e_i,e_j) = \frac{e_i^\top e_j}{\|e_i\|\,\|e_j\|} \]

  • nearest-neighbor structure

  • low-dimensional directions or subspaces

  • linear probes or simple downstream classifiers
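The "low-dimensional directions" item can be made concrete with a small numpy sketch: the top principal direction of a centered embedding matrix is one such direction, and projecting embeddings onto it gives each object a single coordinate. The embedding values below are illustrative.

```python
import numpy as np

# Toy embeddings (rows); values are illustrative.
E = np.array([
    [1.0, 0.9],
    [0.9, 1.0],
    [-1.0, 0.2],
])

# A low-dimensional direction: the top principal direction of the
# centered embeddings, obtained from the SVD.
centered = E - E.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
direction = Vt[0]                 # unit vector capturing the most variance

# Projection of each embedding onto that direction: one coordinate per object.
coords = centered @ direction
print(coords)
```

Here the first two rows land on the same side of the direction and the third lands on the opposite side, which is the one-dimensional shadow of the neighborhood structure.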

5 A Small Worked Walkthrough

Suppose three learned word embeddings are

\[ e(\text{cat}) = \begin{bmatrix} 1.0 \\ 0.9 \end{bmatrix}, \qquad e(\text{dog}) = \begin{bmatrix} 0.9 \\ 1.0 \end{bmatrix}, \qquad e(\text{car}) = \begin{bmatrix} -1.0 \\ 0.2 \end{bmatrix}. \]

The dot product between cat and dog is

\[ e(\text{cat})^\top e(\text{dog}) = 1.8, \]

while the dot product between cat and car is

\[ e(\text{cat})^\top e(\text{car}) = -0.82. \]

So even before any downstream classifier, the geometry already suggests:

  • cat and dog are near each other
  • car lies in a different direction
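The arithmetic above is small enough to check in a few lines of numpy, using the same three vectors:

```python
import numpy as np

e_cat = np.array([1.0, 0.9])
e_dog = np.array([0.9, 1.0])
e_car = np.array([-1.0, 0.2])

dot_cat_dog = float(e_cat @ e_dog)   # 1.0*0.9 + 0.9*1.0 = 1.8
dot_cat_car = float(e_cat @ e_car)   # 1.0*(-1.0) + 0.9*0.2 = -0.82

def cos(a, b):
    """Cosine similarity, matching the formula in the math-objects section."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(dot_cat_dog, dot_cat_car)
print(cos(e_cat, e_dog), cos(e_cat, e_car))
```

Normalizing turns the raw dot products into cosines: cat and dog end up with a cosine close to 1, while cat and car have a negative cosine, confirming they point in different directions.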

That is the practical meaning of embedding geometry:

  • related objects cluster
  • irrelevant objects separate
  • simple downstream rules become easier

In larger models, we rarely inspect vectors by hand. But the same geometric questions remain:

  • what forms a neighborhood?
  • which directions encode task-relevant variation?
  • are the learned representations easy to separate or retrieve from?

6 Implementation or Computation Note

In practice, embedding quality is often checked indirectly through:

  • nearest-neighbor queries
  • retrieval quality
  • clustering structure
  • performance of a simple linear probe on top of frozen embeddings
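The last check, a linear probe on frozen embeddings, can be sketched with a least-squares linear classifier. This is a minimal numpy sketch on synthetic data, assuming binary labels and a class signal planted along one embedding direction; it is not the full probing workflow of the follow-up page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "frozen" embeddings: two classes offset along one direction.
n, d = 200, 16
y = rng.integers(0, 2, size=n)         # binary labels
E = rng.normal(size=(n, d))
E[:, 0] += 3.0 * y                     # class signal lives in one direction

# Linear probe: fit a least-squares linear classifier on top of E,
# leaving the embeddings themselves untouched.
X = np.hstack([E, np.ones((n, 1))])    # add a bias column
w, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)
pred = (X @ w > 0).astype(int)
accuracy = float((pred == y).mean())
print(accuracy)                        # high accuracy => linearly accessible structure
```

If the probe scores well, the useful structure was already present in the geometry; if it scores poorly, either the structure is absent or it is encoded nonlinearly.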

The next natural page after this one is Linear Probes and Representation Diagnostics, which turns that last bullet into a more careful evaluation workflow.

These checks matter because a bigger model does not automatically imply a better geometry. Embeddings can also become anisotropic, overly collapsed, or highly task-specific in ways that harm transfer.
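One simple anisotropy diagnostic is the average pairwise cosine similarity: for well-spread embeddings it sits near zero, while embeddings collapsed toward a shared direction push it toward one. Below is a minimal numpy sketch on synthetic data; the dimensions and the collapse construction are illustrative.

```python
import numpy as np

def mean_pairwise_cosine(E):
    """Average off-diagonal cosine similarity; values near 1 signal collapse."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    S = U @ U.T
    n = len(E)
    return float((S.sum() - n) / (n * (n - 1)))  # drop the diagonal of ones

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(500, 64))                       # well-spread directions
collapsed = rng.normal(size=(500, 64)) * 0.1 + np.ones(64)   # shared dominant direction

print(mean_pairwise_cosine(isotropic))   # near 0
print(mean_pairwise_cosine(collapsed))   # near 1
```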

7 Failure Modes

  • treating embeddings as mere storage vectors instead of a learned geometry
  • reading cosine similarity as semantic truth rather than model-dependent geometry
  • assuming visually clustered embeddings are automatically useful downstream
  • ignoring that representation quality depends on the task and training signal
  • forgetting that a linear probe can reveal whether useful structure is already present

8 Paper Bridge

9 Sources and Further Reading
