Long-Range Dependence and Oversquashing in Graphs
oversquashing, long-range dependence, graph bottleneck, GNN, rewiring
1 Application Snapshot
Some graph tasks depend on information that starts far away from the node we want to predict.
That is a long-range dependence problem.
Message-passing GNNs often struggle on such tasks, not only because they need enough layers, but because the information from many distant nodes may have to cross a very narrow graph bottleneck.
That compression phenomenon is called oversquashing.
The key idea is:
too many distant signals are squeezed into too few message channels before they reach the target
2 Problem Setting
A generic message-passing update has the form
\[ h_v^{(\ell+1)} = \phi\!\left( h_v^{(\ell)}, \operatorname{AGG}\{h_u^{(\ell)} : u \in \mathcal{N}(v)\} \right), \]
where node \(v\) receives aggregated information from its neighbors.
If the task at node \(v\) depends on far-away nodes, then we need many rounds of propagation. But each round still passes through local neighborhoods and finite-width hidden vectors.
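The update above can be sketched in a few lines of plain Python. This is a toy version, assuming an adjacency-list graph, scalar node features, mean aggregation, and a simple averaging combine step standing in for \(\phi\); it only illustrates that information moves one hop per layer.

```python
# Minimal message-passing sketch: h maps node -> scalar feature,
# adj maps node -> list of neighbors. One call = one layer.

def mp_layer(h, adj):
    """One round of message passing with mean aggregation and a toy combine."""
    new_h = {}
    for v, nbrs in adj.items():
        agg = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        new_h[v] = 0.5 * (h[v] + agg)  # toy phi(h_v, AGG{h_u})
    return new_h

# Path graph 0-1-2-3: node 0's signal needs 3 layers to reach node 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
for _ in range(3):
    h = mp_layer(h, adj)
print(h[3] > 0.0)  # True: only after 3 rounds does node 3 see node 0
```

Running fewer than three layers leaves `h[3]` at zero, which is exactly the "can information travel far enough?" half of the problem.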
So two questions appear together:
Can information travel far enough? Can it travel without being crushed on the way?
Oversquashing is about the second question.
3 Why This Math Appears
This page continues the site’s graph-ML sequence:
- Graph Diffusion and Message Passing: message passing is repeated local propagation
- Oversmoothing, Depth, and Graph Sampling: more layers increase receptive field, but repeated propagation can wash node states together
- Graph Rewiring, Homophily, and Heterophily: changing the graph operator can alter which information paths exist at all
Oversquashing is easiest to understand from graph topology:
- the number of relevant distant nodes can grow very quickly
- the number of edges or hidden channels carrying their information to the target may stay small
That mismatch is why bottlenecks matter.
4 Math Objects In Use
- graph \(G=(V,E)\)
- target node \(v\)
- receptive field after \(L\) message-passing layers
- bottleneck edge set or narrow cut
- hidden state width \(d\)
- local aggregation operator
5 A Small Worked Walkthrough
Imagine a graph made from two parts:
- on the left, a binary tree of depth \(3\)
- on the right, one query node \(q\)
- between them, a single bridge edge
So the graph looks schematically like:
many left-side nodes -> one bridge -> query node
Suppose the label of \(q\) depends on a summary of the leaf features on the left side.
At depth \(3\), there are already \(2^3 = 8\) leaves. If we go deeper, the number of potentially relevant leaves grows exponentially, but the bridge remains only one edge wide.
After enough message-passing layers, all of that left-side information must influence \(q\) through a fixed-width hidden state, say
\[ h_b^{(\ell)} \in \mathbb{R}^d, \]
at the bridge node \(b\).
So the model is forced to compress many distant signals into a small vector before the query ever sees them.
That is oversquashing.
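The mismatch in the walkthrough can be made concrete by counting. The sketch below, with illustrative names, builds the tree-plus-bridge graph and compares the number of leaves against the width of the cut separating the tree from the query node \(q\): the leaves grow as \(2^{\text{depth}}\) while the cut stays one edge wide.

```python
# Tree-plus-bridge example: binary tree whose root is joined to a
# query node "q" by a single bridge edge. Nodes are (level, index) tuples.

def tree_bridge(depth):
    """Edge list of a binary tree of the given depth, plus one bridge to 'q'."""
    edges = []
    for level in range(depth):
        for i in range(2 ** level):
            edges.append(((level, i), (level + 1, 2 * i)))
            edges.append(((level, i), (level + 1, 2 * i + 1)))
    edges.append(((0, 0), "q"))  # the single bridge edge
    return edges

for depth in (3, 5, 8):
    leaves = 2 ** depth                      # potentially relevant signals
    cut = sum(1 for e in tree_bridge(depth) if "q" in e)  # edges touching q
    print(depth, leaves, cut)  # leaves: 8, 32, 256 ... cut stays 1
```

However deep the tree gets, everything reaching \(q\) must pass through that one edge and the fixed-width vector \(h_b^{(\ell)}\) at the bridge node.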
The important contrast is:
- oversmoothing: repeated propagation makes nearby node states too similar
- oversquashing: too many distant signals are compressed through too small a cut
Both can happen in deep GNNs, but they are not the same failure mode.
In fact, increasing depth can help long-range reach while making oversquashing worse, because even more distant information is now trying to pass through the same bottleneck.
6 Implementation or Computation Note
Typical responses to oversquashing include:
- Graph rewiring: add shortcut edges or modify propagation paths so the task-relevant information does not have to pass through a narrow cut.
- Positional or structural encodings: give the model extra information about global graph position or distance, so it relies less on repeated local transport alone.
- Wider hidden states or richer message channels: this can help somewhat, but topology often dominates.
- Nonlocal mechanisms: global attention, long-range edges, or graph-transformer-style layers can bypass the pure local-bottleneck regime.
- Task-aware graph design: if the observed graph is not the best carrier of task information, auxiliary edges or learned propagation channels may be necessary.
This is also why oversquashing is often discussed together with rewiring. Rewiring is not just a preprocessing trick; it changes the geometry of information flow.
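A minimal rewiring sketch makes the geometric point. This is an illustration of the idea only, not a specific published algorithm: adding one shortcut edge per leaf directly to the query node widens the cut around it, so leaf information no longer funnels through the single bridge.

```python
# Toy rewiring: widen the cut around the query node by adding
# shortcut edges from each leaf. All names here are illustrative.

def rewire_with_shortcuts(edges, leaves, query):
    """Return the edge list plus one shortcut edge per leaf to the query node."""
    return edges + [(leaf, query) for leaf in leaves]

# Before: the only edge touching "q" is the bridge from the root (cut width 1).
edges = [("root", "a"), ("root", "b"), ("root", "q")]
leaves = ["a", "b"]
rewired = rewire_with_shortcuts(edges, leaves, "q")
cut_width = sum(1 for e in rewired if "q" in e)
print(cut_width)  # 3: the cut around q is now wider than the single bridge
```

The model's hidden states are unchanged; only the graph operator moved, which is the sense in which rewiring changes the geometry of information flow rather than the network itself.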
7 Failure Modes
- confusing oversquashing with oversmoothing
- assuming more GNN layers automatically solve long-range dependencies
- treating all difficult graph tasks as homophily problems
- believing width alone will remove a topological bottleneck
- reading a rewiring result as universally good even when the added shortcuts destroy task-relevant structure elsewhere
- ignoring that dataset construction can create or hide bottlenecks before model design even begins
8 Paper Bridge
- On the Bottleneck of Graph Neural Networks and its Practical Implications - First pass - the classic source that framed oversquashing as a bottleneck problem for long-range graph dependencies. Checked 2026-04-24.
- Understanding Oversquashing and Bottlenecks on Graphs via Curvature - Paper bridge - influential paper connecting bottlenecks, geometry, and rewiring strategies. Checked 2026-04-24.
9 Sources and Further Reading
- CS224W: Machine Learning with Graphs - First pass - official Stanford course hub for modern graph ML and GNN foundations. Checked 2026-04-24.
- On the Bottleneck of Graph Neural Networks and its Practical Implications - First pass - primary source introducing oversquashing through graph bottlenecks. Checked 2026-04-24.
- Understanding Oversquashing and Bottlenecks on Graphs via Curvature - Second pass - primary source for the curvature-and-bottleneck perspective and rewiring motivation. Checked 2026-04-24.
- Oversquashing in GNNs through the lens of information contraction and graph expansion - Second pass - primary source giving another useful angle through information contraction and expansion-based rewiring. Checked 2026-04-24.
- Graph Rewiring, Homophily, and Heterophily - First pass - site page on how changing the graph operator can relieve harmful propagation structure.