Shortest Paths¶

Shortest paths is not one algorithm. It is a chooser family built around one question:

What kind of relaxations are valid for this graph model?

That is why this topic should not be learned as "Dijkstra plus some edge cases." The real skill is:

read the edge model
identify the strongest guarantee you have
use the lightest algorithm that is still correct

This page is written as a family / chooser page, not as one single-technique lesson.

At A Glance¶

Reach for shortest paths when:

the problem asks for minimum distance, minimum cost, minimum time, or minimum number of moves
the graph may be explicit or hidden inside state transitions
later logic depends on distances as a base layer
path reconstruction matters, not just the final numeric answer

Strong contest triggers:

"shortest path"
"minimum travel cost / minimum number of edges / minimum moves"
"cheapest route"
"best path with one or two extra constraints"
"distance to a target for many later queries"

Strong anti-cues:

the graph is unweighted and you are reaching for Dijkstra out of habit
the structure is actually a tree DP or LCA problem with path queries, not shortest paths
the weights are all nonnegative but the graph is tiny and all-pairs is the real target
the problem is really about max-flow / matching / MST rather than additive path cost

What success looks like after studying this page:

you choose between BFS, 0-1 BFS, Dijkstra, Bellman-Ford, and Floyd-Warshall from the model
you can explain why Dijkstra fails on negative edges
you know when reverse-graph distances simplify the whole problem
you know when all-pairs is actually easier than repeated single-source runs
you treat relaxation as the core invariant, not as an implementation detail

Prerequisites¶

Helpful neighboring topics:

Problem Model And Notation¶

Let:

\(G = (V, E)\) be a graph
\(w(u, v)\) be the edge weight if the edge exists
\(s\) be the source
dist[v] be the current best known distance from s to v

The universal shortest-path inequality is:

\[ dist[v] \le dist[u] + w(u, v). \]

A relaxation on edge \((u, v)\) means:

\[ \text{if } dist[u] + w(u, v) < dist[v], \text{ then update } dist[v]. \]

Every shortest-path algorithm in this family is built around the same relaxation rule. What changes is:

which vertices / edges you process next
what guarantees let you finalize an answer
what weight model makes that strategy sound

One Picture Before Relaxation Order¶

Shortest-path family chooser

Visual Reading Guide¶

What to notice:

the first split is by the strongest guarantee in the graph model, not by personal preference for one algorithm
lighter tools sit higher in the tree because a stronger structural guarantee lets you finalize distances more cheaply

Why it matters:

most shortest-path mistakes in contests come from reaching for a familiar tool before checking the edge model
this chooser is meant to make "what guarantee do I have?" feel like the first reflex

Code bridge:

each leaf corresponds to a different valid relaxation schedule: queue layers for BFS, deque ordering for 0-1 BFS, priority-queue extraction for Dijkstra, full edge passes for Bellman-Ford, and intermediate-vertex DP for Floyd-Warshall

Boundary:

this chooser covers the core family only; reverse-graph tricks, bidirectional search, A*, radix-style queues, and Johnson-style all-pairs methods live outside the first-wave route

From Brute Force To The Right Idea¶

Brute Force¶

The brute-force view is:

try all paths from s to t
take the minimum total cost

This is useless in general because the number of paths can explode exponentially.

So the right compression is:

many different paths can end at the same vertex
only the best cost to each state matters

That is the dynamic-programming heart of shortest paths.

Relaxation Is The Real Primitive¶

Suppose we already know a good path to u.

Then every outgoing edge gives a candidate path to v:

\[ dist[u] + w(u, v). \]

If that candidate is better than the current dist[v], we should keep it.

That is all relaxation means.

The entire family can be summarized like this:

BFS chooses vertices in increasing number of edges
0-1 BFS chooses them with a deque because distances differ by at most 1
Dijkstra chooses the smallest current tentative distance
Bellman-Ford relaxes all edges repeatedly until path lengths by edge-count are exhausted
Floyd-Warshall adds allowed intermediate vertices one layer at a time

The First Question Is Always The Weight Model¶

Before writing any code, ask:

are edges unweighted?
are weights only 0/1?
are all weights nonnegative?
do negative edges exist?
do I need all-pairs instead of one source?
is the graph actually a DAG, grid, or reversed graph model in disguise?

That one checklist determines most shortest-path solutions correctly.

Core Invariant And Why It Works¶

This section is organized by family member, because the proof boundary is different for each one.

1. BFS: Unit-Weight Shortest Paths¶

If every edge has weight 1 (or the graph is unweighted), BFS processes vertices layer by layer.

Invariant:

when a vertex is first popped from the queue, it is reached with the minimum number of edges from the source

Why?

Because all vertices discovered from the current layer belong to the next layer, and no later discovery can use fewer edges.

So BFS is exactly the shortest-path algorithm for unit weights.

2. `0-1 BFS`: Deque-Ordered Relaxation¶

If every edge weight is either 0 or 1, then the distances of active frontier states differ by at most 1.

That is why a deque is enough:

relax by 0 -> push to the front
relax by 1 -> push to the back

The deque stays in nondecreasing tentative distance order, which gives the same spirit as Dijkstra but cheaper.

This is not a hack. It is Dijkstra specialized to a tighter queue geometry.

3. Dijkstra: Greedy Finalization Needs Nonnegative Weights¶

Dijkstra’s invariant is the famous one:

when the smallest-distance unprocessed vertex u is extracted, dist[u] is already final

Why does this work?

Because every not-yet-processed path to some other vertex must be at least as large as the current minimum tentative distance, and nonnegative edges cannot later reduce the already-finalized value.

That last clause is the whole theorem boundary:

nonnegative edges are required

If a negative edge exists, a later path may improve a node that Dijkstra already finalized.

4. Bellman-Ford: Shortest Paths By Edge Count¶

Bellman-Ford does not try to finalize nodes greedily.

Instead, it uses this invariant:

after k full passes over the edge list, every shortest path using at most k edges has been found

So after n - 1 passes:

every shortest simple path has been found, if no reachable negative cycle exists

This is why Bellman-Ford works with negative edges:

it never assumes "smallest tentative distance means done"
it only assumes that shortest simple paths use at most n - 1 edges

The negative-cycle detection pass is then natural:

if one more pass can still improve something reachable from s, there is a reachable negative cycle

5. Floyd-Warshall: Dynamic Programming On Allowed Intermediates¶

For all-pairs shortest paths on small n, Floyd-Warshall uses:

\[ dp[k][i][j] = \text{shortest path from } i \text{ to } j \text{ using only intermediates in } \{1,\dots,k\}. \]

The transition is:

\[ dp[k][i][j] = \min(dp[k-1][i][j],\ dp[k-1][i][k] + dp[k-1][k][j]). \]

The usual in-place version compresses away the first dimension.

This is not a single-source algorithm at all. It is all-pairs DP over intermediate-vertex sets.

6. Reverse Shortest Paths Are The Same Problem¶

If you need distances to one fixed target in a directed graph:

reverse all edges
run the same single-source algorithm from the target

This works because every path from u to t in the original graph corresponds to a path from t to u in the reversed graph with the same total weight.

This modeling trick is one of the highest-value shortest-path habits in contests.

Complexity And Tradeoffs¶

Let n = |V| and m = |E|.

Typical bounds:

BFS: O(n + m)
0-1 BFS: O(n + m)
Dijkstra with binary heap: O((n + m) log n)
Bellman-Ford: O(nm)
Floyd-Warshall: O(n^3)

Practical tradeoffs:

Tool	Best when	Time	Main tradeoff
BFS	unweighted graph / unit costs	linear	only valid for unit weights
`0-1 BFS`	weights in `{0, 1}`	linear	specialized model
Dijkstra	nonnegative weights	near-linear on sparse graphs	fails on negative edges
Bellman-Ford	negative edges or negative-cycle detection	slower but general	too expensive if nonnegative structure exists
Floyd-Warshall	all-pairs on small `n`, dense graphs, clean DP transitions	cubic	impossible on large `n`

Rule of thumb:

use the simplest valid model
weighted algorithms are not automatically better
if all-pairs is truly the task and n is small, Floyd-Warshall may be cleaner than many single-source runs

Variant Chooser¶

Situation	Best first tool	Why it fits	Main danger
unweighted / every move costs `1`	BFS	layer order is exact shortest-path order	overengineering with Dijkstra
weights only `0` or `1`	`0-1 BFS`	deque maintains distance order cheaply	treating it as plain BFS
sparse graph, nonnegative weights	Dijkstra	greedy finalization is valid	stale heap entries or negative edges
negative edges, no huge constraints	Bellman-Ford	repeated relaxations remain valid	forgetting reachable negative-cycle logic
all-pairs, `n` small	Floyd-Warshall	direct DP on intermediates	using it when `n` is too large
many queries to one target	reverse graph + one shortest-path run	distances to target become distances from source in reversed graph	forgetting to reverse directed edges
DAG shortest paths	topological DP	no cycles, one pass in topological order	using heavier machinery than needed

Two quick reject rules:

negative edge present -> reject Dijkstra immediately
unit-weight graph -> reject Dijkstra unless there is some extra modeling reason

Beyond The Core Family¶

This page is deliberately centered on the core contest family:

BFS
0-1 BFS
Dijkstra
Bellman-Ford
Floyd-Warshall
DAG shortest paths

Those are the tools that solve the overwhelming majority of shortest-path tasks in standard contest sets.

There are also important advanced / specialized shortest-path tools:

A* for one-source, one-target search when you have a strong heuristic
bidirectional BFS / Dijkstra when the start and target are both fixed and the graph model is friendly
Dial's algorithm or radix-heap style Dijkstra when edge weights are nonnegative integers with extra structure
Johnson's algorithm when you want all-pairs shortest paths on sparse graphs with negative edges but no negative cycles

These tools are valuable, but they belong to the next layer for a reason:

their correctness depends on extra assumptions beyond the core weight model
their payoff is often about constants, heuristics, or preprocessing strategy
many contest problems do not provide the clean heuristic or graph structure they need

For example, A* is not "faster Dijkstra by magic." It is correct when the heuristic never overestimates the remaining distance, and it is especially clean when that heuristic is also consistent. In route planning or pathfinding on geometric maps, that often happens naturally. In many contest graphs, it does not.

So the study order should be:

master the core family on this page
learn how relaxation order changes with the weight model
only then add goal-directed or specialized speedups such as A*, bidirectional search, or integer-weight optimizations

Worked Examples¶

Example 1: `Message Route` Is Just BFS¶

In Message Route, every edge represents one hop.

So the path cost is:

\[ \text{number of edges}. \]

That is exactly the BFS model.

Why not Dijkstra?

Because Dijkstra would be solving a more general problem than the one actually present. BFS already processes vertices in shortest-hop order.

This is the cleanest reminder that:

BFS is not "graph traversal only"
it is the correct shortest-path tool when all edges cost 1

Example 2: `0-1 BFS` Is Dijkstra With A Deque¶

Suppose the graph edges are:

u -> v with cost 0
u -> w with cost 1

If dist[u] = 7, then new tentative distances are:

7 for v
8 for w

So the active frontier differs by at most one layer of cost.

That is why a deque is enough:

v goes to the front
w goes to the back

This tiny local example is the whole reason 0-1 BFS works.

Example 3: Why Dijkstra Fails On Negative Edges¶

Consider:

s -> a (2)
s -> b (5)
b -> a (-10)

At first, Dijkstra sees:

dist[a] = 2
dist[b] = 5

so it would finalize a first.

But the true best path to a is:

s -> b -> a

with total cost:

\[ 5 + (-10) = -5. \]

That later improvement is exactly what nonnegative edges are supposed to forbid. Once you see this counterexample clearly, the Dijkstra boundary becomes impossible to forget.

Example 4: Bellman-Ford As "Paths With At Most k Edges"¶

Suppose the shortest path to some node uses exactly three edges.

Then:

after one pass, shortest paths using one edge are correct
after two passes, shortest paths using up to two edges are correct
after three passes, that target becomes correct too

This is the mental model that makes Bellman-Ford feel clean instead of brute-force:

one pass = one more allowed edge in the path

Example 5: Reverse Shortest Paths¶

Suppose a problem asks:

for every city u, what is the cheapest cost to reach target city t?

In a directed graph, running one shortest-path computation from every u is often wasteful.

Instead:

reverse every edge
start one run from t

Then the answer read at dist[u] in the reversed graph equals the original distance from u to t.

This is one of the most useful graph-modeling transformations in the whole topic.

Example 6: Floyd-Warshall Is Sometimes The Cleaner Choice¶

Suppose:

n <= 500
you need all-pairs distances
maybe also path existence or negative-cycle influence between many pairs

Then repeated Dijkstra may be more code and not even asymptotically better on dense graphs.

Floyd-Warshall gives:

direct all-pairs DP
one simple recurrence
easy extension for transitive-closure-style reasoning

This is why "all-pairs" should appear in your chooser before you reflexively think about single-source runs.

Algorithm And Pseudocode¶

Repo starter templates:

bfs.cpp
zero-one-bfs.cpp
dijkstra.cpp
bellman-ford.cpp
floyd-warshall.cpp
toposort-kahn.cpp as the ordering primitive for DAG shortest paths

BFS¶

dist[*] = INF
dist[s] = 0
queue q = [s]

while q not empty:
    u = pop front
    for each neighbor v:
        if dist[v] is INF:
            dist[v] = dist[u] + 1
            parent[v] = u
            push v

`0-1 BFS`¶

dist[*] = INF
dist[s] = 0
deque dq = [s]

while dq not empty:
    u = pop front
    for each edge (u, v, w):
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            parent[v] = u
            if w == 0:
                push_front(v)
            else:
                push_back(v)

Dijkstra¶

dist[*] = INF
dist[s] = 0
priority queue pq = {(0, s)}

while pq not empty:
    (du, u) = pop minimum
    if du != dist[u]:
        continue   // stale entry

    for each edge (u, v, w):
        cand = dist[u] + w
        if cand < dist[v]:
            dist[v] = cand
            parent[v] = u
            push (dist[v], v)

Bellman-Ford¶

dist[*] = INF
dist[s] = 0

repeat n - 1 times:
    changed = false
    for each edge (u, v, w):
        if dist[u] != INF and dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            parent[v] = u
            changed = true
    if not changed:
        break

one more pass:
    if some reachable edge still relaxes:
        reachable negative cycle exists

Floyd-Warshall¶

dist[i][j] = edge weight or INF
dist[i][i] = 0

for k in vertices:
    for i in vertices:
        for j in vertices:
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

DAG Shortest Paths¶

topologically sort the vertices
dist[*] = INF
dist[s] = 0

for u in topological order:
    if dist[u] == INF:
        continue
    for each edge (u, v, w):
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            parent[v] = u

This works because every edge goes forward in topological order, so by the time we process u, all paths that can improve dist[u] have already had their chance.

Implementation Notes¶

1. Relaxation Is The Unifying Operation¶

If you are stuck, return to the primitive:

can I improve dist[v] through u?

This viewpoint makes algorithm switching much easier than memorizing four unrelated code blocks.

2. `long long` Is The Default Distance Type¶

Path totals overflow int surprisingly often.

The safe contest habit is:

use long long
define INF as something like numeric_limits<long long>::max() / 4

so addition does not overflow immediately.

3. Heap Dijkstra Needs Stale-Entry Skipping¶

The repo template uses:

if current_dist != dist[u]: continue

That is not a cosmetic optimization. It is the standard correctness-preserving way to ignore outdated heap entries after a better path was found later.

4. Reverse Graph Is A Modeling Tool, Not A New Algorithm¶

When distances to one target matter, do not invent a custom DP first.

Ask whether reversing edges turns the task back into a standard single-source shortest-path problem.

5. Bellman-Ford And Negative Cycles Need Reachability Language¶

The only negative cycles that matter are those reachable from the source, and for answer corruption, often those from which the destination is also reachable.

So "graph has a negative cycle" is not the right contest statement. The real question is usually:

does a relevant reachable negative cycle affect the requested distances?

6. Floyd-Warshall Needs Tiny `n`, But Gives Clean Extensions¶

Floyd-Warshall is easy to dismiss because of O(n^3), but on small n it is often the cleanest route for:

all-pairs distances
transitive closure
path existence + path cost hybrids
negative-cycle influence between many pairs

7. Shortest Paths Is Often A Base Layer¶

Many advanced problems do not end with distances. They start there.

Examples:

shortest path + path counting
shortest path + one coupon / one discount
shortest path + reconstruction under constraints

The point of this family page is to help you choose the correct base layer first.

8. DAG Shortest Paths Is The Lightweight Special Case¶

If the graph is a DAG, you do not need a heap and you do not need repeated relaxation rounds.

One topological order is enough because:

every directed path respects that order
once you leave a vertex, no later edge can come back and improve it through a cycle

This is why DAG shortest paths can handle:

positive edges
zero edges
negative edges

as long as the graph is still acyclic.

Practice Archetypes¶

The most common shortest-path-flavored tasks are:

unit-cost shortest path
nonnegative weighted shortest path
negative-edge shortest path
distance-to-target preprocessing via reverse graph
shortest path as a subroutine for later DP / reconstruction

The strongest contest smell is:

additive path cost
local edge/state transitions
and a weight model that tells you which relaxation order is valid

References And Repo Anchors¶

Research sweep refreshed on 2026-04-24.

Primary / course:

Reference:

Practice:

CSES Graph Algorithms

Repo anchors:

practice ladder: Shortest paths ladder
practice note: Message Route
practice note: QOS
starter template: bfs.cpp
starter template: zero-one-bfs.cpp
starter template: dijkstra.cpp
starter template: bellman-ford.cpp
starter template: floyd-warshall.cpp
DAG ordering anchor: Course Schedule
ordering template: toposort-kahn.cpp
notebook refresher: Graph cheatsheet