Parallel Prefix Sum Benchmark¶
- Title:
Parallel Prefix Sum Benchmark - Judge / source:
Canonical parallel scan benchmark - Original URL: https://www.cs.cmu.edu/afs/cs/academic/class/15451-s14/www/LectureNotes/PrefixSumBlelloch.pdf
- Secondary topics:
Blelloch scan,Work / span,PRAM - Difficulty:
medium - Subtype:
Simulator-first exclusive prefix-sum benchmark through a work-efficient parallel scan tree - Status:
solved - Solution file: parallelprefixsum.cpp
Why Practice This¶
This is the cleanest first in-repo anchor for Parallel Algorithms.
The benchmark is intentionally narrow:
- one array
- one associative operator
- one exclusive scan
- one work-efficient tree pattern
- one sequential simulator for a parallel structure
So the whole lesson is exactly the lane itself:
- what
workandspanmean - why scan is a canonical parallel primitive
- how the up-sweep / down-sweep route works
Recognition Cue¶
Reach for this route when:
- the benchmark explicitly mentions
parallel prefix sumorscan - the teaching goal is
work O(n), span O(log n) - the route wants a simulator-preserving tree structure rather than a left-to-right cumulative loop
The strongest smell is:
I want one honest parallel-algorithms breadth benchmark, not a full threading stack.
That is exactly this note.
Problem-Specific Route¶
This repo's benchmark uses:
- one integer
n - one array
a[0..n-1] - one additive monoid with identity
0
The route is:
- pad the array to the next power of two
- run the up-sweep to compute subtree sums
- replace the root value with
0 - run the down-sweep to distribute prefixes
- output the first
nleaves as the exclusive scan
This is exactly the Blelloch work-efficient scan route in simulator form.
Core Idea¶
The first route is not about "prefix sums are cumulative."
It is about:
- restructuring the sum as a tree
- keeping total work linear
- keeping the longest dependency chain logarithmic
1. Up-Sweep¶
Build partial sums bottom-up.
After this phase, the root stores the sum of the whole padded array.
2. Root Reset¶
Replace the root with the identity element 0.
This is what turns the route into an exclusive scan.
3. Down-Sweep¶
Push prefix information back down:
- left child inherits the parent's prefix
- right child inherits
parent_prefix + left_subtree_sum
The leaves then hold the exclusive scan values.
Worked Example¶
For:
3 1 7 0 4 1 6 3
the benchmark prints:
0 3 4 11 11 15 16 22
That is:
- before
3->0 - before
1->3 - before
7->4 - and so on
Complexity¶
For input length n and padded size m:
- intended parallel work =
O(m) - intended parallel span =
O(log m) - sequential simulator runtime =
O(m)
The benchmark is breadth-first, not wall-clock-performance-first.
Input / Output Contract For This Repo¶
This repo's benchmark uses:
- one integer
n - one line with
nintegers
The solution prints one line with the exclusive scan array.
This is a repo-native benchmark, not one standard online judge task.
Pitfalls / Judge Notes¶
- The route is about exclusive scan, not inclusive scan.
- The operator is fixed to addition on this exact starter route.
- The benchmark is simulator-first and does not claim practical multicore speedup.
- Padding with identities is part of the clean tree model.
- This note does not stand in for GPU programming, OpenMP, or general concurrent algorithms.
Reusable Pattern¶
- Topic page: Parallel Algorithms
- Practice ladder: Parallel Algorithms ladder
- Starter template: blelloch-exclusive-scan.cpp
- Notebook refresher: Parallel Algorithms hot sheet
- Compare points:
- Prefix Sums
- Bit-Parallelism / Bitset Optimization
Solutions¶
- Code: parallelprefixsum.cpp