Number Theory Basics¶
Number theory basics is the layer where arithmetic stops being about decimal values and starts being about divisibility structure.
This topic is the entry point for:
gcd/lcm- prime factorization
- divisor counting
- sieve-style preprocessing
- the prime-exponent worldview behind many contest math problems
It is a family page, not one single trick. The goal is to know which primitive to reach for, why it works, and when a problem is really about divisibility rather than raw numeric simulation.
At A Glance¶
- Use when: the statement smells like divisibility, gcd/lcm, factors, or repeated factorization
- Core worldview: represent integers by their prime exponents whenever possible
- Main tools: Euclid, trial division, sieve of Eratosthenes, SPF sieve, linear sieve
- Typical complexity:
gcdinO(log n), one factorization by trial division inO(\sqrt{n}), SPF preprocessing in near-linear time - Main trap: solving a divisibility problem at the value level when the clean argument lives per prime
Problem Model And Notation¶
We work over positive integers unless stated otherwise.
Standard notation:
d | nmeans "ddividesn"gcd(a, b)is the greatest common divisorlcm(a, b)is the least common multiple
If
is the prime factorization of n, then the exponents e_i are the real structural information.
Two especially important arithmetic functions are:
and
This page focuses on the contest layer built on:
- divisibility facts
- Euclid's algorithm
- prime factorization
- divisor formulas
- sieve preprocessing
The most important next bridge pages after this one are:
From Brute Force To The Right Idea¶
Brute Force: Treat The Number As A Black Box¶
A beginner approach to number-theory-flavored tasks is often:
- test every possible divisor
- recompute
gcdfrom scratch everywhere - factor every number independently with trial division
- reason directly with values instead of structure
That works for tiny inputs, but it scales badly once:
- there are many queries
- the same arithmetic structure repeats
- the real answer depends on which primes divide a number, not on the number's decimal form
The First Shift: Think In Divisibility, Not Values¶
When a problem asks:
- whether something divides something else
- whether two numbers are coprime
- what the gcd/lcm is
- how many divisors a number has
the correct mental object is usually not the number itself.
It is:
- its prime support
- or the exponent of each prime
This is why the same problem often becomes much simpler after factorization.
The Second Shift: Prime Exponents Decouple The Problem¶
If
then:
and
This is the deepest "basic" idea in this topic.
Once you internalize it, many problems stop looking mysterious:
- gcd/lcm constructions
- divisibility feasibility
- closure under repeated gcd/lcm
- divisor counting
- "best divisor" or "best gcd" scans over many values
The Third Shift: Precompute Structure When Many Numbers Are Involved¶
For one moderate-size number, trial division is often enough.
For many numbers up to one bound MAX, recomputing from scratch is wasteful.
That is when sieve preprocessing becomes the right abstraction:
- list all primes up to
MAX - or store
spf[x] = smallest prime factor of x
Then factorization becomes repeated table lookups instead of repeated searching.
Core Invariants And Why They Work¶
1. Euclid's Algorithm¶
The key identity is:
Why is this true?
If a = qb + r, then a number divides both a and b exactly when it divides both b and r.
So the common divisors of (a, b) and (b, r) are the same, which means their greatest common divisor is the same.
This gives the standard recurrence:
gcd(a, b):
while b != 0:
(a, b) = (b, a mod b)
return a
This is why Euclid is so fast:
- each step strictly decreases the second argument
- the values collapse quickly enough to give logarithmic complexity
Two boundary facts matter constantly in contest code:
and
is usually treated as undefined mathematically, but many code paths avoid needing it.
2. Prime-Factorization View Of GCD And LCM¶
Suppose
For a divisor d to divide both a and b, its exponent on each prime p can be at most both \alpha_p and \beta_p.
So the largest such common divisor uses:
for every prime.
Similarly, a common multiple must contain enough of each prime to cover both inputs, so the smallest one uses:
That immediately yields the standard identity:
for positive integers a, b.
In code, the safer form for lcm is:
because dividing first reduces overflow risk.
3. Divisor Counting Formula¶
If
then every positive divisor of n has the form:
For each prime p_i, there are exactly e_i + 1 legal choices for f_i.
So:
The same decoupling gives the divisor-sum formula:
These formulas are not "special tricks". They are the direct consequence of the fact that divisors are chosen independently prime by prime.
4. Trial Division¶
For one number n, the simplest factorization idea is:
- try candidate divisors from
2upward - whenever one divides
n, peel off its full exponent
The stopping rule is:
- once
d * d > n, any remainingn > 1must itself be prime
Why?
Because if a composite number had no factor at most its square root, then both factors would be larger than the square root, and their product would already exceed the number.
That is the basic proof behind the O(\sqrt{n}) trial-division bound.
5. Sieve Of Eratosthenes¶
The Sieve of Eratosthenes finds all primes up to N by repeatedly crossing off multiples of each prime.
The key invariant is:
- when you reach an unmarked number
p, it is prime - every composite number has a smallest prime factor, so it will eventually be crossed off by that factor
The sieve is the right tool when you need:
- all primes up to one bound
- or many factorizations, if you want to bootstrap a better table from it
6. SPF Sieve¶
Instead of only marking "composite", you can store:
Then factorization becomes:
while x > 1:
p = spf[x]
count how many times p divides x
divide x by p repeatedly
The invariant is simple:
- every composite
xhas at least one prime factor - storing the smallest one gives a guaranteed valid next peel step
- repeated peeling terminates after exactly the total exponent count of the factorization
This is one of the most useful contest preprocessings in the whole repo.
Variant Chooser¶
Use Euclid When¶
- you just need
gcd - you need prefix/suffix gcd
- you need to simplify
lcm - the problem is really about divisibility over arrays, not about full factorization
Canonical smells:
- "remove one element and maximize gcd"
- "is this replacement enough to preserve divisibility?"
- "collapse many numbers by gcd"
Use Trial Division When¶
- you only factor one number or a few numbers
- the values are moderate enough that
O(\sqrt{n})is fine - preprocessing a large table would be overkill
Use Sieve Of Eratosthenes When¶
- you need all primes up to
N - or you need primality information for many values in one range
Use SPF Sieve When¶
- you will factor many values up to one bound
- factorization itself is part of the hot loop
- you need divisor/counting logic on many inputs
Use Linear Sieve When¶
- you want primes plus smallest-prime-factor-style structure in one pass
- or you want multiplicative functions in bulk
This is the natural next layer after the basic sieve/SPF viewpoint. It is worth knowing, but many contest tasks are already solved by a standard SPF table.
Move To Modular Arithmetic When¶
- the real problem is about inverses, powers, or combinations modulo
MOD - divisibility over integers is no longer the main invariant
That boundary is important:
- number theory basics = integer divisibility structure
- modular arithmetic = arithmetic inside one residue system
Worked Examples¶
Example 1: GCD And LCM Through Prime Exponents¶
Take:
and
Then:
because we take the smaller exponent of each prime, and:
because we take the larger exponent.
This is the mental model to keep even when you call std::gcd in code.
Example 2: Counting Divisors¶
Let
Any divisor has the form:
So:
ahas4choicesbhas3choiceschas2choices
Therefore:
This is exactly the logic behind Counting Divisors.
Example 3: Many Factorizations Need SPF, Not Repeated Trial Division¶
Suppose the input contains up to 2 * 10^5 numbers, each at most 10^6, and you must factor all of them.
Trial division per number repeats the same small-prime work over and over.
The better workflow is:
- build
spfonce for all values up to10^6 - factor each number by repeatedly peeling
spf[x]
That is the structural reason SPF pays off:
- one global preprocessing
- many cheap local factorizations
Example 4: The Best GCD May Be Found By Scanning Multiples¶
Some gcd problems are not solved by calling gcd many times.
Common Divisors is a good example:
- instead of checking every pair
- scan each candidate divisor
d - count how many numbers are multiples of
d
This is still a number-theory-basics move, because it comes from divisibility structure:
- "
dis a feasible gcd" means "at least two numbers are divisible byd"
Algorithms And Pseudocode¶
Euclid¶
gcd(a, b):
while b != 0:
r = a mod b
a = b
b = r
return abs(a)
Trial Division¶
factorize(n):
factors = []
d = 2
while d * d <= n:
if n % d == 0:
cnt = 0
while n % d == 0:
n /= d
cnt += 1
factors.push((d, cnt))
d += 1
if n > 1:
factors.push((n, 1))
return factors
SPF Build¶
build_spf(MAX):
spf[0..MAX] = 0
for i in 2..MAX:
if spf[i] == 0:
spf[i] = i
if i * i <= MAX:
for j in i * i, i * i + i, ...:
if spf[j] == 0:
spf[j] = i
Factorization Using SPF¶
factorize_with_spf(x, spf):
factors = []
while x > 1:
p = spf[x]
cnt = 0
while x % p == 0:
x /= p
cnt += 1
factors.push((p, cnt))
return factors
The repo starter for this layer is:
Implementation Notes¶
1. Compute lcm In The Safe Order¶
Prefer:
a / gcd(a, b) * b
over:
a * b / gcd(a, b)
The first version still can overflow for huge values, but it is strictly safer.
2. Peel Full Exponents At Once¶
When factoring, do not just record that p divides n.
Peel the whole exponent:
- cleaner reasoning
- direct compatibility with divisor formulas
- easier bridge to gcd/lcm constructions
3. Trial Division Needs The Leftover Prime Case¶
After the loop, if n > 1, that leftover is prime and must be emitted.
Forgetting this is one of the most common factorization bugs.
4. SPF Only Helps Inside Its Precomputed Bound¶
If you built spf up to MAX, then:
spf[x]is valid only forx <= MAX
This sounds obvious, but it is a frequent source of contest-time mistakes when inputs and derived values are mixed carelessly.
5. gcd(0, x) = x Makes Prefix/Suffix Patterns Clean¶
This is why array-gcd code can use boundary zeros so naturally.
You see this directly in:
6. Linear Sieve Is A Next-Step Optimizer, Not A Mandatory Default¶
If the task only needs:
- primes up to
N - or SPF-style factorization up to
N
a standard sieve/SPF solution is often already enough.
Use linear sieve when its extra structure actually pays for itself.
Practice Archetypes¶
The most common number-theory-basics-shaped tasks are:
- direct gcd/lcm identity problems
- factor one number and derive a formula
- factor many numbers under one max bound
- count divisors / sum divisors
- scan divisors or multiples instead of scanning pairs
- reduce a closure or feasibility statement to prime exponents
Strong contest smells include:
- the statement keeps saying
divides,coprime,prime,factor,divisor - the brute force is over pairs or all divisors
- the clean solution seems to separate by prime
- repeated factorization suggests one preprocessing table
References And Repo Anchors¶
Research sweep refreshed on 2026-04-24.
Primary / course:
Course / notes:
Reference:
- CP-Algorithms: Euclidean Algorithm
- CP-Algorithms: Sieve of Eratosthenes
- CP-Algorithms: Linear Sieve
- USACO Guide: Divisibility
Practice:
Repo anchors:
- starter template: number-theory-basics.cpp
- extended-Euclid bridge: extended-gcd-diophantine.cpp
- notebook refresher: Number Theory Cheatsheet
- adjacent topic: Modular Arithmetic