The Binomial Theorem: Expanding Powers
(a + b)² = a² + 2ab + b²
You learned this in algebra. Maybe you memorized it. Maybe you even understood it as FOIL: First, Outer, Inner, Last.
But did you see the pattern?
(a + b)³ = a³ + 3a²b + 3ab² + b³
The coefficients: 1, 3, 3, 1.
(a + b)⁴ = a⁴ + 4a³b + 6a²b² + 4ab³ + b⁴
The coefficients: 1, 4, 6, 4, 1.
These aren't random. They're binomial coefficients—the combinations C(n, k)—and they encode the structure of algebraic expansion, probability distributions, and even quantum mechanics.
The binomial theorem reveals that expanding powers isn't just algebra. It's combinatorics in disguise.
The Theorem
(a + b)ⁿ = Σ C(n, k) × aⁿ⁻ᵏ × bᵏ
Summed over k = 0 to n.
Translation: When you expand (a + b)ⁿ, each term is C(n, k) × aⁿ⁻ᵏ × bᵏ.
Example: (a + b)³
- k=0: C(3,0) × a³ × b⁰ = 1 × a³ = a³
- k=1: C(3,1) × a² × b¹ = 3 × a²b = 3a²b
- k=2: C(3,2) × a¹ × b² = 3 × ab² = 3ab²
- k=3: C(3,3) × a⁰ × b³ = 1 × b³ = b³
Result: a³ + 3a²b + 3ab² + b³
The coefficients (1, 3, 3, 1) are C(3,0), C(3,1), C(3,2), C(3,3).
Why This Works: The Combinatorial Interpretation
When you expand (a + b)ⁿ, you're multiplying n copies of (a + b):
(a + b) × (a + b) × (a + b) × ... × (a + b)
Each term in the expansion comes from choosing either a or b from each of the n factors.
Example: (a + b)³ = (a + b) × (a + b) × (a + b)
To get a³: choose a from all three factors. One way. C(3, 0) for the number of b's chosen.
To get a²b: choose a from two factors and b from one. How many ways? C(3, 1) = 3 (choose which factor contributes b).
To get ab²: choose a from one factor and b from two. C(3, 2) = 3 ways.
To get b³: choose b from all three. C(3, 3) = 1 way.
The binomial coefficients count how many ways to select k factors to contribute b.
This isn't just algebraic manipulation. It's counting. The binomial theorem is combinatorics dressed as algebra.
Pascal's Triangle
The binomial coefficients form Pascal's triangle:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
Row n gives the coefficients of (a + b)ⁿ.
Construction rule: Each number is the sum of the two above it. C(n, k) = C(n-1, k-1) + C(n-1, k)
Why? To choose k items from n, either:
- Include the first item and choose k-1 from the remaining n-1: C(n-1, k-1)
- Exclude the first item and choose k from the remaining n-1: C(n-1, k)
Add them: C(n, k) = C(n-1, k-1) + C(n-1, k).
This recursive structure is why Pascal's triangle works—it encodes the recurrence relation of binomial coefficients.
Properties That Blow Your Mind
Symmetry
C(n, k) = C(n, n-k)
The triangle is symmetric. The 4th entry in row 7 equals the 3rd entry from the end.
Why? Choosing k items is the same as choosing which n-k to exclude.
Row Sums
Sum of row n = 2ⁿ.
C(n, 0) + C(n, 1) + ... + C(n, n) = 2ⁿ.
Proof via binomial theorem: Set a = b = 1: (1 + 1)ⁿ = Σ C(n, k) × 1ⁿ⁻ᵏ × 1ᵏ = Σ C(n, k) = 2ⁿ.
This also counts subsets: an n-element set has 2ⁿ subsets.
Alternating Sum
C(n, 0) - C(n, 1) + C(n, 2) - C(n, 3) + ... = 0 (for n > 0).
Proof: Set a = 1, b = -1: (1 - 1)ⁿ = 0 = Σ C(n, k) × 1ⁿ⁻ᵏ × (-1)ᵏ.
The alternating sum of binomial coefficients cancels.
Hockey Stick Identity
C(r, r) + C(r+1, r) + C(r+2, r) + ... + C(n, r) = C(n+1, r+1).
Sum down a diagonal in Pascal's triangle, you get the number below and to the right.
Combinatorial proof: Counting paths in a grid. The number of ways to reach (r+1, n-r) equals the sum of ways to reach each point along the diagonal path.
Applications in Probability
The binomial theorem defines the binomial distribution—the probability of k successes in n independent trials.
Formula: P(X = k) = C(n, k) × pᵏ × (1-p)ⁿ⁻ᵏ
Where p is the probability of success on each trial.
Example: Flip a fair coin 10 times. What's the probability of exactly 6 heads?
C(10, 6) × (0.5)⁶ × (0.5)⁴ = 210 × 0.015625 × 0.0625 ≈ 0.205 (20.5%).
The binomial coefficient counts the number of ways to get 6 heads in 10 flips. The probabilities (0.5)⁶ and (0.5)⁴ account for the likelihood of each specific sequence.
Why binomial? Because you're summing over all outcomes, and the total probability is: (p + (1-p))ⁿ = 1ⁿ = 1.
The binomial expansion ensures probabilities sum to 1.
Expanding Binomials Efficiently
Computing (a + b)ⁿ term-by-term using the binomial theorem is faster than brute-force multiplication for large n.
Example: (x + 1)¹⁰⁰
You don't want to multiply (x + 1) a hundred times. Instead, use: Term k: C(100, k) × x¹⁰⁰⁻ᵏ × 1ᵏ = C(100, k) × x¹⁰⁰⁻ᵏ.
If you only need specific terms (e.g., the coefficient of x⁵⁰), compute C(100, 50) directly.
The binomial theorem transforms a multiplicative explosion into a counting problem.
Multinomial Theorem: Beyond Two Terms
What if you have more than two terms?
(a + b + c)ⁿ
The multinomial theorem generalizes:
(a + b + c)ⁿ = Σ [n! / (i! j! k!)] × aⁱ × bʲ × cᵏ
Where i + j + k = n.
The coefficients are multinomial coefficients: n! / (i! j! k!).
This counts the number of ways to arrange n objects where i are type A, j are type B, k are type C.
Example: (a + b + c)³
Expand and collect terms. Coefficient of a²bc? That's 3! / (2! 1! 0!) = 3.
The multinomial theorem appears in statistical mechanics (distributing particles among energy states) and combinatorics (partitioning sets).
Newton's Binomial Theorem: Non-Integer Exponents
Isaac Newton extended the binomial theorem to non-integer exponents.
For any real number r:
(1 + x)ʳ = Σ (r choose k) × xᵏ
Where (r choose k) = r(r-1)(r-2)...(r-k+1) / k!
This is an infinite series when r isn't a non-negative integer.
Example: (1 + x)^(1/2) = 1 + (1/2)x - (1/8)x² + (1/16)x³ - ...
This generates power series for roots, exponentials, and other functions—foundational in calculus and analysis.
Newton's generalization transformed the binomial theorem from finite combinatorics to infinite series, enabling approximations of irrational functions.
Binomial Coefficients in Number Theory
Binomial coefficients have deep number-theoretic properties.
Lucas' Theorem
For prime p: C(m, n) mod p = [C(m₀, n₀) × C(m₁, n₁) × ...] mod p
Where mᵢ and nᵢ are digits of m and n in base p.
This lets you compute binomial coefficients modulo primes efficiently—used in competitive programming and cryptography.
Kummer's Theorem
The highest power of prime p dividing C(m+n, m) equals the number of carries when adding m and n in base p.
Binomial coefficients connect to prime factorization in surprising ways.
Fibonacci in Pascal's Triangle
Sum the diagonals of Pascal's triangle:
Row 0: 1 → 1
Row 1: 1, 1 → 1 + 1 = 2
Row 2: 1, 2, 1 → 1 + 2 = 3
Row 3: 1, 3, 3, 1 → 1 + 3 + 1 = 5
Row 4: 1, 4, 6, 4, 1 → 1 + 4 + 3 = 8
Wait—1, 1, 2, 3, 5, 8—that's Fibonacci numbers!
Sum along the shallow diagonals: C(n, 0) + C(n-1, 1) + C(n-2, 2) + ... = F_{n+1}.
Pascal's triangle contains Fibonacci, triangular numbers, powers of 2, powers of 11—hidden structures everywhere.
Catalan Numbers from Binomial Coefficients
The nth Catalan number: C_n = (1/(n+1)) × C(2n, n).
Catalan numbers count balanced parentheses, binary trees, lattice paths—and they're derived from binomial coefficients.
Why? C(2n, n) counts lattice paths from (0,0) to (n,n). Dividing by (n+1) enforces the constraint that paths don't cross the diagonal y = x + 1.
This is the ballot problem: In an election where candidate A gets n votes and candidate B gets n votes, how many ways can votes be counted such that A is always ahead or tied until the end?
Answer: C_n = C(2n, n) / (n+1).
Binomial coefficients generate other combinatorial sequences when combined with constraints or transformations.
The Binomial Distribution in Machine Learning
The binomial distribution models binary outcomes: success/failure, yes/no, 1/0.
In machine learning:
- Classification accuracy: If your model has 70% accuracy, what's the probability it gets exactly 8 out of 10 predictions correct? Binomial distribution.
- A/B testing: If variant A has a 5% conversion rate and you test 100 users, what's the probability of 7 conversions? Binomial.
- Monte Carlo simulations: Sampling binary events follows binomial distributions.
The binomial theorem underpins statistical inference, hypothesis testing, and confidence intervals.
Approximations: Normal and Poisson Limits
For large n and moderate p, the binomial distribution approximates the normal distribution (bell curve).
For large n and small p (rare events), it approximates the Poisson distribution.
These approximations simplify probability calculations—instead of computing C(1000, 487) × p⁴⁸⁷ × (1-p)⁵¹³, use the normal approximation with mean np and variance np(1-p).
The binomial theorem connects discrete combinatorics to continuous probability.
Binary Representations and Subsets
Every integer from 0 to 2ⁿ - 1 corresponds to a subset of an n-element set.
Example: For {A, B, C} (n=3), integers 0-7 in binary:
- 000 → {} (empty set)
- 001 → {C}
- 010 → {B}
- 011 → {B, C}
- 100 → {A}
- 101 → {A, C}
- 110 → {A, B}
- 111 → {A, B, C}
Each bit position corresponds to an element. The binomial coefficient C(n, k) counts how many integers have exactly k bits set.
C(3, 0) = 1 (none set: 000) C(3, 1) = 3 (one set: 001, 010, 100) C(3, 2) = 3 (two set: 011, 101, 110) C(3, 3) = 1 (all set: 111)
Binary representations make subsets computable—algorithms iterate over integers to enumerate subsets.
Generating Functions and the Binomial Theorem
The binomial theorem is the simplest generating function:
(1 + x)ⁿ = Σ C(n, k) × xᵏ
The coefficient of xᵏ is C(n, k).
Why this matters: Generating functions transform combinatorial problems into algebraic manipulations. Multiply generating functions to count combined structures. Differentiate to count labeled structures. Compose to count nested structures.
The binomial theorem is the entry point to this entire framework.
The Binomial Theorem in Calculus
Derivative of (1 + x)ⁿ: d/dx [(1 + x)ⁿ] = n(1 + x)ⁿ⁻¹
Expand both sides using binomial theorem: LHS: d/dx [Σ C(n, k) xᵏ] = Σ k × C(n, k) xᵏ⁻¹ RHS: n × Σ C(n-1, k) xᵏ
Matching coefficients proves binomial identities.
The binomial theorem bridges discrete combinatorics and continuous calculus.
Combinatorial Proofs Using Pascal's Triangle
Many binomial identities have elegant combinatorial proofs.
Identity: C(n, k) = C(n-1, k-1) + C(n-1, k)
Combinatorial proof: Choose k items from n. Either the first item is chosen (then choose k-1 from the remaining n-1: C(n-1, k-1)), or it's not (choose k from the remaining n-1: C(n-1, k)). Add them.
This is more intuitive than algebraic manipulation—it's counting the same set two different ways.
Why This Matters
The binomial theorem isn't just an algebra trick. It's the link between combinatorics, probability, calculus, and number theory.
- Probability: Binomial distribution models real-world binary outcomes.
- Algorithm analysis: Counting subsets, bounding sums.
- Physics: Quantum mechanics uses binomial expansions in perturbation theory.
- Statistics: Hypothesis testing, confidence intervals.
- Computer science: Subset enumeration, dynamic programming.
Every time you expand (a + b)ⁿ, you're invoking combinations. Every time you calculate probabilities of repeated trials, you're using binomial coefficients.
Pascal's triangle is everywhere because the structure it encodes—choosing subsets, counting arrangements, distributing outcomes—is fundamental to how we model discrete possibility.
Further Reading
- Graham, Ronald L., Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison-Wesley, 2nd edition, 1994. (Chapter 5: Binomial Coefficients)
- Rosen, Kenneth H. Discrete Mathematics and Its Applications. McGraw-Hill, 8th edition, 2018. (Binomial theorem, Pascal's triangle)
- Feller, William. An Introduction to Probability Theory and Its Applications. Wiley, 3rd edition, 1968. (Binomial distribution)
- Stanley, Richard P. Enumerative Combinatorics, Volume 1. Cambridge University Press, 2nd edition, 2011. (Generating functions)
- Knuth, Donald E. The Art of Computer Programming, Volume 1. Addison-Wesley, 3rd edition, 1997. (Binomial coefficients in algorithms)
This is Part 5 of the Discrete Mathematics series, exploring the foundational math of computer science, algorithms, and computational thinking. Next: "Graph Theory Explained — The Mathematics of Networks."
Part 5 of the Discrete Mathematics series.
Previous: Combinations: Unordered Selections Next: Graph Theory: Networks of Nodes and Edges
Comments ()