What Is Probability? Quantifying Uncertainty
In 1654, a French gambler named Antoine Gombaud—the Chevalier de Méré—posed a question to Blaise Pascal: How should two players divide the stakes of an interrupted dice game?
Pascal wrote to Pierre de Fermat. Their correspondence invented probability theory.
It started with gambling. It became the mathematics that underlies quantum mechanics, machine learning, statistical inference, and decision theory. What Pascal and Fermat discovered wasn't just how to split winnings—it was how to reason rigorously about uncertainty itself.
The Problem of Uncertainty
We don't know the future. We don't know the full present. Much of the past is guesswork.
And yet we must decide and act. We take umbrellas based on weather forecasts. We approve drugs based on clinical trials. We invest money based on expected returns. We believe theories based on evidence.
How should we reason when we don't have certainty?
Before probability, there were two answers: certainty or ignorance. Either you knew something or you didn't. Mathematics dealt with what was provable. Uncertainty was for gamblers and fools.
Probability changed this. It provided a calculus of uncertainty—rules for combining, updating, and reasoning about incomplete information. It made uncertainty mathematically tractable.
What Probability Actually Measures
The probability of an event is a number between 0 and 1:
- 0 means impossible
- 1 means certain
- Numbers in between measure degrees of possibility
But what does this number represent? There are two major interpretations.
Frequentist: Probability is long-run frequency. The probability of heads is 0.5 because, if you flip a coin many times, about half will be heads.
This works for repeatable events. Flip a coin a million times—you'll get close to 50% heads. The long-run frequency converges to the probability.
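The frequentist claim is easy to check by simulation. A minimal sketch (the helper `heads_frequency` is just for illustration, not a standard function):

```python
import random

random.seed(0)

def heads_frequency(n_flips: int) -> float:
    """Flip a fair coin n_flips times; return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed frequency settles toward 0.5 as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    print(n, heads_frequency(n))
```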
But what about one-time events? What's the probability that a specific defendant is guilty? What's the probability it rains tomorrow? You can't replay tomorrow a million times and count.
Bayesian: Probability is degree of belief. The probability represents how confident you should be, given your information.
In this view, probability isn't about counting outcomes in repeated trials—it's about rational uncertainty. You can assign probability to anything you're uncertain about, including one-time events.
Both views are useful. Frequentist thinking dominates classical statistics; Bayesian thinking dominates machine learning and decision theory. The mathematics is the same either way.
The Kolmogorov Axioms
In 1933, Andrei Kolmogorov put probability on rigorous foundations. His axioms are simple:
Axiom 1: The probability of any event A is non-negative: P(A) ≥ 0
Axiom 2: The probability of something happening is 1: P(Ω) = 1, where Ω is the sample space (all possible outcomes)
Axiom 3: For mutually exclusive events (can't happen together), probabilities add: P(A or B) = P(A) + P(B)
That's it. Everything else in probability theory follows from these three axioms.
From these simple rules, we derive all the machinery: conditional probability, independence, Bayes' theorem, random variables, distributions, limit theorems. The entire edifice of probability—and by extension, statistics—rests on three axioms.
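For a finite sample space, the axioms can be checked mechanically. A sketch (the helper `is_valid_probability` is illustrative, not a library function):

```python
def is_valid_probability(p: dict) -> bool:
    """Check a probability assignment over a finite sample space
    against Kolmogorov's axioms. p maps each outcome to its probability."""
    # Axiom 1: every probability is non-negative.
    if any(prob < 0 for prob in p.values()):
        return False
    # Axiom 2: the whole sample space has probability 1.
    if abs(sum(p.values()) - 1.0) > 1e-9:
        return False
    # Axiom 3 then holds automatically: defining an event's probability
    # as the sum over its outcomes makes disjoint events additive.
    return True

fair_die = {i: 1 / 6 for i in range(1, 7)}
print(is_valid_probability(fair_die))          # True
print(is_valid_probability({1: 0.7, 2: 0.5}))  # False: sums to 1.2
```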
Sample Spaces and Events
Sample space (Ω): The set of all possible outcomes.
For a coin flip: Ω = {H, T}
For a die roll: Ω = {1, 2, 3, 4, 5, 6}
For two dice: Ω = {(1,1), (1,2), ..., (6,6)}—36 outcomes
Event: A subset of the sample space. Something that either happens or doesn't.
"Roll an even number" is the event {2, 4, 6}. "Sum of two dice is 7" is {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}.
Probability function: Assigns a number to each event, following the axioms.
For a fair die, each outcome has probability 1/6. The event "even" has probability 3/6 = 1/2 (three favorable outcomes out of six).
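These definitions translate directly into code. A small sketch for the fair die, using exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}          # sample space for one fair die

def prob(event: set) -> Fraction:
    """Favorable outcomes over total, valid when outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

print(prob({2, 4, 6}))   # the event "roll an even number" → 1/2
```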
The Counting Principle
For finite sample spaces where all outcomes are equally likely, probability becomes counting:
P(A) = (number of outcomes in A) / (total number of outcomes)
This is "favorable over total"—the formula you learned in school.
Example: Probability of rolling at least one 6 on two dice?
Total outcomes: 36
Outcomes with no 6s: 5 × 5 = 25 (each die has 5 non-6 options)
Outcomes with at least one 6: 36 - 25 = 11
Probability: 11/36 ≈ 0.306
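Brute-force enumeration confirms the count:

```python
from itertools import product

# All 36 equally likely outcomes for two dice.
outcomes = list(product(range(1, 7), repeat=2))

no_six = [o for o in outcomes if 6 not in o]
at_least_one_six = [o for o in outcomes if 6 in o]

print(len(outcomes))          # 36
print(len(no_six))            # 25
print(len(at_least_one_six))  # 11
print(len(at_least_one_six) / len(outcomes))
```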
Why Probability Works
Probability theory works because uncertainty, paradoxically, is regular.
Individual coin flips are unpredictable. But flip a million coins and you'll get very close to 500,000 heads. The individual randomness averages out. The aggregate behavior is predictable.
This is the Law of Large Numbers: the average of many independent random variables converges to its expectation. It's why casinos make money, insurance works, and polls approximate population opinions.
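A quick simulation illustrates this for a fair die, whose expectation is 3.5:

```python
import random

random.seed(1)

def sample_mean(n_rolls: int) -> float:
    """Average of n_rolls fair-die rolls; the expectation is 3.5."""
    return sum(random.randint(1, 6) for _ in range(n_rolls)) / n_rolls

# The average stabilizes near 3.5 as the number of rolls grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```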
There's something philosophically remarkable here. From unpredictability emerges regularity. From chaos emerges pattern. Probability is the mathematics that describes this emergence.
The Basic Rules
From Kolmogorov's axioms, we derive working rules:
Complement Rule: P(not A) = 1 - P(A)
If the probability of rain is 0.3, the probability of no rain is 0.7.
Addition Rule (general): P(A or B) = P(A) + P(B) - P(A and B)
You subtract the intersection because it's counted twice otherwise.
Multiplication Rule (independent events): P(A and B) = P(A) × P(B)
For independent events—ones that don't affect each other—you multiply. The probability of two heads in two flips: 0.5 × 0.5 = 0.25.
Conditional Probability: P(A | B) = P(A and B) / P(B)
The probability of A given that B has occurred. This is where it gets interesting.
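All four rules can be checked exhaustively on a small sample space. A sketch using two dice and exact fractions (the events A and B are arbitrary illustrations):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes for two dice.
outcomes = set(product(range(1, 7), repeat=2))

def p(event: set) -> Fraction:
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == 6}       # first die shows 6
B = {o for o in outcomes if sum(o) >= 10}    # the sum is at least 10

assert p(outcomes - A) == 1 - p(A)           # complement rule
assert p(A | B) == p(A) + p(B) - p(A & B)    # general addition rule
print(p(A & B) / p(B))                       # P(A | B) → 1/2
```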
Why Intuition Fails
Human brains are notoriously bad at probability. Evolution optimized us for pattern-matching in small tribes, not calculating odds in large populations.
The Birthday Problem: How many people do you need in a room before there's a 50% chance two share a birthday?
Intuition says something large—maybe 183 (half of 365)? The answer is 23. With 50 people, the probability exceeds 97%.
Our intuition vastly underestimates this because we think about specific pairs, not the combinatorial explosion of possible pairs.
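The exact probability is easy to compute: multiply the chances that each successive person misses all earlier birthdays, then take the complement. A sketch (assuming 365 equally likely birthdays, no leap years):

```python
def birthday_collision_prob(n: int) -> float:
    """Probability that at least two of n people share a birthday,
    assuming 365 equally likely birthdays."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

print(birthday_collision_prob(23))   # ≈ 0.507
print(birthday_collision_prob(50))   # ≈ 0.970
```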
Gambler's Fallacy: After five heads in a row, surely tails is "due"?
No. Each flip is independent. The coin has no memory. The probability of heads on flip six is still 0.5, regardless of what came before.
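Simulation confirms the coin's lack of memory. A sketch that conditions on five-head streaks and looks at the next flip:

```python
import random

random.seed(2)

# Among simulated runs that begin with five heads in a row,
# count how often the sixth flip is also heads.
sixth_flips = []
for _ in range(200_000):
    flips = [random.random() < 0.5 for _ in range(6)]
    if all(flips[:5]):            # condition on a five-head streak
        sixth_flips.append(flips[5])

print(sum(sixth_flips) / len(sixth_flips))   # close to 0.5
```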
Base Rate Neglect: A test is 99% accurate. You test positive. What's the probability you have the disease?
Without knowing how rare the disease is, you can't say. If the disease affects 1 in 10,000 people, most positive tests are false positives. Bayes' theorem handles this; intuition doesn't.
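Bayes' theorem (covered properly later in this series) makes the arithmetic concrete. A sketch, assuming "99% accurate" means 99% sensitivity and 99% specificity:

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test), by Bayes' theorem."""
    p_pos = prior * sensitivity + (1 - prior) * (1 - specificity)
    return prior * sensitivity / p_pos

# A "99% accurate" test for a disease affecting 1 in 10,000 people:
print(posterior(prior=1 / 10_000, sensitivity=0.99, specificity=0.99))
# ≈ 0.0098: under 1%, despite the accurate test.
```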
Probability as Logic
E.T. Jaynes argued that probability theory is the uniquely consistent extension of logic to uncertain reasoning.
Logic gives you tools for reasoning with certainty: If A implies B, and A is true, then B is true.
Probability extends this: If A makes B more likely, and A has probability 0.7, then B has higher probability than it would otherwise.
The rules of probability—the axioms and everything derived from them—are the only consistent rules for reasoning with degrees of belief. Any other system leads to contradictions or irrational behavior, as Dutch book arguments show: an agent with inconsistent beliefs can be exploited by a cleverly constructed set of bets.
This is a strong claim: probability isn't just useful, it's uniquely correct for handling uncertainty.
What's Coming
This series builds from these foundations:
Conditional probability and Bayes' theorem: How to update beliefs with evidence. The theorem that revolutionized machine learning, medical diagnosis, and scientific inference.
Random variables: The bridge from events to numbers. How to describe uncertain quantities mathematically.
Expectation and variance: What to expect on average, and how much spread to expect around that average.
Distributions: The patterns uncertainty takes. Binomial, normal, exponential, Poisson—each appears in specific contexts.
Limit theorems: The Law of Large Numbers and Central Limit Theorem. The deep results that explain why sampling works and why the bell curve is everywhere.
By the end, you'll understand not just how to calculate probabilities but why probability works—what makes this mathematics so powerful for reasoning about an uncertain world.
The Pebble
Here's the core insight: uncertainty isn't the absence of knowledge. It's a kind of knowledge.
When you don't know whether a die will land on 6, you still know something: it has probability 1/6. That knowledge lets you make better decisions than if you truly knew nothing.
Probability is the mathematics that extracts this structure from uncertainty. It turns "I don't know" into "I know it follows these rules." And following those rules leads to conclusions you can bet your life on—literally, for decisions about medicine, safety, and risk.
Pascal and Fermat started with a gambling question. They discovered the mathematics of rational uncertainty. Every statistical claim, every machine learning algorithm, every actuarial table traces back to that 1654 correspondence.
Probability is how we know things we can't be sure of.
This is Part 1 of the Probability series. Next: "Probability Rules: The Logic of Chance."