Common Distributions: Binomial, Poisson, and Beyond
Different random phenomena follow different patterns. Coin flips follow one distribution; waiting times follow another; rare events follow a third.
This article catalogs the major probability distributions—what they model, when they appear, and how they're related.
Think of this as a field guide to randomness.
Discrete Distributions
Bernoulli Distribution
What it models: A single yes/no experiment.
Parameters: p = probability of success.
PMF: P(X = 1) = p, P(X = 0) = 1 - p
Mean: p; Variance: p(1 - p)
When it appears: Coin flips, success/failure trials, binary classifications.
Binomial Distribution
What it models: Number of successes in n independent trials.
Parameters: n = number of trials, p = probability of success.
PMF: P(X = k) = C(n,k) × pᵏ × (1-p)ⁿ⁻ᵏ
Mean: np; Variance: np(1 - p)
When it appears: Number of heads in n flips, defective items in a batch, successful treatments.
Key insight: Sum of n independent Bernoulli(p) variables.
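To make that key insight concrete, here's a minimal simulation sketch; the values n = 10, p = 0.3, and the replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000

# Sum n Bernoulli(p) draws per replication...
bernoulli_sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
# ...and compare against direct Binomial(n, p) draws.
binomial_draws = rng.binomial(n, p, size=reps)

print(bernoulli_sums.mean(), binomial_draws.mean())  # both ≈ np = 3.0
print(bernoulli_sums.var(), binomial_draws.var())    # both ≈ np(1-p) = 2.1
```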
Poisson Distribution
What it models: Count of rare events in a fixed interval.
Parameters: λ = average rate.
PMF: P(X = k) = e^(-λ) × λᵏ / k!
Mean: λ; Variance: λ (equals the mean!)
When it appears: Emails per hour, accidents per year, typos per page, radioactive decays per second.
Key insight: Limit of binomial when n is large, p is small, and np = λ is moderate.
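A quick numerical check of this limit, comparing the Binomial(n, λ/n) PMF to Poisson(λ); λ = 3 and the n values are illustrative.

```python
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10_000):
    p = lam / n
    # Largest PMF discrepancy over k = 0..20
    err = max(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)) for k in range(21))
    print(n, err)  # shrinks toward 0 as n grows
```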
Geometric Distribution
What it models: Number of trials until first success.
Parameters: p = probability of success.
PMF: P(X = k) = (1-p)^(k-1) × p
Mean: 1/p; Variance: (1-p)/p²
When it appears: Waiting for first heads, number of attempts until success.
Key insight: "Memoryless"—the future doesn't depend on the past.
Negative Binomial
What it models: Number of trials until r successes.
Parameters: r = number of successes needed, p = probability of success.
PMF: P(X = k) = C(k-1, r-1) × pʳ × (1-p)^(k-r)
Mean: r/p; Variance: r(1-p)/p²
When it appears: Generalization of geometric; modeling overdispersed counts.
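A sketch of this PMF in code. One caveat: scipy.stats.nbinom counts failures before the r-th success rather than total trials, so trials k corresponds to k - r failures. The values of r and p are illustrative.

```python
from math import comb
from scipy.stats import nbinom

r, p = 3, 0.4
for k in range(r, r + 5):  # k = total trials, k >= r
    direct = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
    via_scipy = nbinom.pmf(k - r, r, p)  # k - r failures before success r
    print(k, direct, via_scipy)  # the two columns agree
```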
Hypergeometric
What it models: Successes when sampling without replacement.
Parameters: N = population size, K = successes in population, n = sample size.
PMF: P(X = k) = C(K,k) × C(N-K, n-k) / C(N,n)
When it appears: Quality control sampling, card games, lottery odds.
Key insight: Like binomial, but without replacement. Draws are dependent.
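A typical quality-control calculation, sketched with made-up numbers: a batch of N = 100 items with K = 5 defectives, inspected by sampling n = 10 without replacement.

```python
from scipy.stats import hypergeom

N, K, n = 100, 5, 10
# scipy's argument order is (k, M, n, N) = (k, population, successes, draws)
p_zero = hypergeom.pmf(0, N, K, n)  # P(no defectives in the sample)
print(1 - p_zero)                   # P(at least one defective) ≈ 0.416
```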
Continuous Distributions
Uniform Distribution
What it models: Equally likely outcomes in an interval.
Parameters: a = lower bound, b = upper bound.
PDF: f(x) = 1/(b-a) for a ≤ x ≤ b
Mean: (a + b)/2; Variance: (b - a)²/12
When it appears: Random number generators, rounding errors, maximum entropy with bounded support.
Exponential Distribution
What it models: Time until next event in a Poisson process.
Parameters: λ = rate.
PDF: f(x) = λ × e^(-λx) for x ≥ 0
Mean: 1/λ; Variance: 1/λ²
When it appears: Waiting times, lifetime of components, radioactive decay.
Key insight: Memoryless—P(X > s+t | X > s) = P(X > t).
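The same survival-function check as for the geometric distribution, now in continuous time; λ, s, and t are illustrative, and note that scipy parameterizes the exponential by scale = 1/λ.

```python
from scipy.stats import expon

lam, s, t = 0.5, 2.0, 3.0
lhs = expon.sf(s + t, scale=1/lam) / expon.sf(s, scale=1/lam)  # P(X > s+t | X > s)
rhs = expon.sf(t, scale=1/lam)                                 # P(X > t)
print(lhs, rhs)  # both e^(-λt) ≈ 0.2231
```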
Gamma Distribution
What it models: Time until k events in a Poisson process.
Parameters: k = shape, θ = scale (or α, β in alternative parameterization).
PDF: f(x) = x^(k-1) × e^(-x/θ) / (θᵏ × Γ(k))
Mean: kθ; Variance: kθ²
When it appears: Sum of exponentials, waiting times, Bayesian priors for rates.
Special case: k = 1 gives exponential.
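A simulation sketch of the sum-of-exponentials fact; k, θ, and the replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k, theta, reps = 5, 2.0, 100_000

# Sum k independent Exponential(scale=θ) draws per replication
sums = rng.exponential(theta, size=(reps, k)).sum(axis=1)
print(sums.mean(), k * theta)    # ≈ kθ = 10
print(sums.var(), k * theta**2)  # ≈ kθ² = 20
```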
Beta Distribution
What it models: Probabilities, proportions, percentages.
Parameters: α, β > 0 (shape parameters).
PDF: f(x) = x^(α-1) × (1-x)^(β-1) / B(α, β) for 0 ≤ x ≤ 1
Mean: α / (α + β); Variance: αβ / ((α + β)²(α + β + 1))
When it appears: Bayesian prior for probabilities, proportions, A/B testing.
Key insight: Conjugate prior for binomial/Bernoulli. Observe k successes in n trials with Beta(α, β) prior → Beta(α + k, β + n - k) posterior.
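The conjugate update is a one-liner. A sketch with a made-up prior and dataset:

```python
from scipy.stats import beta

a, b = 2, 2                         # prior: Beta(2, 2)
k, n = 7, 10                        # data: 7 successes in 10 trials
posterior = beta(a + k, b + n - k)  # posterior: Beta(9, 5)
print(posterior.mean())             # (α + k) / (α + β + n) = 9/14 ≈ 0.643
```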
Normal Distribution
What it models: Aggregate of many small effects.
Parameters: μ = mean, σ² = variance.
PDF: f(x) = (1/√(2πσ²)) × exp(-(x-μ)²/(2σ²))
Mean: μ; Variance: σ²
When it appears: Heights, errors, aggregates, anywhere the CLT applies.
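A tiny CLT illustration: sums of uniform variables, each of which is individually far from bell-shaped. The summand distribution and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Each row: the sum of 30 independent Uniform(0, 1) draws
sums = rng.uniform(size=(100_000, 30)).sum(axis=1)
print(sums.mean(), 30 * 0.5)         # ≈ 15
print(sums.std(), (30 / 12) ** 0.5)  # ≈ 1.58 (CLT: sum ≈ Normal(15, 2.5))
```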
Log-Normal Distribution
What it models: Variable whose logarithm is normal.
If Y = e^X where X ~ Normal(μ, σ²), then Y is log-normal.
When it appears: Income, stock prices, particle sizes, growth processes.
Key insight: Multiplicative effects produce log-normal, just as additive effects produce normal.
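A sketch of the multiplicative mechanism; the growth-factor distribution is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
# Product of 50 random positive growth factors per replication
factors = rng.uniform(0.9, 1.2, size=(100_000, 50))
products = factors.prod(axis=1)

logs = np.log(products)         # log of a product = sum of logs → CLT applies
print(logs.mean(), logs.std())  # the logs look normal, so the products are log-normal
```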
Student's t Distribution
What it models: The ratio T = Z / √(χ²(ν)/ν), where Z is standard normal and the chi-squared variable is independent of it.
Parameters: ν = degrees of freedom.
When it appears: t-tests, confidence intervals with unknown variance, heavy-tailed alternatives to normal.
Key insight: Heavier tails than normal. Converges to normal as ν → ∞.
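A quick look at the tails: P(T > 3) across degrees of freedom, compared with the normal value. The cutoff and ν values are illustrative.

```python
from scipy.stats import norm, t

for nu in (1, 5, 30, 1000):
    print(nu, t.sf(3, df=nu))  # tail probability shrinks as ν grows
print("normal", norm.sf(3))    # limit ≈ 0.00135
```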
Chi-Squared Distribution
What it models: Sum of squared standard normals.
Parameters: k = degrees of freedom.
PDF: that of Gamma(k/2, 2), i.e. shape k/2 and scale 2.
Mean: k; Variance: 2k
When it appears: Goodness-of-fit tests, variance estimation, contingency tables.
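A simulation sketch of the sum-of-squares definition; k and the replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
k, reps = 6, 100_000

# Sum k squared standard normals per replication
samples = (rng.standard_normal((reps, k)) ** 2).sum(axis=1)
print(samples.mean(), k)     # ≈ k
print(samples.var(), 2 * k)  # ≈ 2k
```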
F Distribution
What it models: Ratio of two chi-squared variables (normalized by their degrees of freedom).
When it appears: ANOVA, comparing variances, regression significance.
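A sketch of the ratio construction; the degrees of freedom are illustrative.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
d1, d2, reps = 4, 10, 100_000

# (χ²(d1)/d1) / (χ²(d2)/d2) should follow F(d1, d2)
ratio = (rng.chisquare(d1, reps) / d1) / (rng.chisquare(d2, reps) / d2)
print(ratio.mean(), f.mean(d1, d2))  # both ≈ d2/(d2 - 2) = 1.25
```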
Weibull Distribution
What it models: Failure times with varying hazard rates.
Parameters: k = shape, λ = scale.
When it appears: Reliability analysis, survival analysis, extreme values.
Key insight: k = 1 gives exponential. k > 1 means increasing failure rate (wear-out). k < 1 means decreasing failure rate (infant mortality).
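A sketch of the three hazard regimes, computing the hazard as pdf(x)/sf(x); shape values and evaluation points are illustrative (scale fixed at 1).

```python
import numpy as np
from scipy.stats import weibull_min

x = np.array([0.5, 1.0, 2.0])
for k in (0.5, 1.0, 2.0):
    hazard = weibull_min.pdf(x, k) / weibull_min.sf(x, k)  # h(x) = k·x^(k-1)
    print(k, hazard)  # decreasing for k < 1, constant for k = 1, increasing for k > 1
```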
Relationships Between Distributions
Binomial(n, p) → Poisson(λ) when n large, p small, np = λ.
Poisson(λ) → Normal(λ, λ) when λ large.
Binomial(n, p) → Normal(np, np(1-p)) when n large (CLT).
Exponential = Gamma(1, θ)
χ²(k) = Gamma(k/2, 2)
Sum of k independent Exponential(θ) variables = Gamma(k, θ)
Beta(1, 1) = Uniform(0, 1)
t → Normal as df → ∞
Choosing the Right Distribution
| Scenario | Distribution |
|---|---|
| Yes/no outcome | Bernoulli |
| Count of successes in n trials | Binomial |
| Rare events per interval | Poisson |
| Trials until first success | Geometric |
| Time until event (constant hazard) | Exponential |
| Sum of many factors | Normal |
| Multiplicative growth | Log-normal |
| Modeling a probability | Beta |
| Unknown variance, small sample | t |
Why So Many?
Each distribution encodes different assumptions about randomness.
Counting vs measuring: Discrete vs continuous.
Bounded vs unbounded: Beta/uniform vs normal/exponential.
Memoryless vs history-dependent: Exponential vs Weibull.
Thin tails vs fat tails: Normal vs t vs Cauchy.
The right distribution for your problem depends on the generating mechanism. Matching distribution to mechanism is a core skill in statistical modeling.
This is Part 9 of the Probability series.
Previous: The Normal Distribution: The Bell Curve and Why It Appears Everywhere
Next: The Law of Large Numbers: Why Averaging Works