Common Distributions: Binomial, Poisson, and Beyond
Different random phenomena follow different patterns. Coin flips follow one distribution; waiting times follow another; rare events follow a third.
This article catalogs the major probability distributions—what they model, when they appear, and how they're related.
Think of this as a field guide to randomness.
Discrete Distributions
Bernoulli Distribution
What it models: A single yes/no experiment.
Parameters: p = probability of success.
PMF: P(X = 1) = p, P(X = 0) = 1 - p
Mean: p; Variance: p(1 - p)
When it appears: Coin flips, success/failure trials, binary classifications.
Binomial Distribution
What it models: Number of successes in n independent trials.
Parameters: n = number of trials, p = probability of success.
PMF: P(X = k) = C(n,k) × pᵏ × (1-p)ⁿ⁻ᵏ
Mean: np; Variance: np(1 - p)
When it appears: Number of heads in n flips, defective items in a batch, successful treatments.
Key insight: Sum of n independent Bernoulli(p) variables.
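To make that key insight concrete, here's a minimal simulation sketch; the values n = 10, p = 0.3, and the replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000

# Sum n Bernoulli(p) draws per replication...
bernoulli_sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
# ...and compare against direct Binomial(n, p) draws.
binomial_draws = rng.binomial(n, p, size=reps)

print(bernoulli_sums.mean(), binomial_draws.mean())  # both ≈ np = 3.0
print(bernoulli_sums.var(), binomial_draws.var())    # both ≈ np(1-p) = 2.1
```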
Poisson Distribution
What it models: Count of rare events in a fixed interval.
Parameters: λ = average rate.
PMF: P(X = k) = e^(-λ) × λᵏ / k!
Mean: λ; Variance: λ (equals the mean!)
When it appears: Emails per hour, accidents per year, typos per page, radioactive decays per second.
Key insight: Limit of binomial when n is large, p is small, and np = λ is moderate.
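A quick numerical check of this limit, comparing the Binomial(n, λ/n) PMF to Poisson(λ); λ = 3 and the n values are illustrative.

```python
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10_000):
    p = lam / n
    # Largest PMF discrepancy over k = 0..20
    err = max(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)) for k in range(21))
    print(n, err)  # shrinks toward 0 as n grows
```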
Geometric Distribution
What it models: Number of trials until first success.
Parameters: p = probability of success.
PMF: P(X = k) = (1-p)^(k-1) × p
Mean: 1/p; Variance: (1-p)/p²
When it appears: Waiting for first heads, number of attempts until success.
Key insight: "Memoryless"—the future doesn't depend on the past.
Negative Binomial
What it models: Number of trials until r successes.
Parameters: r = number of successes needed, p = probability of success.
PMF: P(X = k) = C(k-1, r-1) × pʳ × (1-p)^(k-r)
Mean: r/p; Variance: r(1-p)/p²
When it appears: Generalization of geometric; modeling overdispersed counts.
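A sketch of this PMF in code. One caveat: scipy.stats.nbinom counts failures before the r-th success rather than total trials, so trials k corresponds to k - r failures. The values of r and p are illustrative.

```python
from math import comb
from scipy.stats import nbinom

r, p = 3, 0.4
for k in range(r, r + 5):  # k = total trials, k >= r
    direct = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
    via_scipy = nbinom.pmf(k - r, r, p)  # k - r failures before success r
    print(k, direct, via_scipy)  # the two columns agree
```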
Hypergeometric
What it models: Successes when sampling without replacement.
Parameters: N = population size, K = successes in population, n = sample size.
PMF: P(X = k) = C(K,k) × C(N-K, n-k) / C(N,n)
When it appears: Quality control sampling, card games, lottery odds.
Key insight: Like binomial, but without replacement. Draws are dependent.
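A typical quality-control calculation, sketched with made-up numbers: a batch of N = 100 items with K = 5 defectives, inspected by sampling n = 10 without replacement.

```python
from scipy.stats import hypergeom

N, K, n = 100, 5, 10
# scipy's argument order is (k, M, n, N) = (k, population, successes, draws)
p_zero = hypergeom.pmf(0, N, K, n)  # P(no defectives in the sample)
print(1 - p_zero)                   # P(at least one defective) ≈ 0.416
```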
Continuous Distributions
Uniform Distribution
What it models: Equally likely outcomes in an interval.
Parameters: a = lower bound, b = upper bound.
PDF: f(x) = 1/(b-a) for a ≤ x ≤ b
Mean: (a + b)/2; Variance: (b - a)²/12
When it appears: Random number generators, rounding errors, maximum entropy with bounded support.
Exponential Distribution
What it models: Time until next event in a Poisson process.
Parameters: λ = rate.
PDF: f(x) = λ × e^(-λx) for x ≥ 0
Mean: 1/λ; Variance: 1/λ²
When it appears: Waiting times, lifetime of components, radioactive decay.
Key insight: Memoryless—P(X > s+t | X > s) = P(X > t).
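The same survival-function check as for the geometric distribution, now in continuous time; λ, s, and t are illustrative, and note that scipy parameterizes the exponential by scale = 1/λ.

```python
from scipy.stats import expon

lam, s, t = 0.5, 2.0, 3.0
lhs = expon.sf(s + t, scale=1/lam) / expon.sf(s, scale=1/lam)  # P(X > s+t | X > s)
rhs = expon.sf(t, scale=1/lam)                                 # P(X > t)
print(lhs, rhs)  # both e^(-λt) ≈ 0.2231
```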
Gamma Distribution
What it models: Time until k events in a Poisson process.
Parameters: k = shape, θ = scale (or α, β in alternative parameterization).
PDF: f(x) = x^(k-1) × e^(-x/θ) / (θᵏ × Γ(k))
Mean: kθ; Variance: kθ²
When it appears: Sum of exponentials, waiting times, Bayesian priors for rates.
Special case: k = 1 gives exponential.
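A simulation sketch of the sum-of-exponentials fact; k, θ, and the replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k, theta, reps = 5, 2.0, 100_000

# Sum k independent Exponential(scale=θ) draws per replication
sums = rng.exponential(theta, size=(reps, k)).sum(axis=1)
print(sums.mean(), k * theta)    # ≈ kθ = 10
print(sums.var(), k * theta**2)  # ≈ kθ² = 20
```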
Beta Distribution
What it models: Probabilities, proportions, percentages.
Parameters: α, β > 0 (shape parameters).
PDF: f(x) = x^(α-1) × (1-x)^(β-1) / B(α, β) for 0 ≤ x ≤ 1
Mean: α / (α + β); Variance: αβ / ((α + β)²(α + β + 1))
When it appears: Bayesian prior for probabilities, proportions, A/B testing.
Key insight: Conjugate prior for binomial/Bernoulli. Observe k successes in n trials with Beta(α, β) prior → Beta(α + k, β + n - k) posterior.
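The conjugate update is a one-liner. A sketch with a made-up prior and dataset:

```python
from scipy.stats import beta

a, b = 2, 2                         # prior: Beta(2, 2)
k, n = 7, 10                        # data: 7 successes in 10 trials
posterior = beta(a + k, b + n - k)  # posterior: Beta(9, 5)
print(posterior.mean())             # (α + k) / (α + β + n) = 9/14 ≈ 0.643
```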
Normal Distribution
What it models: Aggregate of many small effects.
Parameters: μ = mean, σ² = variance.
PDF: f(x) = (1/√(2πσ²)) × exp(-(x-μ)²/(2σ²))
Mean: μ; Variance: σ²
When it appears: Heights, errors, aggregates, anywhere the CLT applies.
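A tiny CLT illustration: sums of uniform variables, each of which is individually far from bell-shaped. The summand distribution and counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Each row: the sum of 30 independent Uniform(0, 1) draws
sums = rng.uniform(size=(100_000, 30)).sum(axis=1)
print(sums.mean(), 30 * 0.5)         # ≈ 15
print(sums.std(), (30 / 12) ** 0.5)  # ≈ 1.58 (CLT: sum ≈ Normal(15, 2.5))
```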
Log-Normal Distribution
What it models: Variable whose logarithm is normal.
If Y = e^X where X ~ Normal(μ, σ²), then Y is log-normal.
When it appears: Income, stock prices, particle sizes, growth processes.
Key insight: Multiplicative effects produce log-normal, just as additive effects produce normal.
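A sketch of the multiplicative mechanism; the growth-factor distribution is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
# Product of 50 random positive growth factors per replication
factors = rng.uniform(0.9, 1.2, size=(100_000, 50))
products = factors.prod(axis=1)

logs = np.log(products)         # log of a product = sum of logs → CLT applies
print(logs.mean(), logs.std())  # the logs look normal, so the products are log-normal
```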
Student's t Distribution
What it models: The ratio T = Z / √(χ²(ν)/ν), where Z is standard normal and the chi-squared variable is independent of it.
Parameters: ν = degrees of freedom.
When it appears: t-tests, confidence intervals with unknown variance, heavy-tailed alternatives to normal.
Key insight: Heavier tails than normal. Converges to normal as ν → ∞.
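A quick look at the tails: P(T > 3) across degrees of freedom, compared with the normal value. The cutoff and ν values are illustrative.

```python
from scipy.stats import norm, t

for nu in (1, 5, 30, 1000):
    print(nu, t.sf(3, df=nu))  # tail probability shrinks as ν grows
print("normal", norm.sf(3))    # limit ≈ 0.00135
```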
Chi-Squared Distribution
What it models: Sum of squared standard normals.
Parameters: k = degrees of freedom.
PDF: that of Gamma(k/2, 2), i.e. shape k/2 and scale 2.
Mean: k; Variance: 2k
When it appears: Goodness-of-fit tests, variance estimation, contingency tables.
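A simulation sketch of the sum-of-squares definition; k and the replication count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
k, reps = 6, 100_000

# Sum k squared standard normals per replication
samples = (rng.standard_normal((reps, k)) ** 2).sum(axis=1)
print(samples.mean(), k)     # ≈ k
print(samples.var(), 2 * k)  # ≈ 2k
```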
F Distribution
What it models: Ratio of two chi-squared variables (normalized by their degrees of freedom).
When it appears: ANOVA, comparing variances, regression significance.
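A sketch of the ratio construction; the degrees of freedom are illustrative.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
d1, d2, reps = 4, 10, 100_000

# (χ²(d1)/d1) / (χ²(d2)/d2) should follow F(d1, d2)
ratio = (rng.chisquare(d1, reps) / d1) / (rng.chisquare(d2, reps) / d2)
print(ratio.mean(), f.mean(d1, d2))  # both ≈ d2/(d2 - 2) = 1.25
```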
Weibull Distribution
What it models: Failure times with varying hazard rates.
Parameters: k = shape, λ = scale.
When it appears: Reliability analysis, survival analysis, extreme values.
Key insight: k = 1 gives exponential. k > 1 means increasing failure rate (wear-out). k < 1 means decreasing failure rate (infant mortality).
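A sketch of the three hazard regimes, computing the hazard as pdf(x)/sf(x); shape values and evaluation points are illustrative (scale fixed at 1).

```python
import numpy as np
from scipy.stats import weibull_min

x = np.array([0.5, 1.0, 2.0])
for k in (0.5, 1.0, 2.0):
    hazard = weibull_min.pdf(x, k) / weibull_min.sf(x, k)  # h(x) = k·x^(k-1)
    print(k, hazard)  # decreasing for k < 1, constant for k = 1, increasing for k > 1
```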
Relationships Between Distributions
Binomial(n, p) → Poisson(λ) when n large, p small, np = λ.
Poisson(λ) → Normal(λ, λ) when λ large.
Binomial(n, p) → Normal(np, np(1-p)) when n large (CLT).
Exponential = Gamma(1, θ)
χ²(k) = Gamma(k/2, 2)
Sum of k independent Exponential(θ) variables = Gamma(k, θ)
Beta(1, 1) = Uniform(0, 1)
t → Normal as df → ∞
Choosing the Right Distribution
| Scenario | Distribution |
|---|---|
| Yes/no outcome | Bernoulli |
| Count of successes in n trials | Binomial |
| Rare events per interval | Poisson |
| Trials until first success | Geometric |
| Time until event (constant hazard) | Exponential |
| Sum of many factors | Normal |
| Multiplicative growth | Log-normal |
| Modeling a probability | Beta |
| Unknown variance, small sample | t |
Why So Many?
Each distribution encodes different assumptions about randomness.
Counting vs measuring: Discrete vs continuous.
Bounded vs unbounded: Beta/uniform vs normal/exponential.
Memoryless vs history-dependent: Exponential vs Weibull.
Thin tails vs fat tails: Normal vs t vs Cauchy.
The right distribution for your problem depends on the generating mechanism. Matching distribution to mechanism is a core skill in statistical modeling.
This is Part 9 of the Probability series.
Previous: The Normal Distribution: The Bell Curve and Why It Appears Everywhere
Next: The Law of Large Numbers: Why Averaging Works