What Is Statistics? Making Sense of Data
In 1662, John Graunt published Natural and Political Observations Made upon the Bills of Mortality. He'd been studying death records from London parishes—cause of death, age, location. And he noticed something: patterns emerged. Plague years had signatures. Birth and death rates were predictable. Individual deaths were random, but populations revealed structure.
Graunt had discovered something profound: aggregate data reveals truths invisible at the individual level. One person's height tells you nothing. A thousand people's heights tell you about human biology, nutrition, genetic variation. Statistics is the mathematics of that transition—from messy data to confident knowledge.
And it's everywhere. Every time you see "studies show," someone used statistics to get there. Every medical trial, every economic forecast, every machine learning model—statistics is the engine underneath.
But here's the problem: most people—including most researchers—misunderstand what statistics actually does. They treat it as a rubber stamp for truth when it's actually a rigorous framework for reasoning under uncertainty. And that confusion has consequences.
The replication crisis in psychology? Misunderstood statistics. The "p-hacking" scandal? Misused statistical tests. The endless debate over masks, vaccines, climate data? Often, it's people not understanding what the numbers actually mean.
This article explains what statistics is, where it comes from, and what it's actually doing when you "run the numbers." By the end, you'll understand why statistics is both more powerful and more fragile than most people realize.
The Core Problem: Learning from Incomplete Information
Here's the fundamental challenge statistics solves:
You can't measure everything. But you need to know something true.
You want to know if a drug works. You can't test it on everyone—you test it on 500 people. But somehow, you need to conclude something about the other 8 billion humans.
You want to know if income predicts voting behavior. You can't survey everyone—you survey 2,000 people. But you need to say something about the whole country.
You want to know if this website design increases conversions. You can't test all possible visitors—you run an A/B test for two weeks. But you need to decide which design to keep.
This is the domain of statistics: extracting generalizable knowledge from limited observations.
And it's not just "doing your best" or "making an educated guess." There's a mathematical framework for it. Statistics lets you quantify how confident you should be in your conclusions, given the data you have.
Two Flavors: Descriptive vs. Inferential
Statistics divides into two broad categories, and the distinction matters.
Descriptive Statistics: Summarizing What You See
This is the easy part. You have data. You want to describe it concisely.
Mean, median, mode. Central tendency—where's the "middle" of your data?
Range, variance, standard deviation. Spread—how scattered is your data?
Correlation. Do two variables move together?
Descriptive statistics doesn't make claims beyond the data. If you measure 100 people's heights and calculate the average, that's the average of those 100 people. Period. No generalization. No inference.
It's useful—you can't think about large datasets without summarizing them. But it's limited. You're stuck describing your specific sample.
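Here's what those summaries look like in code, a minimal sketch using NumPy on made-up height and weight data (the numbers are purely illustrative):

```python
# Descriptive statistics on made-up data: summaries of THIS sample only.
import numpy as np

heights_cm = np.array([162, 175, 158, 181, 169, 174, 166, 171, 159, 177])
weights_kg = np.array([61, 78, 55, 88, 70, 76, 64, 73, 57, 81])

print("mean:  ", heights_cm.mean())          # central tendency
print("median:", np.median(heights_cm))
print("std:   ", heights_cm.std(ddof=1))     # spread (sample standard deviation)
print("range: ", heights_cm.max() - heights_cm.min())

# Correlation: do height and weight move together in this sample?
print("corr:  ", np.corrcoef(heights_cm, weights_kg)[0, 1])
```

Every number printed here describes these ten people and nobody else.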
Inferential Statistics: Generalizing Beyond Your Data
This is where the power (and the danger) lives.
Inferential statistics says: "I measured these 100 people, and based on that, I'm going to make a claim about everyone."
That's a leap. A necessary leap, but a leap. And statistics gives you the tools to make that leap rigorously.
Hypothesis testing: Is this effect real, or just random noise?
Confidence intervals: What range of values for the population parameter is consistent with the data you observed?
Regression: Can I predict Y from X? How much of Y's variance does X explain?
Inferential statistics is where things get interesting—and where people get confused. Because now you're not just describing data. You're making probabilistic claims about reality.
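As a toy illustration of that leap, here's a sketch of a 95% confidence interval for a population mean, computed with SciPy from ten invented measurements. The interval is a claim about the population, built from nothing but the sample:

```python
# A 95% confidence interval for a population mean, from a small invented sample.
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 5.3, 6.0, 5.6, 5.2])
n = len(sample)
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)      # standard error of the mean

t_crit = stats.t.ppf(0.975, df=n - 1)      # 97.5th percentile of t with n-1 df
low, high = mean - t_crit * sem, mean + t_crit * sem
print(f"sample mean {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```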
The Machinery: Probability Theory
Statistics rests on probability theory. To understand statistics, you need to understand the logic underneath.
Probability is the mathematics of uncertainty. It answers questions like:
- If I flip a coin 100 times, how many heads should I expect?
- How likely is it that this pattern appeared by chance?
- Given this data, what's the probability that the null hypothesis is true?
Wait—scratch that last one. That's actually not what statistics tells you, but most people think it is. We'll return to that confusion in the article on p-values.
Probability theory lets you model random processes. If you know the process (a fair coin flip, a normal distribution), you can calculate the probability of any outcome.
Statistics inverts this logic. You observe outcomes, and you infer the process that generated them.
You see 100 coin flips: 73 heads, 27 tails. Statistics lets you ask: "How likely is it that this coin is fair?" The answer involves calculating: "If the coin were fair, how probable is a 73-27 split?"
That inversion—from process to outcome in probability, from outcome to process in statistics—is the conceptual core.
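Here's that inversion as a short sketch with SciPy's binomial distribution: the first calculation reasons from process to outcome, the second from outcome back to the process.

```python
from scipy.stats import binom

# Probability direction: known process (a fair coin), how likely is a given outcome?
print("P(exactly 50 heads | fair coin):", binom.pmf(50, 100, 0.5))

# Statistics direction: observed 73 heads; if the coin were fair, how often
# would a split at least this lopsided occur (in either direction)?
p_value = 2 * binom.sf(72, 100, 0.5)   # P(X >= 73) + P(X <= 27), by symmetry
print("two-sided p-value for 73 heads:", p_value)
```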
Populations, Samples, and the Leap of Faith
Here's where statistics gets its power and its fragility.
Population: The entire group you care about. All humans. All possible website visitors. Every electron in the universe.
Sample: The subset you actually measure. 500 trial participants. 10,000 survey respondents. The 1,000 electrons you detected.
Statistics lets you make claims about the population based on your sample. But that only works under specific conditions:
1. The sample must be representative.
If you survey Stanford undergraduates and conclude something about "humans," you're lying. Stanford undergraduates are not a random sample of humanity—they're WEIRD (Western, Educated, Industrialized, Rich, Democratic). Your conclusions generalize to... Stanford undergraduates.
Sampling bias is everywhere. Political polls that only call landlines. Medical studies that exclude women. Psychology experiments run entirely on college students. The generalization breaks when the sample doesn't represent the population.
2. The sample must be large enough.
Small samples have high variance. Flip a coin 5 times, you might get 4 heads. That doesn't mean the coin is unfair—it means 5 flips isn't enough data.
How large is large enough? It depends on the effect size, the variance, and your desired confidence. But this is quantifiable—statistics gives you formulas for sample size calculations. A sketch of one such formula follows this list.
3. You must account for random variation.
Even with a perfect sample, randomness exists. If the drug works, not everyone will improve. If income predicts voting, it's not deterministic. Statistics gives you tools to separate signal (real effects) from noise (random variation).
Hypothesis testing, confidence intervals, significance tests—these are all mechanisms for asking: "Is what I'm seeing real, or just chance?"
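And here's the sample-size sketch promised above, using the textbook normal-approximation formula for comparing two group means at α = 0.05 and 80% power (the effect sizes are the conventional small/medium/large benchmarks, not values from any real study):

```python
# Rough sample size per group for detecting a difference between two means:
# n ≈ 2 * ((z_alpha + z_beta) / d)^2, with d = effect size in SD units (Cohen's d).
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)    # two-sided test
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

for d in (0.8, 0.5, 0.2):                # conventional large / medium / small effects
    print(f"effect size {d}: about {n_per_group(d):.0f} participants per group")
```

Smaller effects demand dramatically more data, which is exactly why underpowered studies are so unreliable.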
The Replication Crisis: When the Tools Break Down
Here's the uncomfortable truth: the statistical tools most researchers use are fragile. They work beautifully when used correctly. And they break catastrophically when misused.
The replication crisis in psychology (and medicine, and economics, and...) happened because researchers systematically misunderstood what their statistical tests were telling them.
The core failure:
Researchers thought: "I got p < 0.05, so my hypothesis is probably true."
What p < 0.05 actually means: "If my hypothesis were false (if there were no real effect), data this extreme would show up less than 5% of the time."
That's not the same thing. At all. And that confusion led to:
- P-hacking: Running multiple tests until you get p < 0.05, then reporting only that one.
- HARKing: Hypothesizing After Results are Known—pretending you predicted what you found.
- Publication bias: Journals only publish "significant" results, so null findings disappear.
The result: a literature full of false positives. Effects that don't replicate. Theories built on statistical noise.
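Here's how little it takes. A quick simulation (purely illustrative, not modeled on any particular study): each simulated "paper" measures 20 outcomes where nothing is actually going on, and reports whichever test clears p < 0.05.

```python
# Simulate "papers" where nothing real is going on: both groups are drawn from
# the same distribution, but each paper runs 20 tests and keeps the best one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_papers = 2_000
tests_per_paper = 20
lucky_papers = 0

for _ in range(n_papers):
    p_values = [
        stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
        for _ in range(tests_per_paper)
    ]
    if min(p_values) < 0.05:
        lucky_papers += 1

# Expect roughly 1 - 0.95**20, i.e. around 64% of these no-effect "papers"
# to find something "significant".
print("share of null papers with at least one p < 0.05:", lucky_papers / n_papers)
```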
And the solution isn't "throw out statistics." The solution is to understand the tools you're using. Know what they measure. Know their limits. Know when you're extrapolating beyond what the data supports.
What Statistics Actually Measures
Here's the conceptual reframe:
Statistics measures how surprised you should be.
You flip a coin 100 times, get 73 heads. Statistics tells you: "If the coin were fair, you'd see a split this extreme less than 0.001% of the time." That's surprising. So you conclude: probably not a fair coin.
You give 500 people a drug, 500 people a placebo. The drug group improves more. Statistics tells you: "If the drug did nothing, you'd see a difference this large about 2% of the time." That's surprising. So you conclude: probably the drug works.
But notice the phrasing: "If the null hypothesis were true, this data would be surprising."
That's all statistics tells you. It doesn't tell you the null hypothesis is false. It doesn't tell you your hypothesis is true. It tells you how surprising your data is, conditional on assumptions.
And that's powerful. But it's not a rubber stamp.
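To make "surprise" concrete, here's the drug-versus-placebo logic as a permutation test on simulated data (the group means and spreads below are invented for illustration):

```python
# "How surprised should I be?" as a permutation test on simulated trial data.
import numpy as np

rng = np.random.default_rng(0)
drug    = rng.normal(5.3, 2.0, 500)    # invented improvement scores, drug group
placebo = rng.normal(5.0, 2.0, 500)    # invented improvement scores, placebo group

observed_gap = drug.mean() - placebo.mean()
pooled = np.concatenate([drug, placebo])

# If the drug did nothing, the group labels would be arbitrary. So: shuffle the
# labels many times and count how often chance alone produces a gap this large.
n_shuffles = 10_000
count = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    fake_gap = pooled[:500].mean() - pooled[500:].mean()
    if abs(fake_gap) >= abs(observed_gap):
        count += 1

print("observed gap:", round(observed_gap, 3))
print("how often chance alone matches it:", count / n_shuffles)
```

The shuffling step is the null hypothesis made physical: it shows you what the world looks like when the drug does nothing, so you can see how unusual your actual data is by comparison.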
The Coherence Connection: Statistics as Pattern Detection
Here's where this connects to the broader framework of meaning and coherence.
Statistics is formalized pattern detection. It's a rigorous method for distinguishing structure from randomness.
When data has structure—when variables covary, when groups differ systematically, when trends persist—statistics detects it. And structure is coherence. It's predictability. It's low entropy relative to baseline.
The formula M = C/T (meaning equals coherence over time) applies here. Statistical significance is a claim about coherence: "This pattern is stable enough to generalize beyond this sample."
Conversely, noise is maximum entropy. Random data has no structure. Statistics tells you when you're looking at signal versus noise—when coherence exists versus when it's just stochastic variation.
And here's the kicker: statistical tools assume the noise is random. But in complex systems—human behavior, financial markets, ecological dynamics—the "noise" often isn't random. It's structured chaos. High-dimensional feedback loops. That's when statistics breaks.
You can have perfect statistical significance and still be measuring nothing real—because your model doesn't capture the actual causal structure. We'll return to this in the article on correlation versus causation.
The Core Statistical Workflow
Here's the basic pipeline:
1. Define your question.
Not "Does this drug work?" but "Does this drug reduce symptom X by at least Y amount compared to placebo?"
Vague questions yield vague answers. Statistics requires precision.
2. Choose your test.
Different questions require different tools. Comparing two groups? T-test. Comparing multiple groups? ANOVA. Predicting continuous outcomes? Regression. Testing independence? Chi-square. (A worked sketch follows step 5 below.)
3. Collect data.
Ideally through randomized controlled trials. More often through observational studies, surveys, or existing datasets. The data collection method determines what you can conclude.
4. Calculate your statistic.
This is the "running the numbers" part. You compute a test statistic, a p-value, a confidence interval—whatever your test outputs.
5. Interpret the result.
This is where people screw up. You don't "prove" anything. You don't "accept" or "reject" hypotheses in any final sense. You quantify evidence. You update your confidence.
And you acknowledge assumptions. Every statistical test assumes something—normality, independence, random sampling. Violate the assumptions, and the test's guarantees evaporate.
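Here's the worked sketch promised in step 2, covering steps 2 through 5 on simulated data with SciPy (assuming a reasonably recent version, where the test results expose .pvalue attributes):

```python
# Steps 2-5 in miniature: two groups, continuous outcome, so a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control   = rng.normal(50, 10, 200)    # simulated outcomes, control group
treatment = rng.normal(53, 10, 200)    # simulated outcomes, treated group

# Before trusting the t-test (step 2's choice), check its assumptions:
# roughly normal groups, roughly equal variances. These are rough numerical
# checks, not a substitute for thinking about how the data were collected.
print("normality p (control):  ", stats.shapiro(control).pvalue)
print("normality p (treatment):", stats.shapiro(treatment).pvalue)
print("equal-variance p:       ", stats.levene(control, treatment).pvalue)

# Step 4: compute the statistic.
result = stats.ttest_ind(treatment, control)
print("t statistic:", round(result.statistic, 2))
print("p-value:    ", round(result.pvalue, 4))

# Step 5: interpret. The p-value quantifies surprise under "no difference".
# It does not say the effect is large, important, or causal.
```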
Where Statistics Fails
Let's be honest about the limits.
1. Garbage in, garbage out.
No statistical sophistication fixes bad data. If your sample is biased, your measurement is noisy, your variables are confounded—statistics can't save you.
2. Correlation isn't causation.
Statistics detects associations. It doesn't—can't—tell you which variable causes which. Causal inference requires additional structure (randomization, natural experiments, causal graphs).
3. Statistical significance isn't practical significance.
You can have p < 0.0001 for an effect so small it doesn't matter. "Statistically significant" just means "probably not random." It doesn't mean "important." (The sketch after this list makes the point with simulated data.)
4. The file-drawer problem.
If 100 labs run the same test and only the 5 with p < 0.05 publish, the literature systematically overstates evidence. Statistics assumes you're reporting everything. Publication bias breaks that.
5. Black swan events.
Statistics models the expected. It struggles with the unprecedented, the nonlinear, the tail risk. The 2008 financial crisis wasn't predicted by statistical models—because models assumed the future would resemble the past.
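And here's the sketch promised in point 3: simulated data where the true difference is a trivial 0.02 standard deviations, yet the sample is so large that the p-value is microscopic.

```python
# Statistical significance without practical significance: a tiny true effect
# (0.3 points on a scale with SD 15, i.e. 0.02 standard deviations) becomes
# "highly significant" once the sample is enormous.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(100.0, 15.0, 2_000_000)
b = rng.normal(100.3, 15.0, 2_000_000)

result = stats.ttest_ind(a, b)
print("difference in means:", round(b.mean() - a.mean(), 3))
print("p-value:", result.pvalue)   # vanishingly small, yet the effect barely matters
```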
Why It Still Matters
For all its limits, statistics is indispensable.
It disciplines thought. You can't wave your hands about "seems like" or "probably." You have to quantify your uncertainty.
It enables science. Randomized controlled trials work because statistics tells you when a difference reflects a real effect rather than noise.
It scales cognition. Humans can't intuitively reason about thousands of data points. Statistics compresses that complexity into interpretable summaries.
And when used rigorously—when assumptions are checked, when conclusions are appropriately hedged, when replication is valued—statistics is the most powerful tool we have for learning from data.
The problem isn't statistics. The problem is treating it like magic rather than mathematics.
What's Next
This series unpacks the core tools of statistical inference:
Foundations: Descriptive statistics, sampling, populations. How do you summarize data and generalize from samples?
Inference: Hypothesis testing, confidence intervals, p-values, error types. How do you decide if an effect is real?
Modeling: Regression, correlation, ANOVA, chi-square. How do you model relationships between variables?
Synthesis: What statistics actually measures, why it works, where it breaks.
By the end, you'll understand not just how to run a statistical test, but what the test is actually telling you—and when to trust it.
Next up: Descriptive Statistics—the foundation of everything else.
Further Reading
- Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd.
- Gigerenzer, G. (2004). "Mindless statistics." Journal of Socio-Economics, 33(5), 587-606.
- Ioannidis, J. P. (2005). "Why most published research findings are false." PLoS Medicine, 2(8), e124.
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.
- Wasserstein, R. L., & Lazar, N. A. (2016). "The ASA's statement on p-values: context, process, and purpose." The American Statistician, 70(2), 129-133.
This is Part 2 of the Statistics series, exploring how we extract knowledge from data. Next: "Descriptive Statistics Explained."