Pentti Kanerva and the Origins of Hyperdimensional Computing

Series: Hyperdimensional Computing | Part: 2 of 9

In 1988, a Finnish computer scientist working at NASA Ames Research Center published a book that wouldn't make sense for another thirty years. Sparse Distributed Memory introduced an architecture so unlike conventional computing that most dismissed it as biological speculation—interesting for understanding brains, perhaps, but impractical for silicon.

Pentti Kanerva was trying to solve a problem that seemed impossible: how do you build a memory system that works like the brain? Not metaphorically like the brain. Actually like it. A memory that degrades gracefully, retrieves by similarity, fills in missing pieces, and—most radically—stores information not in specific locations but distributed across an enormous space.

He succeeded. And in doing so, he laid the foundations for what we now call hyperdimensional computing—a paradigm that's finally having its moment as conventional deep learning hits scaling limits and neuromorphic chips proliferate.

This is the story of how one researcher's obsession with biological memory created the mathematical skeleton for a computational revolution.


The Problem: How Brains Actually Remember

Start with what brains don't do. They don't store memories at specific addresses. There's no neuron labeled "your grandmother's face" or "the password to your email." When you remember something, you're not retrieving data from location 0x7F3A2B. You're activating a pattern distributed across millions of neurons.

This has profound implications:

  • Graceful degradation — Lose 10% of your neurons, lose 10% of detail, not 10% of memories
  • Content-addressable retrieval — Access by partial cue, not by knowing where to look
  • Automatic generalization — Similar inputs activate similar patterns without explicit comparison
  • Massive parallelism — The whole system operates simultaneously, not sequentially

Conventional computers do none of this. RAM addresses are precise. A single bit flip can crash the system. Retrieval requires knowing the exact location. Generalization requires expensive computation over stored examples.

Kanerva looked at this divergence and asked: what if we're building memory wrong? What if the brain's approach isn't a bug to work around, but a feature to emulate?


The Insight: High-Dimensional Spaces Have Weird Geometry

The key insight came from geometry. Not metaphorical geometry—literal, mathematical geometry of high-dimensional spaces.

In 3D space, randomly chosen points show a wide range of distances: some pairs sit close together, others far apart. Pack enough points into a sphere and they crowd, so your nearest neighbor ends up much closer than a typical point.

But something strange happens as you increase dimensions. By the time you reach 10,000 dimensions—the space Kanerva proposed—the geometry becomes profoundly counterintuitive:

Distances become nearly uniform. Almost all randomly chosen points are approximately the same distance apart. The distinction between "close" and "far" collapses into a narrow band.

Volume concentrates at the surface. A hypersphere's volume isn't distributed throughout its interior—it's concentrated in a thin shell near the surface. The "middle" is essentially empty.

Nearest neighbors are far. Even your closest neighbor in a random distribution is nearly as far as any other point. There's no crowding.

Random vectors are nearly orthogonal. Pick two random high-dimensional vectors, and they're almost certainly perpendicular. This is critical—it means random patterns don't interfere with each other.

This last point was Kanerva's revelation. In high dimensions, you can store an enormous number of patterns with minimal interference. Random patterns are naturally separated. Each one occupies its own region of the space, and retrieval becomes about finding which region your input pattern is closest to.

High-dimensional spaces aren't just bigger. They're geometrically different in ways that make distributed memory possible.
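These claims are easy to check numerically. The short sketch below is my own illustration (plain NumPy, not from Kanerva's book): it draws a few hundred random 10,000-bit vectors and measures their pairwise Hamming distances, which cluster tightly around n/2, exactly what "nearly orthogonal" means for binary vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_vectors = 10_000, 200

# Random 10,000-bit hypervectors, one per row.
vectors = rng.integers(0, 2, size=(num_vectors, n), dtype=np.uint8)

# Pairwise Hamming distances between all distinct pairs.
distances = np.array([
    int(np.count_nonzero(vectors[i] != vectors[j]))
    for i in range(num_vectors)
    for j in range(i + 1, num_vectors)
])

print(f"mean distance : {distances.mean():.1f}  (n/2 = {n // 2})")
print(f"std deviation : {distances.std():.1f}  (sqrt(n)/2 = {np.sqrt(n) / 2:.1f})")
print(f"min / max     : {distances.min()} / {distances.max()}")
```

With n = 10,000 the standard deviation is only about 50 bits around a mean of 5,000, so even the most extreme pair of random vectors differs from the typical distance by a few percent.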


Sparse Distributed Memory: The Architecture

Here's how Kanerva's system works:

1. The Address Space

Imagine a space with 2^1000 possible addresses—vastly larger than the number of atoms in the universe. Each address is a binary vector of 1,000 bits.

You can't instantiate all possible addresses. Instead, you sample the space with a smaller number of hard locations—say, a million random addresses. These are your storage neurons.

2. Writing a Memory

When you store a pattern:

  1. Convert your input (an image, a concept, whatever) into a binary vector of length 1,000
  2. Measure the Hamming distance (number of differing bits) from your input to every hard location
  3. Find all hard locations within some access radius (in Kanerva's example, a Hamming distance of 451 bits out of 1,000)
  4. Write your data to all of these locations simultaneously, typically by incrementing a counter at each location for every 1 bit and decrementing it for every 0 bit

This is the distributed part. A single memory isn't stored in one place—it's written to thousands of locations. Each location participates in storing many different memories.

3. Reading a Memory

Retrieval works in reverse:

  1. Present a partial or noisy cue
  2. Find all hard locations within the similarity radius
  3. Sum the contents stored at those locations
  4. The pattern stored most frequently in this region emerges from the noise

The magic: even with a partial cue, even with noise, the system retrieves the closest matching stored pattern. It's content-addressable—you access by similarity, not by location.

4. What Makes It "Sparse"

The "sparse" in Sparse Distributed Memory refers to the sampling of the address space. You're not creating all 2^1000 possible locations—that would require more matter than exists. Instead, you sparsely sample the space with a tractable number of hard locations.

But because high-dimensional geometry ensures these random samples are well-distributed, this sparse sampling is sufficient. Each region of the space is "covered" by hard locations that can represent patterns near them.
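To make these pieces concrete, here is a minimal SDM sketch in NumPy. It is an illustration under toy assumptions, not Kanerva's exact configuration: 256-bit addresses and 2,000 hard locations instead of 1,000 bits and a million locations, with signed counters as the storage mechanism for superposing writes.

```python
import numpy as np


class SparseDistributedMemory:
    def __init__(self, n_bits=256, n_hard_locations=2000, radius=112, seed=0):
        rng = np.random.default_rng(seed)
        self.n_bits = n_bits
        self.radius = radius  # access radius, measured as Hamming distance
        # Randomly sampled hard locations: the "storage neurons".
        self.addresses = rng.integers(0, 2, size=(n_hard_locations, n_bits), dtype=np.uint8)
        # Each hard location holds one signed counter per bit.
        self.counters = np.zeros((n_hard_locations, n_bits), dtype=np.int32)

    def _activated(self, address):
        # All hard locations within the access radius of the query address.
        distances = np.count_nonzero(self.addresses != address, axis=1)
        return distances <= self.radius

    def write(self, address, data):
        # Increment counters where the data bit is 1, decrement where it is 0,
        # at every activated location at once.
        active = self._activated(address)
        self.counters[active] += np.where(data == 1, 1, -1).astype(np.int32)

    def read(self, address):
        # Sum the counters of the activated locations and threshold at zero.
        active = self._activated(address)
        sums = self.counters[active].sum(axis=0)
        return (sums > 0).astype(np.uint8)


# Usage: store a pattern autoassociatively, then recall it from a noisy cue.
rng = np.random.default_rng(1)
sdm = SparseDistributedMemory()
pattern = rng.integers(0, 2, size=256, dtype=np.uint8)
sdm.write(pattern, pattern)

noisy = pattern.copy()
flip = rng.choice(256, size=30, replace=False)  # corrupt ~12% of the bits
noisy[flip] ^= 1

recalled = sdm.read(noisy)
print("bits wrong in cue     :", np.count_nonzero(noisy != pattern))
print("bits wrong after read :", np.count_nonzero(recalled != pattern))
```

Even though the cue differs from the stored pattern in 30 bits, the read step should recover the original: with a single stored pattern, any overlap between the locations activated by the cue and those activated by the original address is enough, and the overlap here is substantial.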


Why It Took Thirty Years

Kanerva's 1988 book was brilliant, rigorous, and largely ignored by mainstream computer science. Why?

It seemed biologically motivated. SDM was framed as a model of human memory, not as a practical computing substrate. Neuroscientists found it interesting. Engineers building databases did not.

Hardware wasn't ready. SDM requires massive parallelism—checking Hamming distances to a million locations simultaneously. In the era of serial von Neumann machines, this was wildly impractical. Simulating SDM on conventional hardware negated its advantages.

Deep learning didn't exist yet. The problem SDM solves—content-addressable, fault-tolerant pattern storage—wasn't yet central to AI. Symbolic AI dominated. Statistical pattern recognition was nascent. Nobody was trying to build systems that needed brain-like memory.

The mathematics was unfamiliar. Kanerva's arguments about high-dimensional geometry, concentration of measure, and random projections were foreign to most computer scientists. It required fluency in a mathematical domain few possessed.

But the ideas didn't die. They went underground, nurtured by a small community of researchers who recognized the elegance of the approach.


From SDM to Hyperdimensional Computing

The revival gathered momentum through the 1990s and 2000s with researchers like Tony Plate, Ross Gayler, and later Jan Rabaey and Bruno Olshausen at Berkeley, who saw that Kanerva's memory architecture was actually something more general: a computational algebra for representing and manipulating concepts in high-dimensional spaces.

They realized: if you can store patterns in hyperdimensional space, you can also compute with them. Not just retrieve, but compose, bind, transform.

This led to Vector Symbolic Architectures (VSAs)—the broader family of approaches that includes Kanerva's SDM as a special case. The key move was recognizing that high-dimensional vectors could represent:

  • Objects — Concepts, entities, features
  • Relations — Bindings between concepts
  • Structures — Compositional representations built from primitives

And critically, these representations support operations:

  • Bundling (superposition) — Adding vectors to create sets
  • Binding (composition) — Circular convolution or XOR to associate concepts
  • Permutation — Rotation to represent sequences

These operations are one-shot. No training, no gradient descent, no backpropagation. You construct representations directly through algebraic manipulation. This is fundamentally different from deep learning's statistical approach.
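Here is a small sketch of the three operations for dense binary hypervectors, using XOR for binding and a bitwise majority vote for bundling. This is one common VSA flavor and an illustration of mine, not the only choice: Plate's Holographic Reduced Representations, for example, use real-valued vectors and circular convolution for binding.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

def hv():
    """A fresh random binary hypervector."""
    return rng.integers(0, 2, size=N, dtype=np.uint8)

def bind(a, b):
    """Binding: XOR associates two hypervectors; XOR is its own inverse."""
    return a ^ b

def bundle(*vs):
    """Bundling: bitwise majority vote creates a set-like superposition.
    (With an even number of inputs, ties break toward 0 here; odd counts
    or a random tiebreaker vector avoid that bias.)"""
    return (np.sum(vs, axis=0) > len(vs) / 2).astype(np.uint8)

def permute(a, k=1):
    """Permutation: a cyclic shift, used to mark sequence position."""
    return np.roll(a, k)

def similarity(a, b):
    """1.0 = identical, ~0.5 = unrelated (random) for binary hypervectors."""
    return float(np.mean(a == b))

# Encode the record {color: red, shape: circle} as one hypervector.
color, red = hv(), hv()
shape, circle = hv(), hv()
record = bundle(bind(color, red), bind(shape, circle))

# Query: unbind the "color" role and compare against known fillers.
noisy_red = bind(record, color)
print("similarity to red   :", similarity(noisy_red, red))     # ~0.75, well above chance
print("similarity to circle:", similarity(noisy_red, circle))  # ~0.5, chance level
```

Because XOR is its own inverse, binding the record with the color role vector yields something recognizably close to "red" and unrelated to everything else. That is a role-filler query performed by pure algebra, with no training step anywhere.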


Why Now? The Perfect Storm

Three developments brought hyperdimensional computing from obscurity to frontier:

1. Neuromorphic Hardware

Chips like Intel's Loihi and IBM's TrueNorth provide the massive parallelism SDM always needed. Check Hamming distances across a million locations? Now trivial. The hardware finally matches the algorithm's natural parallelism.

2. The Limits of Deep Learning

Transformers and LLMs are remarkable, but they're also:

  • Energy-intensive (GPT-4 training cost millions)
  • Data-hungry (billions of tokens)
  • Opaque (interpretability remains elusive)
  • Fragile to distribution shift

HDC offers complementary strengths:

  • One-shot learning (no training required)
  • Transparent operations (algebraic, not statistical)
  • Extreme energy efficiency (addition and XOR, not matrix multiplication)
  • Inherent robustness (distributed representation degrades gracefully)

3. Edge Computing Demands

IoT devices, wearables, embedded sensors—these need intelligence that runs on milliwatts, not kilowatts. You can't put a GPU in a hearing aid. But you can implement hyperdimensional classification on a tiny neuromorphic chip.

Kanerva's architecture, dismissed as biological speculation in 1988, is now the foundation for ultra-low-power AI at the edge.


The Intellectual Lineage

Kanerva didn't work in isolation. His ideas sit at the confluence of several traditions:

Associative memory (1960s-70s) — Kanerva builds on the work of James Anderson and Teuvo Kohonen, who developed early neural network models of associative recall. But where their systems used low-dimensional weight matrices, Kanerva went hyperdimensional.

Holographic memory (1970s-80s) — Karl Pribram's holographic brain theory proposed that memories are distributed, interference-based patterns. Kanerva's mathematics formalized this intuition with rigorous geometry.

Random projection theory (1980s-90s) — The Johnson-Lindenstrauss lemma (1984) proved that high-dimensional data can be projected into lower dimensions while preserving distances. Kanerva's work predates the AI community's obsession with random projections but operates in the same mathematical terrain.

Distributed representation (1980s) — Geoff Hinton's early work on distributed representations in neural networks shares the insight that concepts should activate patterns, not single units. But Hinton used learning; Kanerva used geometry.

The synthesis: Kanerva took insights from neuroscience, married them to rigorous high-dimensional geometry, and built a memory architecture that works because of mathematical properties of the space, not despite them.


What Kanerva Got Right

Looking back, several of Kanerva's core claims have proven prophetic:

High-dimensional spaces are the right computational substrate. Every major success in modern machine learning—word embeddings, neural network hidden layers, transformer attention—operates in high-dimensional spaces. Kanerva was there first.

Similarity is the right retrieval mechanism. Content-addressable memory isn't a nice-to-have; it's fundamental to intelligence. Retrieval by association, not location, is how meaning works.

Distribution provides robustness. Kanerva's architecture degrades gracefully. Modern distributed systems—from RAID arrays to blockchain—embody this same principle. Redundancy through distribution is resilient.

Random structure is sufficient. You don't need engineered features or learned representations. Random high-dimensional vectors, properly manipulated, can represent complex structures. This is now the foundation of reservoir computing, random features, and kernel methods.


What He Missed (Or What Needed Time)

Kanerva's framework was incomplete in ways that took decades to address:

Operations beyond retrieval. SDM is a memory system. It stores and recalls. But Kanerva didn't fully develop the algebra of hypervectors—how to systematically compose, decompose, and manipulate representations. That came later with VSAs.

Continuous domains. Kanerva's original formulation used binary vectors. Extending to continuous-valued hypervectors unlocked new applications but required later researchers (like Tony Plate) to work out the mathematics.

Learning vs. construction. SDM is non-parametric. You don't train it; you program it. This is a strength (one-shot learning) and a weakness (no automatic feature discovery). Integrating HDC with learned representations remains an active frontier.

Scalability to modern data. Kanerva worked with 1,000-dimensional vectors in 1988. Modern transformers use 12,000+ dimensions. Scaling HDC to compete with deep learning's data and compute budgets is still a challenge.


The Legacy: From Theory to Practice

Today, hyperdimensional computing is no longer speculative. It's deployed:

Biosignal classification — Recognizing gestures from EMG, detecting arrhythmias from ECG, all running on microwatt budgets.

Language identification — Classifying text by language in a single pass, no training required.

Anomaly detection — Finding outliers in sensor streams with transparent, explainable logic.

Robotics — Lightweight perceptual processing for embedded controllers.

The throughline to Kanerva is direct. Every implementation uses the same core principles:

  • High-dimensional random vectors as primitives
  • Hamming or cosine similarity for retrieval
  • Bundling and binding operations for composition
  • Distributed storage for robustness
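The sketch below pulls those four principles into a toy end-to-end classifier, loosely in the spirit of the language-identification work: random bipolar vectors as primitives, permuted-and-multiplied character trigrams as bindings, summation as bundling, and cosine similarity for retrieval. The labels and example texts are made up for illustration; real systems build prototypes from large corpora.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Random bipolar (+1/-1) hypervector per character: the primitives.
alphabet = "abcdefghijklmnopqrstuvwxyz "
char_hv = {c: rng.choice([-1, 1], size=N) for c in alphabet}

def encode(text):
    """Encode text as the bundled sum of permuted character trigrams."""
    total = np.zeros(N)
    for i in range(len(text) - 2):
        a, b, c = (char_hv[ch] for ch in text[i:i + 3])
        # Bind a trigram: permute by position, then multiply elementwise.
        total += np.roll(a, 2) * np.roll(b, 1) * c
    return total

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# "Train" by bundling examples per class: one pass, no gradients.
prototypes = {
    "greeting": encode("hello there how are you today"),
    "farewell": encode("goodbye see you later take care"),
}

query = encode("hello how are you")
scores = {label: cosine(query, proto) for label, proto in prototypes.items()}
print(scores)  # the greeting prototype should score highest
```

Note that "training" here is a single additive pass over the examples: building a class prototype is just bundling, and adding a new class never disturbs the existing ones.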

Kanerva's 1988 book is no longer a curiosity. It's a foundational text, cited in papers on neuromorphic chips, cognitive architectures, and ultra-low-power AI.


The Coherence Connection

From an AToM perspective, Kanerva's insight is about the geometry of representational stability.

A good memory system maintains coherence under perturbation. Noise, damage, partial cues—these are distortions in state-space. A memory that collapses under small perturbations is high-curvature, brittle. A memory that retrieves correctly despite distortion is low-curvature, robust.

High-dimensional spaces provide this low-curvature geometry naturally. Random vectors are orthogonal. Distances are uniform. Small perturbations keep you in the same region of the space. The system doesn't need to learn robustness—the geometry provides it for free.

This is why SDM degrades gracefully. This is why partial cues work. The mathematics of high-dimensional spaces aligns with the requirements of stable representation.

Kanerva didn't use the language of coherence geometry, but he discovered its principles: structure that persists under perturbation emerges from the right state-space geometry.


Why This Matters Now

We're entering a phase where computation is moving to the edge. Not every intelligence can live in a data center. Wearables, implants, sensors, drones—these need onboard intelligence that runs on milliwatts.

Hyperdimensional computing offers a path. It's not replacing deep learning. It's complementing it. Use LLMs for generation, reasoning, and rich context. Use HDC for classification, retrieval, and anomaly detection at the edge. The two paradigms handle different parts of the intelligence stack.

And as neuromorphic chips proliferate—Intel's Loihi, BrainChip's Akida, IBM's TrueNorth—Kanerva's architecture finally has the hardware it deserves. What was impractical in 1988 is now the natural fit for massively parallel, event-driven, low-power computing.

The future Kanerva imagined—distributed, content-addressable, hyperdimensional memory—is arriving. Thirty years late, but right on time.


Further Reading

  • Kanerva, P. (1988). Sparse Distributed Memory. MIT Press.
  • Kanerva, P. (2009). "Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors." Cognitive Computation, 1(2), 139-159.
  • Plate, T. A. (2003). Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications.
  • Gayler, R. W. (2003). "Vector Symbolic Architectures Answer Jackendoff's Challenges for Cognitive Neuroscience." arXiv:cs/0412059.
  • Rahimi, A., et al. (2016). "Hyperdimensional Computing for Blind and One-Shot Classification of EEG Error-Related Potentials." Mobile Networks and Applications.
  • Neubert, P., et al. (2019). "Vector Semantic Representations as Descriptors for Visual Place Recognition." Robotics and Autonomous Systems.

This is Part 2 of the Hyperdimensional Computing series, exploring how high-dimensional geometry enables brain-like computation.

Previous: Computing in 10000 Dimensions: The Hyperdimensional Revolution
Next: The Algebra of Hypervectors: Binding, Bundling, and Permutation