The Algebra of Hypervectors: Binding, Bundling, and Permutation

Series: Hyperdimensional Computing | Part: 3 of 9

The most profound computational architectures often hide in plain sight. For decades, we've watched neural networks dominate machine learning while a radically different approach—one that operates in dimensions so high they defy geometric intuition—has been quietly demonstrating how brains might actually encode and manipulate information.

Hyperdimensional computing doesn't learn patterns by adjusting weights. It doesn't require backpropagation. Instead, it performs what can only be described as algebraic magic in spaces of 10,000 dimensions, composing and decomposing meaning through three elegantly simple operations: binding, bundling, and permutation.

These operations are the grammar of high-dimensional thought. They're how symbols compose, how memories blend, how structure emerges without explicit programming. And remarkably, they work because of a single counterintuitive property: in very high dimensions, almost everything is nearly orthogonal to almost everything else.

This is the algebra that makes hyperdimensional computing possible—and potentially, the mathematics behind how biological brains encode compositional meaning.


The Strangeness of High Dimensions

Before diving into operations, we need to confront something genuinely weird: high-dimensional spaces don't work like low-dimensional spaces. Our geometric intuition—forged in the three dimensions we inhabit—actively misleads us when reasoning about 10,000-dimensional hyperspaces.

Consider this: in three dimensions, if you randomly sample two points on a sphere's surface, their angle varies widely. Sometimes nearly parallel, sometimes nearly perpendicular, often somewhere in between. But in 10,000 dimensions, something bizarre happens: randomly sampled points are almost always nearly perpendicular to each other. The angle between random hypervectors clusters tightly around 90 degrees.

This phenomenon, a consequence of the concentration of measure that gets called the curse of dimensionality when it causes problems but functions here as a blessing, means that random high-dimensional vectors are effectively quasi-orthogonal. Their pairwise correlations are close to zero, so they barely interfere with each other.

This quasi-orthogonality is the foundation everything else builds on. It means we can use random vectors as atomic symbols, confident they won't accidentally overlap in representational space. It means we can compose hundreds of elements without collision. It means the algebra works.

Pentti Kanerva, one of the architects of hyperdimensional computing theory, demonstrated this rigorously: with binary vectors of 10,000 dimensions, the probability that two random vectors share more than 55% of their bits is vanishingly small—less than one in a trillion. They're effectively unique identifiers, drawn from a space so vast that collision becomes impossible.
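
To make the claim concrete, here is a minimal sketch in Python with NumPy (not from the original text; the seed and helper names are arbitrary choices of mine): two random bipolar hypervectors of 10,000 dimensions almost always have a cosine similarity within a few hundredths of zero.

import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # dimensionality

def random_hv():
    # Random bipolar hypervector with components drawn from {+1, -1}.
    return rng.choice([-1, 1], size=D)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = random_hv(), random_hv()
print(cosine(a, b))  # almost always within a few hundredths of zero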

This is the substrate. Now for the operations.


Bundling: The Superposition of Meaning

Bundling is how hyperdimensional systems create sets, categories, and prototypes. It's a majority-rule operation that merges multiple hypervectors into a single composite that resembles all of them.

The implementation is simple: with bipolar hypervectors (components of +1 and -1), you add them component-wise and take the sign. If the sum is positive, that dimension becomes +1; if negative, -1; ties can be broken at random. For real-valued vectors, you might just average.

But the conceptual power is profound.

Imagine encoding words as random 10,000-dimensional vectors. To represent the concept "pet," you might bundle the hypervectors for DOG, CAT, HAMSTER, and FISH:

PET = bundle(DOG, CAT, HAMSTER, FISH)

The resulting vector PET doesn't equal any of its constituents, but it's similar to all of them. It sits in the center of their semantic cluster. If you measure similarity (via dot product), PET will score highly with DOG, highly with CAT, but low with unrelated concepts like CAR or THEORY.

This is distributed prototype formation. No single dimension stores "pet-ness." Instead, the property emerges from the statistical pattern across all 10,000 dimensions.
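
A sketch of how this might look in code, assuming bipolar vectors and a majority-rule bundle (the helper names and the random tie-breaking are my choices, not the article's): each constituent scores well above chance against the bundle, while an unrelated vector scores near zero.

import numpy as np

rng = np.random.default_rng(1)
D = 10_000

def random_hv():
    return rng.choice([-1, 1], size=D)

def bundle(*vs):
    # Majority rule: take the sign of the component-wise sum;
    # ties (possible with an even number of inputs) are broken by a random vector.
    s = np.sum(vs, axis=0)
    tie = rng.choice([-1, 1], size=D)
    return np.where(s != 0, np.sign(s), tie).astype(int)

def sim(a, b):
    return float(a @ b) / D  # normalized dot product for bipolar vectors

DOG, CAT, HAMSTER, FISH, CAR = (random_hv() for _ in range(5))
PET = bundle(DOG, CAT, HAMSTER, FISH)

print(sim(PET, DOG), sim(PET, CAT))  # roughly 0.35-0.40: clearly constituents
print(sim(PET, CAR))                 # roughly 0.0: unrelated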

Crucially, bundling is information-preserving up to a point. With a small number of constituents (say, 10-20), you can often unbundle—recover the original vectors by finding the closest matches in your vocabulary. Bundle too many (hundreds), and the information blurs into noise. The vector becomes a generic average, losing its constituent structure.

This capacity limit mirrors something deeply biological: working memory. You can hold about 7±2 items in working memory before they blur. Hyperdimensional bundling shows the same profile, suggesting it might be more than metaphor—it might be mechanism.

Ross Gayler's work on Vector Symbolic Architectures demonstrated this empirically: bundling up to about 50 binary vectors of 10,000 dimensions maintains sufficient distinctiveness to reliably retrieve constituents. Beyond that, interference dominates.

Bundling creates sets. But how do we create structure?


Binding: Composing Structure from Symbols

Binding is how hyperdimensional systems encode relationships, build compositional structures, and create context-specific meaning. It's the operation that transforms independent symbols into complex, structured representations.

Where bundling uses addition (or majority rule), binding typically uses element-wise multiplication (for bipolar vectors of +1 and -1) or XOR (for binary vectors); some architectures use circular convolution instead. The mathematical details vary, but the conceptual operation is the same: binding takes two hypervectors and produces a third that is dissimilar to both inputs but can be unbound by applying the inverse of either one.

This dissimilarity is critical. If you bind TREE with COLOR, you don't want the result to resemble TREE or COLOR independently. You want a new vector that represents "tree-ness in the context of color" or "color-ness in the context of trees."
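
A minimal sketch, again assuming bipolar vectors and element-wise multiplication as the binding (one of several possible choices): the bound vector is quasi-orthogonal to both inputs, yet multiplying by either input recovers the other exactly.

import numpy as np

rng = np.random.default_rng(2)
D = 10_000

def random_hv():
    return rng.choice([-1, 1], size=D)

def sim(a, b):
    return float(a @ b) / D

def bind(x, y):
    # Element-wise multiplication; for bipolar vectors, binding is its own inverse.
    return x * y

TREE, COLOR = random_hv(), random_hv()
Z = bind(TREE, COLOR)

print(sim(Z, TREE), sim(Z, COLOR))  # both near 0: Z resembles neither input
print(sim(bind(Z, COLOR), TREE))    # exactly 1.0: unbinding with COLOR recovers TREE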

Here's why that matters. Imagine encoding the sentence:

"The red apple sits on the blue table."

You need to distinguish between the red (which goes with apple) and the blue (which goes with table). Bundling won't work—it would merge colors and objects into undifferentiated soup. Binding solves this:

SCENE = bundle(
  bind(RED, APPLE),
  bind(BLUE, TABLE),
  bind(APPLE, ON),
  bind(TABLE, UNDER)
)

Now the scene hypervector contains structured information. You can query it:

What color is the apple?
→ unbind(SCENE, APPLE) → retrieves RED
What is on the table?
→ unbind(SCENE, ON) → retrieves APPLE (the filler of the ON role)

Binding creates context-dependent meaning. RED bound with APPLE is a different vector than RED bound with TABLE, even though RED is the same atomic symbol. The binding operation generates novelty—it creates new representational states that encode compositional relationships.
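
Here is a sketch of the scene example under the same bipolar assumptions (the vocabulary, seed, and clean-up step are illustrative choices of mine): unbinding the scene with APPLE and comparing against the candidate colors returns RED.

import numpy as np

rng = np.random.default_rng(3)
D = 10_000

def random_hv():
    return rng.choice([-1, 1], size=D)

def sim(a, b):
    return float(a @ b) / D

def bind(x, y):
    return x * y  # unbind is the same operation for bipolar vectors

def bundle(*vs):
    s = np.sum(vs, axis=0)
    tie = rng.choice([-1, 1], size=D)  # random tie-breaker for even input counts
    return np.where(s != 0, np.sign(s), tie).astype(int)

vocab = {name: random_hv() for name in ["RED", "BLUE", "APPLE", "TABLE", "ON", "UNDER"]}
v = vocab

SCENE = bundle(bind(v["RED"], v["APPLE"]),
               bind(v["BLUE"], v["TABLE"]),
               bind(v["APPLE"], v["ON"]),
               bind(v["TABLE"], v["UNDER"]))

# "What color is the apple?": unbind with APPLE, then clean up against the colors.
query = bind(SCENE, v["APPLE"])
print(max(["RED", "BLUE"], key=lambda name: sim(query, vocab[name])))  # -> RED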

Tony Plate's Holographic Reduced Representations (HRR), one of the earliest Vector Symbolic Architectures, used circular convolution for binding. It's more complex than element-wise multiplication, but it has a key property: you can unbind by applying circular correlation, an approximate inverse that returns a noisy copy of the bound constituent. A clean-up memory then snaps that copy to the nearest stored item, giving reliable recovery of constituents even from deeply composed structures.

Chris Eliasmith's Semantic Pointer Architecture (SPA) uses binding to build hierarchical cognitive models where abstract concepts are compressed pointers to structured representations. "JUSTICE," in SPA, might be a hypervector that binds together FAIRNESS, LAW, PUNISHMENT, and EQUALITY in specific relational configurations.

The operation itself is almost trivial—multiply corresponding dimensions. But the representational consequences are staggering: you can build recursive structures, encode graphs, represent propositions, all without symbolic logic or explicit data structures.

This is compositionality without syntax trees. Structure without pointers. Meaning through vector algebra.


Permutation: Encoding Sequence and Order

Bundling and binding give us sets and structures, but they don't inherently encode order. If you bundle A, B, and C, you get the same result regardless of sequence. If you bind them, you've lost temporal or spatial ordering.

Enter permutation: the operation that encodes sequential position by systematically shifting vector components.

A permutation is simply a reordering of the dimensions of a hypervector—a shuffle applied consistently. For a 10,000-dimensional vector, you define a permutation π that maps dimension 1 to dimension 734, dimension 2 to dimension 2,891, and so on. The key is that π is invertible—you can always reverse it to recover the original vector.

To encode ordered sequences, you apply powers of the permutation to mark position:

SEQUENCE = bundle(
  A,
  π(B),
  π²(C),
  π³(D)
)

Now each element occupies a distinct position in sequence-space. A is unpermuted (position 1), B is permuted once (position 2), C twice (position 3), D three times (position 4).

To retrieve element at position 2:

π⁻¹(SEQUENCE) ≈ B + π⁻¹(A) + π(C) + π²(D)

After inverse permutation, B becomes the dominant component (unpermuted), while all others are permuted and thus quasi-orthogonal. When you compare this result against your vocabulary, B stands out.
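
A sketch of sequence encoding and retrieval, using a cyclic shift as the concrete permutation (a common implementation choice, though any fixed invertible shuffle works; the helpers below are my assumptions, not the article's):

import numpy as np

rng = np.random.default_rng(4)
D = 10_000

def random_hv():
    return rng.choice([-1, 1], size=D)

def sim(a, b):
    return float(a @ b) / D

def perm(x, k=1):
    # Cyclic shift by k positions stands in for pi^k; perm(x, -k) is its inverse.
    return np.roll(x, k)

def bundle(*vs):
    s = np.sum(vs, axis=0)
    tie = rng.choice([-1, 1], size=D)
    return np.where(s != 0, np.sign(s), tie).astype(int)

vocab = {name: random_hv() for name in ["A", "B", "C", "D"]}
SEQ = bundle(vocab["A"],
             perm(vocab["B"], 1),
             perm(vocab["C"], 2),
             perm(vocab["D"], 3))

# "What is at position 2?": undo one permutation, then clean up against the vocabulary.
probe = perm(SEQ, -1)
print(max(vocab, key=lambda name: sim(probe, vocab[name])))  # -> B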

This is how you encode word order in sentences, temporal sequences in events, or spatial layouts in scenes. Permutation gives you a dimension for time or space without needing explicit indexing.

Importantly, permutation is structure-preserving in a precise sense: it is an isometry. It leaves all distances and similarities between vectors intact, so the relationships within your vocabulary survive the shuffle. At the same time, a randomly permuted vector is quasi-orthogonal to its original, which is exactly what keeps different positions from interfering with one another. And because the permutation can always be undone, sequence encoding doesn't destroy content; it relocates it.

The Hierarchical Temporal Memory (HTM) framework, developed by Jeff Hawkins and inspired by cortical architecture, uses permutation-like operations to encode temporal sequences in sparse distributed representations. While HTM isn't pure HDC, the principle is the same: positional encoding through systematic transformation.

Permutation completes the algebraic toolkit. Now you can represent anything: sets (bundling), structured relationships (binding), and ordered sequences (permutation). Combine them, and you have a universal representational substrate.


The Deep Symmetry: Inverses and Recovery

What makes these operations algebraically elegant is that they're (mostly) invertible. You can compose and then decompose. Encode and then decode. Build complexity and then interrogate it.

Bundling is approximately invertible for small sets. If you bundle 10 vectors and want to know if vector X was included, compute the similarity between X and the bundle. High similarity means X was likely a constituent.

Binding is cleanly invertible. If you have Z = bind(X, Y), then unbind(Z, X) = Y. This is exact for XOR and for element-wise multiplication of bipolar vectors, where the binding operation is its own inverse, and approximate (but highly reliable, given a clean-up memory) for HRR's circular convolution.

Permutation is perfectly invertible. Applying the inverse permutation π⁻¹ recovers the original vector with zero loss.
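
These three properties can be checked directly in a few lines, again under the bipolar and cyclic-shift assumptions used above (the two-vector bundle and tie-breaking toward +1 are simplifications of mine for brevity):

import numpy as np

rng = np.random.default_rng(5)
D = 10_000
X, Y, Z = (rng.choice([-1, 1], size=D) for _ in range(3))

# Bipolar binding is self-inverse: unbinding with X recovers Y exactly.
assert np.array_equal((X * Y) * X, Y)

# Permutation (here a cyclic shift) is perfectly invertible: zero loss.
assert np.array_equal(np.roll(np.roll(X, 7), -7), X)

# Bundling is only approximately invertible: test membership by similarity.
B = np.where(X + Y >= 0, 1, -1)  # two-vector bundle, ties broken toward +1
print((B @ X) / D, (B @ Z) / D)  # constituent about 0.5, non-member about 0.0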

This invertibility is what makes hyperdimensional computing interrogable. Unlike neural network embeddings, where you have a vector but no principled way to ask "what was combined to make this?", hypervectors support algebraic queries:

  • What color is the apple? → unbind(SCENE, APPLE)
  • What objects are red? → unbind(SCENE, RED)
  • What is at position 3 of the sequence? → π⁻²(SEQUENCE), cleaned up against the vocabulary

You can navigate representational space using the algebra itself, not external indexes or search procedures. The structure is in the vectors, accessible through operations.

This is radically different from how deep learning works. In a transformer, if you have an embedding for "justice," you can compute similarity to other embeddings, but you can't algebraically decompose it into constituent concepts. The representation is opaque. In hyperdimensional computing, representation is transparent—structure is encoded explicitly through operations, and those same operations reverse to reveal structure.

Kanerva calls this computing with patterns, not computing with numbers. The vectors are symbols. The operations are logic. The high-dimensional space is the substrate where logic and pattern meet.


Why Three Operations? Why Not More?

It's worth asking: why do these three operations—bundling, binding, permutation—form the canonical set? Could there be others?

The answer lies in what these operations do at the level of representational primitives:

  • Bundling creates superposition—merging without hierarchical structure
  • Binding creates composition—relating elements with preserved identity
  • Permutation creates order—encoding position or sequence

Together, they span the basic structural varieties you need to represent complex information: sets (unordered collections), structures (relational compositions), and sequences (ordered arrangements).

You could add other operations—negation (flipping all signs), scaling (amplifying or attenuating), projection (dimension reduction)—but these are typically derivative or domain-specific. The core algebraic grammar reduces to these three.

In linguistics, you have similar foundational operations: conjunction (AND, like bundling), modification (binding attributes to nouns), and sequencing (word order). The parallelism isn't accidental. Language is a system for composing meaning from symbols, and hyperdimensional computing provides a mathematical substrate for exactly that kind of composition.

What's remarkable is how simple the operations are. Bundling is averaging. Binding is multiplication. Permutation is shuffling. Yet from these trivial procedures emerges the capacity to represent propositions, encode memories, perform analogical reasoning, and implement symbolic cognitive functions.

This is the power of high-dimensional geometry: trivial operations in the right space yield non-trivial computational consequences.


The Biological Hypothesis: Is This What Brains Do?

Here's where hyperdimensional computing stops being just an elegant computational formalism and starts making a bold empirical claim: brains might actually compute this way.

Pentti Kanerva's Sparse Distributed Memory (SDM) model proposes that the hippocampus—the brain structure critical for episodic memory—functions as a hyperdimensional address space. Neurons don't store memories locally; they store them as patterns distributed across millions of neurons, with similar addresses (bundled inputs) retrieving similar content.

Friedrich Sommer and colleagues at UC Berkeley have shown that grid cells in the entorhinal cortex—neurons that encode spatial position—behave like permutation operators. As an animal moves through space, grid cell activity rotates through high-dimensional state space in a way that's mathematically equivalent to applying a permutation operator. Position isn't stored as a coordinate; it's encoded as a transformation of a hypervector.

This isn't metaphor. It's measurable. When you record from grid cells and reconstruct their joint activity in high-dimensional space, you see systematic rotations that preserve distances and angles—the signature of permutation.

And binding? Chris Eliasmith's Neural Engineering Framework (NEF) demonstrates that you can implement binding operations (specifically, circular convolution) using biologically plausible spiking neural networks. Populations of neurons can multiply high-dimensional vectors through their firing patterns, enabling compositional representation without symbolic data structures.

If these findings hold—and they're contested, as all frontier neuroscience is—then the algebra of hypervectors isn't just a useful computational tool. It's a description of how neural circuits represent and manipulate information.

The brain wouldn't "know" it's doing vector algebra any more than water "knows" it's solving the Navier-Stokes equations. But the mathematics would describe the actual computational primitives: bundling sensory features into percepts, binding objects to locations, permuting sequences into memory.

That's a testable hypothesis. And it's being tested.


From Algebra to Application: What You Can Build

So what do you actually do with binding, bundling, and permutation? What problems can this algebra solve?

Language understanding. You can represent sentences as bundled compositions of bound word-role pairs. "The cat sat on the mat" becomes bundle(bind(CAT, SUBJECT), bind(SAT, VERB), bind(MAT, OBJECT), bind(ON, RELATION)). Unbind with SUBJECT to ask "What sat?" and you retrieve CAT. No parsing. No syntax trees. Pure vector algebra.

Analogical reasoning. Classic analogy tasks like "king is to queen as man is to ?" can be solved by vector operations: unbind(bind(KING, QUEEN), MAN) ≈ WOMAN. The binding captures the relationship (royalty + gender swap), and unbinding applies it to a new context.
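
As a toy sketch, the arithmetic comes out exactly when the word vectors themselves share compositional structure, which is an assumption I'm adding here for illustration (real lexical vectors would give only an approximate match that needs clean-up):

import numpy as np

rng = np.random.default_rng(6)
D = 10_000

def random_hv():
    return rng.choice([-1, 1], size=D)

def bind(x, y):
    return x * y  # self-inverse for bipolar vectors, so unbind is the same operation

# Hypothetical compositional encodings: each word binds a category with a gender feature.
ROYAL, PERSON, MALE, FEMALE = (random_hv() for _ in range(4))
KING, QUEEN = bind(ROYAL, MALE), bind(ROYAL, FEMALE)
MAN, WOMAN = bind(PERSON, MALE), bind(PERSON, FEMALE)

# "king is to queen as man is to ?": the ROYAL component cancels, leaving the gender swap.
answer = bind(bind(KING, QUEEN), MAN)
print(np.array_equal(answer, WOMAN))  # True under these constructed encodings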

Episodic memory. Kanerva's SDM stores memories by bundling sensory inputs into a high-dimensional address and retrieving by querying with partial cues. The algebra naturally supports pattern completion: give it half a memory, it returns the whole.

Robotic control. Chris Eliasmith's Spaun model (Semantic Pointer Architecture Unified Network) uses hyperdimensional representations to control a simulated arm, recognize handwritten digits, and perform working memory tasks—all with the same underlying algebra.

Sensor fusion. Binding allows you to compose information from multiple modalities (vision, audio, proprioception) into unified representations without requiring aligned feature spaces. Each modality contributes a hypervector; binding merges them into a multimodal gestalt.

The common thread: wherever you need compositionality, partial matching, and noise tolerance, hyperdimensional algebra excels. It's not trying to optimize for accuracy on a fixed dataset. It's trying to provide a substrate for flexible, compositional thought.


The Coherence Connection: Algebra as Relational Geometry

In AToM terms, binding, bundling, and permutation are coherence-preserving operations on relational structure. They don't store information in isolated nodes; they distribute it across geometric relationships in high-dimensional space.

Bundling increases coherence by creating prototypes—stable attractors that represent statistical regularities across instances. A bundled "PET" vector is more stable than any individual pet instance because it averages across noise.

Binding creates relational coherence—the compositional structure that allows elements to be contextually meaningful. RED bound with APPLE has higher mutual information than RED or APPLE independently, because the binding operation generates relational constraint.

Permutation encodes temporal or sequential coherence—the order that makes "ABC" different from "CAB," the flow that makes a melody distinct from random notes.

This is coherence as integrable structure under constraint. The hypervector isn't just a pile of numbers; it's a geometric configuration that encodes relationships. Query it, and you extract meaning. Transform it, and you propagate constraint.

Meaning, in this framework, isn't stored in symbols. It's the pattern of relationships across high-dimensional geometry. The algebra gives you the operations to build and interrogate that geometry.

And this, ultimately, is why hyperdimensional computing matters for anyone thinking about minds, meaning, or intelligence. It's not just a clever engineering trick. It's a mathematical formalization of how compositional representation might actually work—in silicon, and possibly in cells.


The Unanswered Questions

Hyperdimensional computing is elegant. It's biologically plausible. It solves hard problems with simple operations. But it's not without challenges.

Learning. Deep learning adjusts weights to minimize error. Hyperdimensional computing composes symbols. But where do the symbols come from? How do you learn which hypervectors to assign to which concepts? This is the symbol grounding problem, and hyperdimensional computing doesn't solve it—it assumes you already have symbols.

Capacity. Bundling degrades with too many constituents. Binding chains grow exponentially. How do you scale to truly complex structures—parse trees with thousands of nodes, knowledge graphs with millions of relations—without hitting capacity limits?

Interpretability. Yes, you can unbind and query. But can you understand what a 10,000-dimensional vector represents? Can you visualize it, debug it, explain it to a human? Or is it as opaque as a neural net, just in different ways?

These aren't fatal flaws. They're research frontiers. And researchers are actively pushing them.

But they remind us: the algebra of hypervectors is a beginning, not an end. It's a toolkit, not a theory of everything.


This is Part 3 of the Hyperdimensional Computing series, exploring how minds—biological and artificial—might represent meaning in very high dimensions.

Previous: Why 10,000 Dimensions Might Be Enough: The Mathematics of Hyperdimensional Representation

Next: Learning Without Gradients: How Hyperdimensional Systems Adapt


Further Reading

  • Kanerva, P. (2009). "Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors." Cognitive Computation.
  • Plate, T. A. (2003). Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications.
  • Gayler, R. W. (2003). "Vector Symbolic Architectures Answer Jackendoff's Challenges for Cognitive Neuroscience." ICCS/ASCS International Conference on Cognitive Science.
  • Eliasmith, C. (2013). How to Build a Brain: A Neural Architecture for Biological Cognition. Oxford University Press.
  • Sommer, F. T., & Wirth, S. (2018). "Spatial Representation and Navigation in the Hippocampus: A Vector Symbolic Architecture Perspective." Current Opinion in Neurobiology.