Sheaves and Contextuality: How Category Theory Models Context-Dependent Meaning

Sheaves: how local contexts assemble into coherent global meaning.

Series: Applied Category Theory | Part: 7 of 10

The word "bank" has no fixed meaning. Whether it refers to a financial institution or a riverbank depends entirely on context. Your brain handles this effortlessly—seamlessly integrating surrounding words, conversational history, and situational cues to disambiguate meaning. But how does meaning actually work this way? How can something be inherently contextual while remaining mathematically precise?

The answer lies in sheaf theory, a branch of category theory that formalizes exactly this: how local information assembles into global coherence while respecting contextual constraints. Sheaves don't just describe context-dependent meaning—they reveal its geometric structure.


The Problem: Meaning Doesn't Live in Isolation

Traditional semantics treats words as having fixed meanings, perhaps with multiple discrete senses. A dictionary entry for "bank" lists definitions: (1) financial institution, (2) riverbank, (3) to tilt an aircraft. This discrete-sense model captures something real—but it misses the continuous, context-sensitive nature of how meaning actually operates.

Consider "the bank collapsed." Without context, this sentence is genuinely ambiguous. But meaning doesn't live in the sentence alone. It emerges from:

  • Surrounding sentences (previous mention of flooding vs. financial crisis)
  • Broader context (a geology paper vs. economics news)
  • Shared knowledge (current events, conversational history)
  • Pragmatic cues (speaker's expertise, conversation goals)

Meaning is contextual all the way down. The question isn't "what does this word mean?" but rather "what does this word mean here, given this surrounding information?"

Sheaf theory provides the mathematical machinery to make this precise.


What a Sheaf Is

A sheaf is a mathematical structure that:

  1. Associates data to open sets of a topological space
  2. Specifies how local data must agree on overlaps
  3. Allows reconstruction of global data from compatible local pieces

In simpler terms: a sheaf tells you how to glue together local information into a globally coherent whole—but only when the local pieces are mutually compatible.

The Formal Structure

Given a topological space X (think of this as your "context space"), a sheaf F assigns:

  • To each open set U ⊆ X, a set F(U) of "sections" (possible values/meanings in that context)
  • Restriction maps that take a section over U and restrict it to any smaller open set V ⊆ U
  • Gluing axioms that ensure:
    • Locality: if two sections over U agree on every set of an open cover of U, they are equal
    • Gluing: local sections that agree on all pairwise overlaps glue to a (necessarily unique) global section

The gluing axioms are crucial. They formalize the idea that local coherence determines global structure—but only when the local pieces are genuinely compatible.
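As a concrete toy, the restriction and gluing operations can be sketched over a finite context space. Everything here is illustrative: open sets are modeled as frozensets of context points, sections as plain dicts, so restriction is dict restriction and gluing is dict union when the overlaps agree.

```python
from itertools import combinations

# Toy sheaf of "meaning assignments": to each open set U (a frozenset of
# context points) we assign sections, i.e. functions U -> values,
# represented as dicts. Restriction is restriction of the dict.

def restrict(section, V):
    """Restrict a section over U to a smaller open set V of U."""
    return {x: section[x] for x in V}

def glue(local_sections):
    """Return the unique section over the union of the U_i if the s_i
    agree on all pairwise overlaps (gluing axiom), else None."""
    for (U, s), (V, t) in combinations(local_sections.items(), 2):
        if restrict(s, U & V) != restrict(t, U & V):
            return None  # incompatible on an overlap: gluing fails
    glued = {}
    for s in local_sections.values():
        glued.update(s)
    return glued

# Hypothetical context points and readings of "bank":
U1 = frozenset({"finance_news", "econ_paper"})
U2 = frozenset({"econ_paper", "geology_paper"})
s1 = {"finance_news": "institution", "econ_paper": "institution"}
s2 = {"econ_paper": "institution", "geology_paper": "riverbank"}

print(glue({U1: s1, U2: s2}))
# the sections agree on the overlap {"econ_paper"}, so they glue
```

The uniqueness half of the axiom is visible in the code: once the overlaps agree, the dict union is the only possible global section.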


Sheaves of Meaning: Context as Topology

Now translate this to semantics. Let your topological space X represent contexts—think of each point as a possible conversational, situational, or informational context. Open sets represent neighborhoods of context: collections of similar or related contexts.

A sheaf of meanings assigns:

  • To each context neighborhood U, the possible meanings/interpretations available in that context
  • Restriction maps that narrow meaning as context becomes more specific
  • Gluing conditions that ensure: if meanings are locally compatible across overlapping contexts, they determine a unique global interpretation

Example: "Bank" as a Sheaf

Consider contexts arranged by topic:

  • Open set U₁: financial/economic contexts
  • Open set U₂: geological/environmental contexts
  • Open set U₃: aviation contexts
  • Overlap U₁ ∩ U₂: contexts involving environmental economics or natural resource management

In U₁, "bank" predominantly means financial institution. In U₂, it means riverbank. In U₃, it means the tilting maneuver.

But notice: in the overlap U₁ ∩ U₂ (environmental economics), both meanings might be active. A sentence like "the erosion threatened the bank's stability" could genuinely evoke both senses—riverbank erosion affecting a financial institution dependent on local resources.

The sheaf structure captures this: meanings in overlapping contexts must restrict consistently. The financial sense in U₁ and the geological sense in U₂ are compatible in the overlap because context provides sufficient disambiguation cues.
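A minimal sketch of this example, with assumed sense inventories: restriction keeps the reading, so a section over a larger neighborhood restricts consistently into the overlap exactly when its reading is still available there.

```python
# Assumed sense inventories for "bank" over three context neighborhoods.
F = {
    "U1 (financial)": {"financial institution"},
    "U2 (geological)": {"riverbank"},
    "U1&U2 (enviro-econ)": {"financial institution", "riverbank"},
}

def restricts_consistently(sense, overlap_senses):
    """Restriction keeps the reading, so a section over U restricts into
    the overlap exactly when its reading is still available there."""
    return sense in overlap_senses

overlap = F["U1&U2 (enviro-econ)"]
print(all(restricts_consistently(s, overlap)
          for s in F["U1 (financial)"] | F["U2 (geological)"]))  # True
```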


The Gluing Condition: When Meanings Cohere

Here's where it gets powerful. The gluing axiom says: if you have locally coherent meanings across a cover of contexts, they uniquely determine a global meaning.

This formalizes a fundamental aspect of comprehension: you assemble meaning from pieces. Reading a paragraph, you integrate sentence-by-sentence meanings. Understanding a conversation, you accumulate contextual information across turns. The sheaf condition says this assembly process is well-defined when the local pieces are mutually compatible.

When Gluing Fails

But what happens when local meanings don't cohere?

Consider a text that switches contexts abruptly:

"The bank announced record profits. Erosion undermined the bank. The pilot banked left to avoid turbulence."

Without additional context linking these sentences, the local meanings don't glue. Each sentence activates a different sense of "bank," but they don't restrict compatibly to form a coherent global interpretation. The sheaf fails to produce a global section.

This is the mathematical formalization of semantic incoherence. When local information doesn't satisfy the gluing condition, no coherent global meaning exists.
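The failure can be checked mechanically. In this toy model (all names illustrative), the three sentence contexts share a discourse point for the referent of "bank", and each section resolves it differently, so no pair agrees on its overlap:

```python
from itertools import combinations

# Sections over three sentence contexts; every context contains the
# shared discourse point "bank", but each resolves it differently.
sections = {
    frozenset({"s1", "bank"}): {"s1": "profits", "bank": "institution"},
    frozenset({"s2", "bank"}): {"s2": "erosion", "bank": "riverbank"},
    frozenset({"s3", "bank"}): {"s3": "turbulence", "bank": "aircraft tilt"},
}

def compatible(sections):
    """True iff every pair of sections agrees on its overlap."""
    for (U, s), (V, t) in combinations(sections.items(), 2):
        if any(s[x] != t[x] for x in U & V):
            return False
    return True

print(compatible(sections))  # False: no coherent global reading exists
```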


Quantum Contextuality: Where Sheaves Get Strange

Sheaf theory wasn't developed for linguistics—it emerged in algebraic geometry and topology. But its most striking application appears in quantum mechanics, where it formalizes a phenomenon called contextuality.

The Kochen-Specker Theorem

In classical physics, observables have values independent of measurement context. A particle has a position and momentum, period. But quantum mechanics violates this: the value of an observable depends on which other observables you're measuring alongside it.

This isn't measurement error or ignorance. It's fundamental: quantum observables don't have context-independent values.

The Kochen-Specker theorem proves this mathematically. Attempts to assign consistent values to all observables simultaneously lead to contradictions. Quantum mechanics is irreducibly contextual.

Sheaf theory provides the precise framework for this. Quantum observables form a sheaf over contexts (sets of simultaneously measurable observables). Local assignments of values exist, but they don't glue to a global assignment. The sheaf cohomology is nontrivial—obstructions to gluing are geometrically encoded.
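A minimal stand-in for this phenomenon is an odd-cycle ("Specker triangle") model: three observables measured in pairs, with each context demanding the two outcomes differ. Every context has valid local sections, yet no global assignment exists, for the same reason a triangle has no 2-coloring. The code below brute-forces that obstruction; it is a toy model, not the Kochen-Specker construction itself.

```python
from itertools import product

# Observables a, b, c; measurement contexts are the pairs. The empirical
# model demands perfect anticorrelation within each context. Locally
# satisfiable, globally impossible: you cannot 2-color a triangle.
contexts = [("a", "b"), ("b", "c"), ("a", "c")]

def admits_global_section():
    """Search for a single value assignment consistent in all contexts."""
    for va, vb, vc in product([0, 1], repeat=3):
        g = {"a": va, "b": vb, "c": vc}
        if all(g[x] != g[y] for x, y in contexts):
            return True
    return False

print(admits_global_section())  # False: the obstruction to gluing
```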


From Quantum Mechanics to Human Meaning

Why does this matter for understanding meaning?

Because human semantics exhibits the same contextual structure. Meanings don't exist independently of contexts—they emerge within contexts and depend on which other meanings are co-activated.

Consider polysemy: a word like "run" has dozens of senses (run a program, run a marathon, run for office, run a fever). These aren't entirely discrete—they form a semantic manifold where nearby senses blend smoothly while distant senses remain distinct.

Crucially, activating one sense changes the availability of others. In a sports context, "run a marathon" is salient; "run a program" is backgrounded. Switch to a computing context, and the landscape inverts. The meaning depends on the measurement context.

This is quantum-like contextuality in semantic space. Sheaves provide the geometry.


Sheaves and Semantic Embeddings

Modern NLP uses word embeddings—vector representations that place semantically similar words nearby in high-dimensional space. Models like Word2Vec, GloVe, and transformers learn these embeddings from distributional statistics.

But embeddings face a problem: how to represent polysemy? A single vector for "bank" averages over all senses, losing context-sensitivity.

Recent approaches use contextualized embeddings (like BERT or GPT), where word representations change based on surrounding context. The word "bank" gets different vectors in different sentences.

This is precisely a sheaf structure over context space. Each sentence provides a context (an open set), and the model assigns a meaning vector (a section of the sheaf). The transformer's attention mechanism implements the restriction and gluing operations—ensuring local consistency across token sequences.

The model learns to satisfy the sheaf condition: meanings that cohere locally in similar contexts glue into coherent global representations.
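Here is a deliberately tiny illustration of the idea, with a hand-made two-dimensional embedding table standing in for a trained model; the `contextual` function is a crude stand-in for what attention layers learn, not BERT's actual mechanism.

```python
import math

# Hand-made toy embedding table (purely illustrative values); a real
# system would use a contextual model such as BERT or GPT.
E = {
    "bank":  [0.5, 0.5],
    "money": [1.0, 0.0],
    "river": [0.0, 1.0],
}

def contextual(word, context_words):
    """Crude contextualization: nudge the static vector toward the mean
    of its context, standing in for what attention layers learn."""
    ctx = [sum(E[w][i] for w in context_words) / len(context_words)
           for i in range(2)]
    return [(a + c) / 2 for a, c in zip(E[word], ctx)]

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

v_fin = contextual("bank", ["money"])  # "bank" near "money"
v_geo = contextual("bank", ["river"])  # "bank" near "river"
print(round(cos(v_fin, v_geo), 3))     # 0.6: same word, different sections
```

The same word yields different vectors in different contexts: a section of the sheaf over each context neighborhood rather than a single context-free point.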


Coherence as Sheaf Cohomology

In AToM terms, coherence is the geometry of systems that work. A system is coherent when its parts fit together—when local dynamics compose into stable global patterns.

Sheaf theory makes this geometrically precise. Coherence is the vanishing of sheaf cohomology.

When a sheaf has trivial cohomology (no obstructions to gluing), local data uniquely determines global structure. The system is coherent—information flows smoothly from local to global scales.

When cohomology is nontrivial, obstructions exist. Local pieces might individually make sense but fail to cohere globally. The system exhibits tension—incompatible constraints that prevent global resolution.

The M = C/T Connection

Recall AToM's core equation: M = C/T (Meaning equals Coherence over Tension). Tension represents curvature in the coherence manifold—regions where simple gluing fails and coordination becomes costly.

In sheaf-theoretic terms:

  • C (Coherence) is measured by how easily local sections glue (low cohomological obstruction)
  • T (Tension) corresponds to cohomological obstructions—the degree to which local data fails to cohere globally
  • M (Meaning) emerges from navigating this landscape—finding paths through context space where gluing succeeds

Meaning is the trajectory that minimizes cohomological obstruction while maximizing coverage of context space.


Contextuality in Neural Systems

This isn't just mathematical abstraction. Neurobiological evidence suggests the brain implements sheaf-like structures.

Predictive processing models describe perception as hierarchical Bayesian inference—lower-level predictions are contextualized by higher-level priors. Visual processing doesn't passively record features; it actively contextualizes local information within global scene understanding.

Consider edge detection: whether a luminance boundary is perceived as an edge depends on surrounding context (lighting conditions, object knowledge, attention state). The same retinal input produces different perceptual outcomes in different contexts.

This is sheaf structure in neural computation. Early visual areas provide local sections (edge detections), while higher areas implement restriction maps (contextualizing via top-down predictions) and gluing operations (assembling coherent scene interpretations).

When context is ambiguous or contradictory, gluing fails—producing perceptual bistability (Necker cube, Rubin vase) or incoherence (impossible figures). The cohomological obstruction manifests as phenomenological instability.


Natural Transformations Between Sheaves

Recall natural transformations—morphisms between functors that preserve structure coherently. Sheaves, being (contravariant) functors from the poset of open sets to the category of sets, admit natural transformations.

A sheaf morphism is a natural transformation between two sheaves over the same space. It assigns, for each context U, a map from meanings in one sheaf to meanings in another—coherently across all contexts.

This formalizes translation or semantic mapping. Two languages provide different sheaves of meanings over contexts. A translation is a sheaf morphism—mapping meanings in language A to meanings in language B in a context-sensitive but systematically coherent way.

Crucially, not all sheaves admit morphisms between them. If two semantic systems carve up context space incompatibly—if their local structures don't align—no coherent translation exists. This is the sheaf-theoretic formalization of untranslatability.
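A sketch of the naturality condition on the running example, with hypothetical English and French vocabularies. Restriction here is just inclusion of contexts, so naturality reduces to the component on the overlap agreeing with each larger-context component wherever both are defined:

```python
# Toy sheaf morphism ("translation") between an English and a French
# sheaf of readings over the same context space. A morphism gives one
# component map per open set; naturality demands the components commute
# with restriction. All vocabulary here is illustrative.
eta = {
    "U_finance": {"institution": "banque"},
    "U_geology": {"riverbank": "berge"},
    "U_overlap": {"institution": "banque", "riverbank": "berge"},
}

def is_natural(eta, bigger, overlap):
    """Check the naturality square: translating then restricting equals
    restricting then translating (restriction keeps the reading)."""
    return all(eta[overlap][s] == f for s, f in eta[bigger].items())

print(is_natural(eta, "U_finance", "U_overlap") and
      is_natural(eta, "U_geology", "U_overlap"))  # True
```

If the overlap component sent "institution" anywhere other than "banque", the square would fail to commute and the candidate translation would not be a sheaf morphism.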


Operads and Compositional Sheaves

Sheaves describe how local data glues together. But what governs the operations by which we combine meanings?

Enter operads—categorical structures that formalize multi-input operations and their compositions. In semantics, operads model how words combine into phrases, phrases into sentences, sentences into discourse.

Combining sheaves with operads gives sheaves of operads: structures where both the meanings (sheaf sections) and the composition operations (operad structure) vary with context.

This captures deep compositionality: not only do meanings depend on context, but the rules for combining meanings also depend on context. In formal language, composition is syntactic and rigid. In natural language, composition is pragmatic and flexible—idioms, metaphors, and implicatures all violate compositional literalism.

A sheaf of operads formalizes this: the operad structure (how meanings compose) is itself a sheaf—varying with linguistic, social, and situational context.


The Topology of Context Space

We've been treating context as a topological space, but what determines its topology? What makes two contexts "close" or "far"?

In practice, context space has intricate structure:

  • Syntactic contexts cluster by grammatical patterns
  • Semantic contexts cluster by topic and domain
  • Pragmatic contexts cluster by communicative goals and social situations
  • Temporal contexts form sequences (conversational histories, narrative arcs)

These aren't independent dimensions—they interact. The topology reflects the joint geometry of all contextual factors.

Modern language models implicitly learn this topology. Transformer attention patterns reveal which contexts the model treats as nearby—which past tokens influence current predictions. The learned attention structure approximates the sheaf's restriction maps.

Interestingly, different models learn different context topologies. GPT emphasizes sequential structure (next-token prediction), while BERT emphasizes bidirectional context (masked token prediction). These correspond to different sheaf structures—different ways of organizing how local meanings glue.


Stalks and Germs: Meaning at a Point

Sheaf theory includes a powerful local perspective: stalks and germs.

The stalk of a sheaf at a point x consists of all sections defined on neighborhoods of x, with two sections identified if they agree on some smaller neighborhood of x. Intuitively, the stalk captures meaning at exactly this context, abstracting away from how far the context extends.

A germ is an equivalence class in the stalk—the essence of what a section says "right here," regardless of global extension.
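In code, germ equality at a point x can be checked by restricting both sections to a common smaller neighborhood of x (all names illustrative):

```python
# Germs at a point x: two sections, defined on neighborhoods of x, have
# the same germ iff they agree on some common smaller neighborhood of x.
# Neighborhoods are frozensets of context points, sections are dicts.
x = "this_utterance"
N_small = frozenset({x})
N_big = frozenset({x, "nearby_context"})

s = {x: "riverbank", "nearby_context": "riverbank"}  # section over N_big
t = {x: "riverbank"}                                 # section over N_small

def same_germ(s, t, N):
    """Compare sections after restricting both to the neighborhood N of x."""
    return all(s[p] == t[p] for p in N)

print(same_germ(s, t, N_small))  # True: identical germs at x
```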

In semantics, this formalizes situated meaning: what this word means in this exact context, independent of how it might extend to neighboring contexts.

This connects to embodied cognition and 4E approaches: meaning isn't abstract and context-free; it's grounded in specific situations. The stalk structure captures this—meaning emerges at points of concrete context, then extends (via gluing) to broader situations when coherence permits.


Presheaves vs. Sheaves: When Local Doesn't Determine Global

A presheaf satisfies the restriction structure but not necessarily the gluing axiom. Local sections exist and restrict coherently, but they might not uniquely glue to global sections.

Presheaves model incomplete information: you have local pieces, but global coherence isn't guaranteed.

Semantically, this describes ambiguity or underspecification. You have partial meanings in local contexts, but they don't yet cohere into a determinate global interpretation. More context is needed.

The process of sheafification turns a presheaf into a sheaf by forcing the gluing axiom to hold—adding exactly the global sections required for coherence.

In comprehension terms, sheafification is the resolution of ambiguity: accumulating enough context that local meanings glue into a coherent global understanding.
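A very rough sketch of the idea, not the universal sheafification construction: starting from a presheaf given by explicit section lists, one gluing pass adds the global sections that compatible families demand.

```python
from itertools import combinations

# Presheaf with local sections but no global one yet: ambiguity awaiting
# resolution. Open sets are frozensets, sections are dicts.
presheaf = {
    frozenset({"a"}): [{"a": 1}],
    frozenset({"b"}): [{"b": 2}],
    frozenset({"a", "b"}): [],  # no global section yet
}

def sheafify_once(F):
    """One gluing pass (toy version): for each compatible pair of local
    sections, add their glued union over the joint open set."""
    F = {U: list(ss) for U, ss in F.items()}
    for (U, sU), (V, sV) in combinations(list(F.items()), 2):
        for s in sU:
            for t in sV:
                if all(s[x] == t[x] for x in U & V):  # compatible pair
                    glued = {**s, **t}
                    F.setdefault(U | V, [])
                    if glued not in F[U | V]:
                        F[U | V].append(glued)
    return F

print(sheafify_once(presheaf)[frozenset({"a", "b"})])  # [{'a': 1, 'b': 2}]
```

The pass adds exactly the section the gluing axiom requires: once the local pieces are compatible, the global interpretation is forced.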


Sheaves and Active Inference

The connection to active inference is direct. In the Free Energy Principle framework, agents minimize prediction error by updating beliefs (perception) or changing the world (action).

Sheaves formalize the belief structure: how local predictions (sections over sensory contexts) glue into global models (generative models of the world).

Active inference adds dynamics: the agent samples contexts (open sets) strategically to minimize uncertainty. Attention is the selection of which contexts to query—which open sets to evaluate sections over.

Prediction error corresponds to failure of gluing: when sensory data in overlapping contexts don't cohere, the sheaf has obstructions. The agent must either:

  1. Update beliefs (modify the sheaf to restore gluing)
  2. Act (change the world to make contexts cohere)
  3. Recontextualize (shift which open sets are relevant, changing the topology)

Meaning, in this framework, is the coherent global section that emerges when gluing succeeds—the interpretation that minimizes free energy across all sampled contexts.


Why Sheaves Matter for Meaning

Sheaf theory isn't just another mathematical formalism. It provides something essential: a rigorous geometry for context-dependence.

Most theories treat context as an add-on—meaning is primary, and context modulates it. Sheaf theory inverts this: context is the space, and meaning is what's defined over it. You can't have meaning without context, just as you can't have a sheaf section without an open set to define it over.

This resolves longstanding tensions:

  • Compositionality vs. Holism: Meanings compose locally (via restriction) but depend holistically on global coherence (via gluing)
  • Stability vs. Flexibility: Meanings are stable within contexts (stalk structure) but flexible across contexts (different sections over different open sets)
  • Discreteness vs. Continuity: Distinct senses (sections over disjoint open sets) coexist with continuous variation (sections over overlapping sets)

Sheaves show these aren't contradictions—they're complementary aspects of the same geometric structure.


Building a Sheaf-Theoretic Semantics

What would it look like to actually implement sheaf-theoretic semantics?

Step 1: Define Context Space

Construct a topological space representing contexts. This could be:

  • Discrete (finite set of context types with inclusion relations)
  • Metric (contexts embedded in vector space with distance-based topology)
  • Combinatorial (simplicial complex of contextual features)

Step 2: Specify Meaning Assignments

For each open set (context neighborhood), define the set of possible meanings. This could be:

  • Discrete (finite sense inventory)
  • Continuous (vector embeddings)
  • Structured (logical forms, knowledge graphs)

Step 3: Define Restrictions

Specify how meanings narrow when context becomes more specific. This requires:

  • Semantic compatibility (which global meanings are consistent with which local contexts)
  • Composition rules (how sub-context meanings combine)

Step 4: Check Gluing

Verify that compatible local meanings uniquely determine global meanings. If gluing fails (nontrivial cohomology), either:

  • Accept ambiguity (presheaf, not sheaf)
  • Refine topology (split contexts more finely)
  • Modify meanings (adjust compatibility)

Step 5: Compute with Sheaves

Use sheaf-theoretic operations:

  • Pushforward: How do meanings transform when changing context representation?
  • Pullback: How do global constraints restrict local meanings?
  • Cohomology: Where do obstructions to coherence appear?

Modern category theory libraries (in Python, Haskell, etc.) provide tools for sheaf computation. Applying these to semantic spaces is frontier research—but the mathematical machinery is ready.
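The five steps can be exercised end to end on a toy discrete context space: fix a cover (Step 1), list candidate meanings per open set (Step 2), use dict restriction (Step 3), and enumerate compatible families to find the global sections (Steps 4 and 5, an H⁰-style count). All data here is illustrative.

```python
from itertools import combinations, product

# Step 1-2: a two-set cover of a discrete context space, with candidate
# readings ("fin" vs "geo") assigned per open set.
cover = [frozenset({"p", "q"}), frozenset({"q", "r"})]
sections = {
    cover[0]: [{"p": "fin", "q": "fin"}, {"p": "geo", "q": "geo"}],
    cover[1]: [{"q": "fin", "r": "fin"}],
}

def global_sections(cover, sections):
    """Steps 3-5: enumerate families (one section per cover element),
    keep those agreeing on all overlaps, and glue each survivor."""
    found = []
    for family in product(*(sections[U] for U in cover)):
        pairs = combinations(zip(cover, family), 2)
        if all(s[x] == t[x] for (U, s), (V, t) in pairs for x in U & V):
            glued = {}
            for s in family:
                glued.update(s)
            found.append(glued)
    return found

print(global_sections(cover, sections))
# [{'p': 'fin', 'q': 'fin', 'r': 'fin'}]: one coherent global reading
```

The "geo" family drops out because it disagrees with the second open set on the overlap {q}; an empty result here would signal a cohomological obstruction, i.e. no coherent global interpretation.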


What This Enables

Understanding semantics as sheaf structure opens new possibilities:

In AI/NLP: Design architectures that explicitly model context topology and gluing conditions, improving compositional generalization and coherence.

In Cognitive Science: Formalize how brains implement context-sensitive meaning assembly, connecting to neural geometry and predictive processing.

In Philosophy: Resolve debates about meaning, reference, and intentionality by grounding them in rigorous geometric structures.

In Communication: Analyze why some messages cohere across contexts (low cohomology) while others fragment (high cohomology), informing design of robust communication systems.

In AToM: Integrate sheaf-theoretic coherence measures into the broader framework, quantifying meaning as navigable structure through context space.

The categorical perspective transforms meaning from a mysterious mental substance to a navigable geometric object—something you can map, measure, and mathematically manipulate.


This is Part 7 of the Applied Category Theory series, exploring how categorical frameworks provide mathematical foundations for compositional systems.

Previous: String Diagrams: Drawing Your Way to Mathematical Insight
Next: Operads and the Algebra of Composition: From Syntax to Semantics


Further Reading

  • Abramsky, S. & Brandenburger, A. (2011). "The Sheaf-Theoretic Structure of Non-Locality and Contextuality." New Journal of Physics, 13(11), 113036.
  • Goguen, J. (1992). "Sheaf Semantics for Concurrent Interacting Objects." Mathematical Structures in Computer Science, 2(2), 159-191.
  • Boleda, G. & Herbelot, A. (2016). "Formal Distributional Semantics: Introduction to the Special Issue." Computational Linguistics, 42(4), 619-635.
  • Gärdenfors, P. (2014). The Geometry of Meaning: Semantics Based on Conceptual Spaces. MIT Press.
  • Friston, K. (2019). "A Free Energy Principle for a Particular Physics." arXiv:1906.10184.