Multi-Hop Reasoning: How Graphs Enable Complex Queries

Multi-Hop Reasoning: How Graphs Enable Complex Queries
Multi-hop reasoning: how graphs enable complex queries.

Multi-Hop Reasoning: How Graphs Enable Complex Queries

Series: Graph RAG | Part: 5 of 10

The test of a knowledge system isn't whether it can answer simple questions. It's whether it can answer questions that require connecting information across multiple sources.

"Who won the Nobel Prize in Physics?"—easy. Look it up.

"Which Nobel Prize winners worked at institutions founded before 1800 and collaborated with researchers who later won Fields Medals?"—hard. This requires traversing relationships: from prizes to people, from people to institutions, from institutions to founding dates, from people to collaborators, from collaborators to other prizes.

Vector search can't do this. It finds documents that mention Nobel Prizes, or institutions, or Fields Medals. But it can't follow the chain of relationships necessary to construct the answer.

Knowledge graphs can. This is multi-hop reasoning: answering complex queries by traversing paths through a graph of structured relationships.

And this is where Graph RAG becomes transformative.


What Multi-Hop Reasoning Means

A single-hop query requires one step:

Query: "What did Marie Curie discover?"
Graph traversal: (Marie Curie) -[DISCOVERED]-> (?)
Answer: Radium, Polonium

Start at the Marie Curie node. Follow outgoing DISCOVERED edges. Return whatever they point to.

A multi-hop query requires traversing multiple edges in sequence:

Query: "What did the discoverer of radium win?"
Graph traversal:
  1. (?) -[DISCOVERED]-> (Radium)  → Marie Curie
  2. (Marie Curie) -[WON]-> (?)    → Nobel Prize in Physics (1903), Nobel Prize in Chemistry (1911)
Answer: Two Nobel Prizes

First hop: find who discovered radium. Second hop: find what that person won.

The complexity scales with the number of hops:

Query: "Which researchers have won prizes for work related to
discoveries made by Nobel laureates who worked at the University of Paris?"

Traversal:
  1. (University of Paris) -[EMPLOYED]-> (?) → [Marie Curie, Pierre Curie, ...]
  2. (Those people) -[WON]-> (Nobel Prize)
  3. (Those people) -[DISCOVERED]-> (?) → [Radium, Polonium, ...]
  4. (Related to those discoveries) -[RELATED_TO]-> (?) → [other work]
  5. (?) -[CONDUCTED]-> (That work)
  6. (Those people) -[WON]-> (?) → [various prizes]
Answer: [List of researchers and prizes]

Each hop filters and extends the result set. You're not searching for similarity—you're navigating semantic structure.


Why Vector Search Fails at Multi-Hop

Vector embeddings collapse all relationships into distance. But multi-hop reasoning requires preserving the type and directionality of relationships.

Consider: "Find all services that depend on authentication."

With vector search:

You embed "services that depend on authentication" and retrieve chunks with high semantic similarity. You might find:

  • Documentation about the authentication service
  • Descriptions of services that mention authentication
  • Discussions of dependency management

But you won't get a complete list of dependent services because:

  1. Not all dependencies are mentioned in text. A service might depend on authentication through a configuration file or API call that isn't documented in a paragraph.

  2. Transitive dependencies are invisible. Service B depends on Service A, which depends on Authentication. Service B's documentation might not mention authentication at all. Vector search won't connect them.

  3. Similar language doesn't imply dependency. Two services might discuss authentication in similar ways without depending on each other.

With a knowledge graph:

MATCH (s:Service)-[:DEPENDS_ON*1..]->(auth:Service {name: "Authentication"})
RETURN s

This traverses DEPENDS_ON edges from the Authentication node. *1.. means "follow DEPENDS_ON for any number of hops." You get all direct and transitive dependencies—every service in the dependency chain, regardless of whether they're documented together or use similar language.

The graph preserves the relationship structure that vector embeddings destroy.


Real-World Multi-Hop Scenarios

Dependency Analysis

"If we take the authentication service offline for maintenance, what breaks?"

Graph query:

Find all services connected by DEPENDS_ON edges (any depth) from Authentication

This maps the blast radius of a change. It's impossible with vector search because dependencies aren't about semantic similarity—they're about structural relationships in your system architecture.

Causal Chain Exploration

"What caused the outage on December 15?"

Graph traversal:

1. (Outage Dec 15) -[CAUSED_BY]-> (Database timeout)
2. (Database timeout) -[CAUSED_BY]-> (Connection pool exhaustion)
3. (Connection pool exhaustion) -[CAUSED_BY]-> (Traffic spike)
4. (Traffic spike) -[CAUSED_BY]-> (Marketing campaign)

Each hop follows a causal link. The root cause—a marketing campaign—might not be mentioned in the same document as the outage. But the graph preserves the causal chain.

Research Synthesis

"What are the key papers connecting active inference to robotics?"

Graph traversal:

1. (Papers about active inference) -[CITES]-> (?)
2. (?) -[CITED_BY]-> (Papers about robotics)

Find papers cited by active inference papers and citing robotics papers—these sit at the intersection. This is bibliographic multi-hop reasoning, surfacing bridge papers that connect research areas.

Compliance Checking

"Which data stores contain PII and are accessible to third-party services?"

Graph traversal:

1. (Data stores) -[CONTAINS]-> (PII)
2. (Third-party services) -[ACCESSES]-> (Those data stores)

First hop: find stores with PII. Second hop: find which third-party services access them. This is a compliance query that requires connecting data schemas to access policies—structure vector search can't represent.


Query Complexity and Path Constraints

Multi-hop queries can include complex path constraints:

Path length limits:

MATCH (a)-[:DEPENDS_ON*1..3]->(b)

Follow DEPENDS_ON for 1 to 3 hops maximum. Prevents infinite traversal in cyclic graphs.

Relationship type sequences:

MATCH (a)-[:CREATED]->(b)-[:REFERENCES]->(c)

Follow a specific sequence: CREATED then REFERENCES. Order matters.

Property filters along the path:

MATCH (a:Person)-[:WORKED_AT]->(org:Organization)
WHERE org.founded < 1800

Traverse only through nodes matching criteria.

Aggregation over paths:

MATCH path = (a)-[:COLLABORATES*]-(b)
WHERE length(path) < 4
RETURN a, b, length(path)

Find all collaboration paths shorter than 4 hops and report the distance.

These queries express precise structural requirements. They're asking geometric questions about the graph: "What's reachable from here following these types of edges with these constraints?"


The Computational Challenge

Multi-hop queries can be expensive. Following DEPENDS_ON edges recursively through a massive graph of services can explode combinatorially.

Optimization strategies:

1. Bidirectional search

Instead of traversing from A to find B, traverse from both A and B and meet in the middle. This reduces search space exponentially.

2. Path indexing

Precompute common paths and store them. "All services depending on Authentication" becomes a cached result, updated when dependencies change.

3. Relationship strength weighting

Not all edges are equally important. Weight edges by frequency, recency, or confidence and preferentially traverse strong connections.

4. Constraint pushdown

Apply filters as early as possible. If you're looking for organizations founded before 1800, filter on founding date before traversing to employees—don't traverse to millions of employees then filter.

Modern graph databases implement these optimizations. Neo4j, for example, uses traversal planning to minimize the number of nodes visited during multi-hop queries.


The most powerful pattern combines both:

  1. Use vector search to find starting points
  2. Use graph traversal to explore relationships
  3. Use vector similarity to rank results

Example: "Find documentation related to authentication failures"

1. Vector search: Find chunks semantically similar to "authentication failures"
   → [Doc A, Doc B, Doc C]

2. Entity linking: Extract entities from those chunks
   → [Authentication Service, Login API, Session Manager]

3. Graph traversal: Find entities connected to those starting points
   → DEPENDS_ON, CALLS, CONFIGURED_BY relationships

4. Expand result set: Include documentation for connected entities

5. Rank by combined score: Vector similarity + graph distance

Vector search provides recall—it casts a wide net. Graph traversal provides precision—it follows structured relationships to find exactly what's connected.

This hybrid approach is what production Graph RAG systems actually use. Pure graph traversal is too rigid. Pure vector search is too shallow. Combining them leverages the strengths of both.


When Multi-Hop Reasoning Matters

Multi-hop reasoning becomes essential when:

  • Relationships define correctness: Dependencies, causality, hierarchies
  • Information is distributed: No single document contains the complete answer
  • Structure matters more than content: The connection type is the answer, not the text
  • Transitive properties propagate: Permissions, trust, contamination, influence

These are precisely the scenarios where AI agents fail with naive RAG. They can retrieve relevant documents, but they can't reason about how entities relate across documents.

Knowledge graphs make the relationships explicit and traversable. Multi-hop queries become graph queries. Reasoning becomes navigation.


The Path from Questions to Queries

The remaining challenge: translating natural language questions into graph queries.

A user doesn't ask: "MATCH (s:Service)-[:DEPENDS_ON*1..]->(auth:Service {name: 'Authentication'}) RETURN s"

They ask: "What depends on authentication?"

Bridging this gap—from natural language to structured queries—is the frontier of Graph RAG research. Some approaches use LLMs to generate Cypher queries from questions. Others use semantic parsing to map questions to graph patterns.

But once the query is constructed, execution is deterministic. The graph returns exactly what's connected by the specified relationships. No hallucination. No missing context. Just traversal.


Further Reading

  • Yih, W. et al. (2015). "Semantic Parsing via Staged Query Graph Generation." ACL 2015.
  • Sun, H. et al. (2018). "Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text." EMNLP 2018.
  • Xiong, C. et al. (2017). "DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning." EMNLP 2017.

This is Part 5 of the Graph RAG series, exploring how knowledge graphs solve the limitations of naive vector retrieval.

Previous: Building Knowledge Graphs from Documents: Extraction Pipelines
Next: Microsoft GraphRAG: Architecture and Lessons Learned