Knowledge Graphs 101: Nodes, Edges, and Semantic Structure

Series: Graph RAG | Part: 3 of 10

A knowledge graph is a database that thinks in relationships.

Instead of storing information in tables or documents, it stores meaning as a network: entities connected by labeled edges that specify how they relate. The structure isn't metadata about your knowledge—it is your knowledge, made explicit and machine-traversable.

This isn't a new idea. Knowledge graphs descend from the semantic networks of 1960s AI research, evolved through the expert systems of the 1980s, and became practical infrastructure with the web's Linked Data movement in the 2000s. Google's Knowledge Graph (the thing that shows you a fact panel when you search for "Marie Curie") made the concept mainstream in 2012.

But knowledge graphs stopped being a curiosity and became essential infrastructure when AI agents needed to reason. Because reasoning is graph traversal. And graph traversal requires graphs.


The Core Components: Nodes and Edges

Nodes: The Entities

A node represents a thing. An entity. A concept that exists and can be referenced.

Examples:

  • Marie Curie (a person)
  • Nobel Prize in Physics (an award)
  • Radioactivity (a phenomenon)
  • University of Paris (an institution)
  • 1903 (a year)

Each node has:

  • An identifier (often a URI like wikidata:Q7186)
  • A type or category (Person, Award, Concept, etc.)
  • Properties (birth date, description, aliases)

The node doesn't store a document about Marie Curie. It represents the entity Marie Curie as a discrete object in the knowledge space. Other nodes can point to it. It can point to other nodes. It participates in relationships.
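As a minimal sketch of that idea, a node can be modeled as a small record holding an identifier, a type, and a bag of properties. The class and field names here (Node, node_id, node_type, properties) are illustrative assumptions, not any particular graph database's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entity in the graph: an identifier, a type, and descriptive properties."""
    node_id: str                                     # stable identifier, e.g. a Wikidata-style ID
    node_type: str                                   # category such as "Person" or "Award"
    properties: dict = field(default_factory=dict)   # key-value attributes


curie = Node(
    node_id="wikidata:Q7186",
    node_type="Person",
    properties={"label": "Marie Curie", "birth_date": "1867-11-07"},
)

print(curie.node_id, curie.node_type, curie.properties["label"])
```

Other nodes refer to this one by its identifier, never by copying its contents, which is what lets many edges point at the same entity.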

Edges: The Relationships

An edge connects two nodes with a labeled relationship that specifies how they relate.

Examples:

  • Marie Curie → won → Nobel Prize in Physics
  • Marie Curie → discovered → Radioactivity
  • Marie Curie → worked_at → University of Paris
  • Nobel Prize in Physics → awarded_in → 1903

The edge label carries semantic meaning. won is different from discovered is different from worked_at. The type of relationship matters. It's not just that Marie Curie and the Nobel Prize are connected—it's that she won it.

This is the fundamental difference from vector embeddings. Embeddings give you "Marie Curie and Nobel Prize are related (cosine similarity: 0.89)." Knowledge graphs give you "Marie Curie won the Nobel Prize in Physics in 1903 for her work on radioactivity."

Structure, not similarity.
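To make the contrast concrete, here is a small sketch of labeled edges in plain Python. The Edge tuple and the two helper functions are hypothetical names chosen for illustration; the point is only that the label is stored and can be queried, which a similarity score cannot offer.

```python
from collections import namedtuple

# Each edge records its direction and its label, not just "these two are connected".
Edge = namedtuple("Edge", ["source", "label", "target"])

edges = [
    Edge("Marie Curie", "won", "Nobel Prize in Physics"),
    Edge("Marie Curie", "discovered", "Radioactivity"),
    Edge("Marie Curie", "worked_at", "University of Paris"),
    Edge("Nobel Prize in Physics", "awarded_in", "1903"),
]


def labels_between(source, target):
    """Return every relationship label on edges from source to target."""
    return [e.label for e in edges if e.source == source and e.target == target]


def neighbors(source, label):
    """Follow edges carrying one specific label out of a node."""
    return [e.target for e in edges if e.source == source and e.label == label]


print(labels_between("Marie Curie", "Nobel Prize in Physics"))  # ['won']
print(neighbors("Marie Curie", "worked_at"))                    # ['University of Paris']
```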


Why Triples Are Fundamental

Knowledge graphs encode information as triples: (subject, predicate, object).

(Marie Curie, won, Nobel Prize in Physics)
(Marie Curie, birth_year, 1867)
(Marie Curie, death_year, 1934)
(Radioactivity, discovered_by, Marie Curie)

This is the atomic unit of structured knowledge. Each triple is a single factual claim that can be independently verified, composed with other triples, and traversed during queries.

The simplicity is powerful. You can represent arbitrarily complex knowledge by accumulating triples. You can query across millions of triples by following edges. You can extend the graph incrementally without restructuring everything.
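A rough sketch of how little machinery this needs: a set of tuples plus one pattern-matching function. The match helper and its None-as-wildcard convention are assumptions made for this example, not a standard API.

```python
# A tiny triple store: each fact is one (subject, predicate, object) tuple.
triples = {
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "worked_at", "University of Paris"),
    ("Radioactivity", "discovered_by", "Marie Curie"),
}


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Everything the graph says with Marie Curie as the subject.
about_curie = match(subject="Marie Curie")

# Extending the graph is just adding another triple; nothing is restructured.
triples.add(("Nobel Prize in Physics", "awarded_in", "1903"))
awarded = match(predicate="awarded_in")

print(about_curie)
print(awarded)
```

A real store would index triples by subject, predicate, and object instead of scanning the whole set, but the query model is the same.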

And critically: you can reason by traversal.

"Who won a Nobel Prize for discovering radioactivity?" becomes:

  1. Find all X where (Radioactivity, discovered_by, X)
  2. Find all Y where (X, won, Y) and Y has type Nobel Prize

The answer emerges from following edges. No semantic similarity calculation required.
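The two steps above can be sketched directly in Python. The "type" predicate used to mark the prize as a Nobel Prize is an assumption of this example, as is the objects helper.

```python
triples = [
    ("Radioactivity", "discovered_by", "Marie Curie"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "worked_at", "University of Paris"),
    ("Nobel Prize in Physics", "type", "Nobel Prize"),
    ("University of Paris", "type", "Institution"),
]


def objects(subject, predicate):
    """Follow one labeled edge out of a subject and collect what it points to."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# Step 1: find all X where (Radioactivity, discovered_by, X).
discoverers = objects("Radioactivity", "discovered_by")

# Step 2: find all Y where (X, won, Y) and Y has type Nobel Prize.
answers = {
    (x, y)
    for x in discoverers
    for y in objects(x, "won")
    if "Nobel Prize" in objects(y, "type")
}

print(answers)  # {('Marie Curie', 'Nobel Prize in Physics')}
```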


Properties and Attributes

Nodes can have properties—key-value attributes that describe them:

Node: Marie Curie (wikidata:Q7186)
Type: Person
Properties:
  - birth_date: 1867-11-07
  - birth_place: Warsaw, Poland
  - death_date: 1934-07-04
  - nationality: Polish, French
  - field: Physics, Chemistry
  - known_for: Radioactivity, Polonium, Radium

These properties are metadata about the entity. They're queryable but they're not relationships to other entities. "Birth date" is an attribute. "Born in Warsaw" could be modeled as an edge to a Warsaw node if you want to query all people born in Warsaw.

The decision of what to model as a property versus an edge depends on whether you need to traverse it. If you'll query "Find all scientists born in the same city," model birthplace as an edge to a City node. If you just need the fact "Marie Curie was born in Warsaw," a property suffices.
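Here is a small sketch of the edge-based option. "Scientist A" and "Scientist B" are placeholder entities invented for the example, and the born_in edge list and helper name are likewise assumptions.

```python
from collections import defaultdict

# Birthplace modeled as an edge: (person, city) pairs pointing at shared City nodes.
born_in_edges = [
    ("Marie Curie", "Warsaw"),
    ("Scientist A", "Warsaw"),   # placeholder entity
    ("Scientist B", "Paris"),    # placeholder entity
]

# Index the edges in both directions so the City node can be traversed from either side.
people_by_city = defaultdict(set)
city_of = {}
for person, city in born_in_edges:
    people_by_city[city].add(person)
    city_of[person] = city


def born_in_same_city(person):
    """Hop person -> city -> other people born there."""
    return sorted(people_by_city[city_of[person]] - {person})


print(born_in_same_city("Marie Curie"))  # ['Scientist A']
```

With birthplace stored only as a string property on each node, the same question would require scanning every node and comparing strings, with no shared Warsaw node to anchor the comparison.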


Schema and Ontology

A knowledge graph can be schema-free—any node can connect to any other node with any relationship—but most production graphs define a schema or ontology that specifies:

  • What types of entities exist (Person, Place, Event, Concept)
  • What properties each type can have
  • What relationships are valid between types
  • Hierarchies and constraints

An ontology is a formal specification of what exists and how it relates. In philosophy, ontology asks "What is?" In knowledge graphs, ontology asks "What types of things are in this domain and how can they connect?"

Example ontology fragment:

Person:
  - can have property: birth_date, death_date, nationality
  - can have relationship: born_in → Place
  - can have relationship: worked_at → Institution
  - can have relationship: won → Award

Award:
  - can have property: year, category
  - can have relationship: awarded_to → Person
  - can have relationship: awarded_for → Discovery

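The ontology creates constraints that prevent nonsensical relationships (a Place can't win an Award) and enables inference: if Marie Curie worked_at University of Paris and University of Paris located_in France, then a rule that chains worked_at with located_in lets the graph conclude that Marie Curie worked_in France. Strictly speaking this is rule-based composition of two different relationships, not transitive closure, which applies to repeated hops along a single relationship such as located_in.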
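Both ideas, the constraint check and the worked_in inference, can be sketched in a few lines. The ALLOWED set mirrors the ontology fragment above; the Institution → located_in → Place entry and the function names are assumptions added for this example.

```python
# Which type each entity has (normally stored on the nodes themselves).
NODE_TYPES = {
    "Marie Curie": "Person",
    "University of Paris": "Institution",
    "France": "Place",
    "Nobel Prize in Physics": "Award",
}

# Allowed (subject type, relationship, object type) combinations.
ALLOWED = {
    ("Person", "born_in", "Place"),
    ("Person", "worked_at", "Institution"),
    ("Person", "won", "Award"),
    ("Award", "awarded_to", "Person"),
    ("Institution", "located_in", "Place"),  # assumption: not in the fragment above
}


def is_valid(subject, predicate, obj):
    """Check a proposed edge against the schema before adding it."""
    return (NODE_TYPES[subject], predicate, NODE_TYPES[obj]) in ALLOWED


def infer_worked_in(triples):
    """Rule: (x worked_at y) and (y located_in z) implies (x worked_in z)."""
    located = [(s, o) for s, p, o in triples if p == "located_in"]
    return {
        (person, "worked_in", place)
        for person, p, institution in triples
        if p == "worked_at"
        for located_subject, place in located
        if located_subject == institution
    }


facts = [
    ("Marie Curie", "worked_at", "University of Paris"),
    ("University of Paris", "located_in", "France"),
]

print(is_valid("Marie Curie", "won", "Nobel Prize in Physics"))  # True
print(is_valid("France", "won", "Nobel Prize in Physics"))       # False
print(infer_worked_in(facts))
```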


Types of Knowledge Graphs

Open Domain: Wikidata and DBpedia

Open-domain knowledge graphs attempt to represent all human knowledge without restriction to a particular domain.

Wikidata is the most ambitious: 100+ million entities, 1.4+ billion triples, covering people, places, events, concepts, species, chemicals, artworks, everything. It's the structured data backbone of Wikipedia.

DBpedia extracts structured data from Wikipedia infoboxes, creating a graph of encyclopedic knowledge.

These are reference graphs—canonical sources of factual information that other systems can link to and query.

Domain-Specific: Medical, Enterprise, Scientific

Domain-specific graphs focus on deep coverage of a particular domain:

  • Medical ontologies like UMLS (Unified Medical Language System) with millions of medical concepts and relationships
  • Enterprise knowledge graphs capturing internal business knowledge—products, customers, processes, dependencies
  • Scientific knowledge graphs like the Microsoft Academic Graph (retired at the end of 2021) or Semantic Scholar's graph of papers, authors, citations, and concepts

Domain graphs trade breadth for depth. They model the domain with precision, capturing relationships and constraints specific to that field.

Personal: Your Data as Graph

The emerging pattern is personal knowledge graphs—representing an individual's or organization's specific knowledge in graph form:

  • Your codebase as a graph of services, dependencies, APIs, and configurations
  • Your company's documentation as a graph of concepts, procedures, and relationships
  • Your research notes as a graph of papers, ideas, and connections

This is where Graph RAG becomes practical. You don't need all of Wikidata. You need a graph of your knowledge structured in ways that enable reasoning over it.


Querying Knowledge Graphs

Knowledge graphs are queried through graph query languages, most commonly SPARQL (for RDF graphs) or Cypher (for property graph databases like Neo4j).

A SPARQL query to find Nobel Prize winners:

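PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?person ?prize WHERE {
  ?person wdt:P166 ?prize .     # P166: award received
  ?prize wdt:P31 wd:Q7191 .     # P31: instance of; Q7191: Nobel Prize
}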

Translation: Find all ?person and ?prize where the person received (P166) the prize, and the prize is an instance of (P31) Nobel Prize (Q7191).

A Cypher query for the same:

MATCH (person:Person)-[:WON]->(prize:Award)
WHERE prize.type = 'Nobel Prize'
RETURN person, prize

Both languages let you express graph patterns—the shape of the subgraph you're looking for—and return matches.

This is fundamentally different from SQL. SQL queries operate on tables and join them on matching keys. Graph queries express patterns and traverse relationships. You're not asking "Which rows in the Awards table have a person_id that matches this person?" You're asking "Follow the WON edge from this person and return whatever it points to."

The cognitive model shifts from tabular joins to path traversal.
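A toy comparison in Python shows the difference in shape. The table layouts, the out_edges index, and the function names are all invented for illustration; real databases optimize both approaches heavily.

```python
# Tabular view: two "tables" related through a shared person_id key.
persons = [{"person_id": 1, "name": "Marie Curie"}]
awards = [{"award_id": 10, "person_id": 1, "name": "Nobel Prize in Physics"}]


def awards_by_join(name):
    """Match rows across tables by comparing key columns."""
    return [
        a["name"]
        for p in persons
        if p["name"] == name
        for a in awards
        if a["person_id"] == p["person_id"]
    ]


# Graph view: each node keeps its outgoing edges, grouped by label.
out_edges = {("Marie Curie", "WON"): ["Nobel Prize in Physics"]}


def awards_by_traversal(name):
    """Follow the WON edge from the node and return whatever it points to."""
    return out_edges.get((name, "WON"), [])


print(awards_by_join("Marie Curie"))       # ['Nobel Prize in Physics']
print(awards_by_traversal("Marie Curie"))  # ['Nobel Prize in Physics']
```

Both return the same answer here. The difference shows up as hops accumulate: each extra hop is another join in the tabular model but just another edge lookup in the graph model.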


The Geometric Structure of Knowledge

In AToM terms, a knowledge graph is a discrete manifold—a space of meaning with well-defined structure.

Nodes are points in knowledge space. Edges are the dimensions along which meaning extends. The graph's topology—which nodes connect to which others through which relationships—defines the coherence structure of the domain.

Reasoning becomes navigation. To answer a question, you traverse from known starting points (entities mentioned in the query) along edges (relationships) to destination points (entities that satisfy the query). Multi-hop reasoning is literally multi-hop traversal: following multiple edges in sequence.

The graph preserves semantic structure that vector embeddings collapse. It represents the type of relationship, not just the fact of relatedness. It maintains the directionality of dependencies, the hierarchy of categories, the temporal sequence of events.

Where vector search asks "What is near?" graph traversal asks "What is reachable by following these specific types of connections?"

This is why knowledge graphs enable reasoning. Reasoning requires traversing structured relationships. Graphs make those relationships explicit and traversable.
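As a sketch of "reachable by following these specific types of connections," here is a breadth-first traversal restricted to a chosen set of edge labels and a hop limit. The adjacency layout and the reachable function are assumptions for this example.

```python
from collections import deque

# Adjacency list: node -> list of (edge label, target node).
edges = {
    "Marie Curie": [
        ("worked_at", "University of Paris"),
        ("won", "Nobel Prize in Physics"),
    ],
    "University of Paris": [("located_in", "France")],
    "Nobel Prize in Physics": [("awarded_in", "1903")],
}


def reachable(start, allowed_labels, max_hops):
    """Breadth-first traversal that only follows edges with allowed labels."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget spent; do not expand this node further
        for label, target in edges.get(node, []):
            if label in allowed_labels and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, hops + 1))
    return found


print(reachable("Marie Curie", {"worked_at", "located_in"}, 2))
# ['University of Paris', 'France']
```

Changing the allowed labels changes the answer even though the graph is the same, which is exactly the control that nearest-neighbor search lacks.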


The Integration Challenge

The practical difficulty with knowledge graphs isn't the concept—it's construction and maintenance.

Building a graph requires:

  • Extracting entities from unstructured text
  • Identifying relationships between entities
  • Disambiguating entity references (is "Apple" the company or the fruit?)
  • Maintaining consistency as the graph grows
  • Handling uncertainty and conflicting sources

This is hard. Which is why the next article addresses it directly: how do you automatically construct knowledge graphs from the documents you already have?

But the challenge is worth it. Because once you have a knowledge graph, you have infrastructure for reasoning. You have structure that AI agents can traverse. You have the foundation for retrieval that doesn't just find relevant text—it navigates semantic relationships.


Further Reading

  • Hogan, A. et al. (2021). "Knowledge Graphs." ACM Computing Surveys 54(4).
  • Bollacker, K. et al. (2008). "Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge." SIGMOD.
  • Noy, N. et al. (2019). "Industry-Scale Knowledge Graphs: Lessons and Challenges." Communications of the ACM.

This is Part 3 of the Graph RAG series, exploring how knowledge graphs solve the limitations of naive vector retrieval.

Previous: The Limits of Naive RAG: Why Your AI Agent Can't Reason
Next: Building Knowledge Graphs from Documents: Extraction Pipelines