Knowledge Graphs

From BloomWiki

Latest revision as of 01:53, 25 April 2026

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Knowledge Graphs (KGs) are structured representations of information in which entities (people, places, concepts, organizations) are nodes and the relationships between them are typed, directed edges. Where tables store attributes of individual entities, knowledge graphs store the web of relationships between entities — enabling reasoning, inference, and navigation across connected knowledge. Major knowledge graphs include Google's Knowledge Graph (powering search results), Wikidata, DBpedia, and numerous proprietary enterprise knowledge graphs. In the AI era, knowledge graphs are experiencing a renaissance as a complement to neural AI — providing structured, verifiable, interpretable knowledge to ground language model outputs.

Remembering

  • Entity — A distinct real-world object or concept represented as a node: a person (Albert Einstein), an organization (NASA), a concept (Relativity).
  • Relationship (predicate) — A typed, directed connection between two entities: "bornIn," "worksFor," "isA," "hasCapital."
  • Triple — The fundamental unit of a knowledge graph: (Subject, Predicate, Object). Example: (Albert_Einstein, bornIn, Ulm).
  • RDF (Resource Description Framework) — A W3C standard for representing knowledge graph triples using URIs.
  • SPARQL — The query language for RDF knowledge graphs, analogous to SQL for relational databases.
  • Ontology — A formal specification of concepts, categories, and relationships within a domain; defines the schema of a knowledge graph.
  • OWL (Web Ontology Language) — A W3C language for defining ontologies with rich semantic constraints.
  • Property Graph — An alternative KG model where nodes and edges can have attributes (key-value pairs). Used in Neo4j.
  • Cypher — The query language for property graph databases like Neo4j.
  • Knowledge Graph Embedding — Representing entities and relations as vectors in a continuous space for machine learning over KGs.
  • Link prediction — The task of inferring missing relationships in a knowledge graph from existing ones.
  • Entity alignment — Matching entities across two different knowledge graphs that refer to the same real-world object.
  • Named Entity Recognition (NER) — The NLP task of identifying entities in text; first step in knowledge graph construction from text.
  • Relation extraction — Identifying the relationship between two named entities in text; used in automated KG construction.
  • Wikidata — A free, multilingual, community-maintained knowledge graph with hundreds of millions of triples.
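To make the vocabulary above concrete, a triple store can be sketched as a plain set of (Subject, Predicate, Object) tuples; this is an illustrative toy, not any real KG's API:

```python
# Minimal triple store: a knowledge graph as a set of (subject, predicate, object) tuples.
triples = {
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "isA", "Physicist"),
    ("Ulm", "locatedIn", "Germany"),
}

def objects(subject, predicate):
    """Return all objects linked to `subject` via `predicate`."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

print(objects("Albert_Einstein", "bornIn"))  # {'Ulm'}
```

Real systems add indexing, URIs, and a query language on top, but the (S, P, O) tuple is still the atomic unit.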

Understanding

Knowledge graphs represent knowledge as a directed, typed multigraph. Unlike a relational database that describes each entity's attributes in rows and columns, a KG describes the web of relationships — enabling multi-hop reasoning that relational databases make awkward.

Example: "What are the birthplaces of Nobel Prize winners in Physics who studied at German universities?"

In a relational database, this requires multiple JOINs across tables. In a knowledge graph, it's a graph traversal: find Nobel Physics winners → follow "studiedAt" to universities → filter to Germany → follow "bornIn" to places.

The Open World Assumption: A key semantic difference from relational databases. KGs assume that the absence of a triple does not mean it's false — the information may simply not be recorded. (Closed World Assumption in relational databases: if a row doesn't exist, the fact is false.)
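The semantic difference can be sketched in a few lines: under CWA a missing fact is false, under OWA it is merely unknown (toy data, illustrative only):

```python
facts = {("Einstein", "bornIn", "Ulm")}

def cwa_query(triple):
    """Closed World Assumption: anything not stored is false."""
    return triple in facts

def owa_query(triple):
    """Open World Assumption: anything not stored is unknown, not false."""
    return True if triple in facts else None  # None stands in for "unknown"

q = ("Einstein", "visited", "Paris")
print(cwa_query(q))  # False
print(owa_query(q))  # None
```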

Knowledge Graph Embeddings (TransE, RotatE, ComplEx) learn dense vector representations of entities and relations, enabling:

  • Link prediction: can we predict missing triples?
  • Similarity computation: are these entities similar?
  • KG completion: enrich an incomplete KG using learned patterns

TransE (a foundational embedding method) represents each relation as a translation in embedding space: h + r ≈ t for each triple (h, r, t). "Paris + locatedIn → France" should hold approximately in the embedding space.
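A minimal sketch of the TransE scoring idea, with hand-set 2-D vectors rather than trained embeddings (the numbers are contrived so that Paris + locatedIn lands exactly on France):

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Toy 2-D embeddings chosen so that Paris + locatedIn ≈ France.
entity = {"Paris": [1.0, 2.0], "France": [2.0, 3.0], "Berlin": [5.0, 0.0]}
relation = {"locatedIn": [1.0, 1.0]}

def transe_score(h, r, t):
    """Lower is better: distance between h + r and t."""
    return dist(add(entity[h], relation[r]), entity[t])

# Link prediction: rank candidate tails for (Paris, locatedIn, ?).
ranked = sorted(["Berlin", "France"], key=lambda t: transe_score("Paris", "locatedIn", t))
print(ranked[0])  # France
```

Training a real TransE model learns these vectors from observed triples with a margin loss; the ranking step above is how the learned model is then used for KG completion.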

Symbolic vs. neural AI: Knowledge graphs are a form of symbolic AI — explicit, interpretable, structured representation. Neural models (LLMs) are statistical learners of implicit patterns. The combination — neuro-symbolic AI — is a growing research direction. RAG with a knowledge graph (GraphRAG) retrieves structured facts rather than unstructured text chunks, enabling more precise and verifiable grounding.

Applying

Building and querying a knowledge graph with Neo4j (Python driver):

<syntaxhighlight lang="python">
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def create_knowledge_graph(session):
    """Create a simple research knowledge graph."""
    # MERGE is idempotent: entities and relationships are created only if absent.
    session.run("""
        MERGE (p:Person {name: 'Geoffrey Hinton'})
        MERGE (u:University {name: 'University of Toronto'})
        MERGE (a:Award {name: 'Nobel Prize in Physics'})
        MERGE (c:Concept {name: 'Backpropagation'})
        MERGE (p)-[:WORKED_AT {from: 1987, to: 2023}]->(u)
        MERGE (p)-[:RECEIVED {year: 2024}]->(a)
        MERGE (p)-[:PIONEERED]->(c)
        MERGE (c)-[:ENABLES]->(deep:Concept {name: 'Deep Learning'})
    """)

def find_award_winners_and_contributions(session):
    """Find Nobel Prize winners and what they pioneered."""
    result = session.run("""
        MATCH (p:Person)-[:RECEIVED]->(a:Award {name: 'Nobel Prize in Physics'})
        MATCH (p)-[:PIONEERED]->(c:Concept)
        RETURN p.name AS person, c.name AS contribution
        ORDER BY p.name
    """)
    return [dict(record) for record in result]

def multi_hop_query(session, concept_name):
    """Find all researchers connected to a concept within 2 hops."""
    result = session.run("""
        MATCH path = (p:Person)-[:PIONEERED|CONTRIBUTED_TO*1..2]->(c:Concept)
        WHERE c.name CONTAINS $concept
        RETURN p.name AS researcher, [node in nodes(path) | node.name] AS path_names
        LIMIT 10
    """, concept=concept_name)
    return [dict(record) for record in result]

with driver.session() as session:
    create_knowledge_graph(session)
    winners = find_award_winners_and_contributions(session)
    print(winners)

driver.close()
</syntaxhighlight>

Knowledge graph construction pipelines:

  • Manual curation → Domain experts curate high-precision triples (medical ontologies, legal KGs)
  • Crowd-sourced → Wikidata model: community contributes and validates
  • Information extraction → NER + relation extraction from text corpora (automated, noisy)
  • Web scraping + structured sources → Infoboxes, tables, linked data sources (Freebase model)
  • Hybrid → Automated extraction + expert curation + community correction
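The "information extraction" lane can be sketched with a single hand-written pattern. A real pipeline would use trained NER and relation-extraction models; the pattern and sentences here are illustrative only:

```python
import re

# One naive pattern for the "bornIn" relation: "<Capitalized Name> was born in <Place>".
PATTERN = re.compile(r"(?P<subj>[A-Z]\w+(?: [A-Z]\w+)*) was born in (?P<obj>[A-Z]\w+)")

def extract_born_in(text):
    """Emit (subject, 'bornIn', object) triples from pattern matches."""
    return [(m.group("subj"), "bornIn", m.group("obj")) for m in PATTERN.finditer(text)]

corpus = "Marie Curie was born in Warsaw. Albert Einstein was born in Ulm."
print(extract_born_in(corpus))
```

Pattern-based extraction like this has high precision on text it matches and zero recall on everything else, which is exactly the coverage-precision trade-off discussed below under Analyzing.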

Analyzing

Knowledge Representation Comparison

  Approach                     | Structure                     | Reasoning               | Scalability | Interpretability
  -----------------------------|-------------------------------|-------------------------|-------------|-----------------
  Relational database          | Tables, rows                  | SQL (closed world)      | Very high   | High
  Knowledge graph (RDF)        | Triples                       | SPARQL, inference rules | High        | High
  Property graph (Neo4j)       | Nodes + edges with properties | Cypher, path queries    | High        | High
  Vector embeddings alone      | Implicit in weights           | Neural similarity       | Very high   | Low
  Knowledge graph + embeddings | Hybrid                        | Symbolic + neural       | High        | Medium

Key challenges and failure modes:

  • Incompleteness — Even the largest knowledge graphs (Wikidata: 100M+ triples) are dramatically incomplete. Most entity-relation-entity combinations that are true are not represented.
  • Inconsistency — Different sources record conflicting information. Conflict resolution and provenance tracking are essential but difficult.
  • Coverage-precision trade-off — Manual curation is precise but slow and incomplete; automated extraction has high recall but introduces errors.
  • Schema evolution — As understanding of a domain evolves, the ontology needs updating, which can invalidate existing triples.
  • Entity ambiguity — "Apple" could be the company, the fruit, or countless others. Entity linking (mapping text mentions to KG entities) is difficult and error-prone.
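Entity linking is often approached by scoring context overlap between a mention's surrounding words and each candidate entity's description. A bag-of-words sketch (candidate IDs and descriptions are invented for illustration):

```python
# Hypothetical candidate entities with short textual descriptions.
CANDIDATES = {
    "Apple_Inc": "technology company iphone mac computers cupertino",
    "Apple_fruit": "fruit tree orchard sweet edible",
}

def link_entity(mention_context):
    """Pick the candidate whose description shares the most words with the context."""
    ctx = set(mention_context.lower().split())
    return max(CANDIDATES, key=lambda c: len(ctx & set(CANDIDATES[c].split())))

print(link_entity("Apple released a new iphone at its cupertino campus"))  # Apple_Inc
```

Production linkers replace word overlap with learned embeddings and entity priors, but the structure (generate candidates, score against context, pick the best) is the same.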

Evaluating

Expert evaluation of knowledge graphs is multi-dimensional:

Factual accuracy: Randomly sample triples and verify against authoritative sources. Precision is the primary metric for knowledge graphs serving downstream systems.

Coverage / recall: For a specific domain, what fraction of known true facts are represented? Measured by comparing against a held-out set of verified triples.

Link prediction benchmarks: FB15k-237 (Freebase subset) and WN18RR (WordNet subset) are standard benchmarks for evaluating knowledge graph embedding methods. Metrics: Mean Reciprocal Rank (MRR), Hits@10.
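Given the rank of the correct entity for each test triple, both metrics are a few lines each (the example ranks are invented):

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct answer ranks in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 2, 5, 40]  # rank of the true tail entity for each test triple
print(mrr(ranks))
print(hits_at_k(ranks, k=10))  # 0.75
```

MRR rewards placing the right entity first; Hits@10 only asks whether it appears in the top ten, so the two can diverge sharply on the same model.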

Query performance: For production KGs, SPARQL query execution time at p95 for typical query patterns. Neo4j and other property graph DBs provide query profiling tools.

Downstream task impact: Does using the KG improve performance on the target application (question answering, recommendation, entity disambiguation)? This is the ultimate measure of KG quality.

Expert practitioners also evaluate provenance and freshness: For each triple, is its source known and trusted? How recently was it validated? Temporal knowledge graphs additionally track when facts were true, enabling time-sensitive queries.
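Temporal knowledge graphs can be sketched by attaching a validity interval to each triple and filtering at query time (the facts and years below are illustrative):

```python
# Each fact carries a validity interval: (subject, predicate, object, valid_from, valid_to).
temporal = [
    ("Hinton", "worksFor", "University_of_Toronto", 1987, 2023),
    ("Hinton", "worksFor", "Google", 2013, 2023),
]

def as_of(year, subject, predicate):
    """Objects for which the fact held in the given year."""
    return [o for (s, p, o, t0, t1) in temporal
            if s == subject and p == predicate and t0 <= year <= t1]

print(as_of(2015, "Hinton", "worksFor"))  # ['University_of_Toronto', 'Google']
print(as_of(2000, "Hinton", "worksFor"))  # ['University_of_Toronto']
```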

Creating

Designing a domain knowledge graph from scratch:

1. Domain scoping and ontology design

<syntaxhighlight lang="text">
Define scope: What entities matter? What relationships?
Review existing ontologies: Schema.org, biomedical ontologies (SNOMED, MeSH), industry standards
Design ontology: entity types, relation types, cardinality constraints
Define naming conventions and URI scheme
Validate with domain experts: are these the right concepts?
</syntaxhighlight>
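A lightweight way to make the ontology executable is to encode relation types with domain/range constraints as data and validate candidate triples against them; the schema below is a hypothetical sketch, not a standard format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationType:
    name: str
    domain: str  # required subject entity type
    range: str   # required object entity type

# Hypothetical ontology fragment.
SCHEMA = {
    "worksFor": RelationType("worksFor", domain="Person", range="Organization"),
    "locatedIn": RelationType("locatedIn", domain="Place", range="Place"),
}

def validate(triple, entity_types):
    """Check an (s, p, o) triple against the ontology's domain/range constraints."""
    s, p, o = triple
    rel = SCHEMA.get(p)
    if rel is None:
        return False  # unknown relation type
    return entity_types.get(s) == rel.domain and entity_types.get(o) == rel.range

types = {"Ada": "Person", "NASA": "Organization", "Ulm": "Place"}
print(validate(("Ada", "worksFor", "NASA"), types))   # True
print(validate(("Ada", "locatedIn", "NASA"), types))  # False
```

Rejecting type-violating triples at ingestion time catches many extraction errors before they pollute the graph.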

2. Knowledge acquisition pipeline

<syntaxhighlight lang="text">
Structured sources (databases, APIs, spreadsheets)
    ↓ [Direct mapping to triples]

Unstructured sources (text documents, web pages)
    ↓ [NER → entity linking → relation extraction → triple extraction]
    ↓ [Confidence scoring: filter low-confidence triples]
    ↓ [Human validation of uncertain triples]

Semi-structured sources (tables, infoboxes)
    ↓ [Table understanding + header interpretation]
    ↓
[Deduplication + entity alignment]
    ↓
[Knowledge base population]
</syntaxhighlight>
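The deduplication step in the pipeline above can be sketched as normalizing entity names before merging triples from the different source streams (the normalization rule is a deliberately naive illustration):

```python
def normalize(name):
    """Canonicalize an entity name so near-duplicate surface forms merge to one node."""
    return "_".join(name.lower().replace("_", " ").split())

def dedup(triples):
    """Merge triples whose entities differ only in surface form."""
    return {(normalize(s), p, normalize(o)) for (s, p, o) in triples}

# Same fact arriving from three streams with different surface forms.
raw = [
    ("Marie Curie", "bornIn", "Warsaw"),
    ("marie_curie", "bornIn", "Warsaw"),
    ("Marie  Curie", "bornIn", "Warsaw"),
]
merged = dedup(raw)
print(len(merged))  # 1
```

Real entity alignment goes far beyond string normalization (embedding similarity, shared neighbors, external identifiers), but the goal is the same: one node per real-world entity.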