Crash course in knowledge graph concepts
Learn the fundamentals of knowledge graphs, triples, and RDF concepts
Beginner
5 min
- Basic understanding of data structures
Understand knowledge graph fundamentals including triples (subject-predicate-object), RDF, and URI concepts.
Facts
A knowledge graph represents a collection of assertions of fact. As an example, this document starts with a collection of facts about my cats:
- Fred lives with Hope
- Fred is a cat
- Fred has 4 legs
Predicates
In English grammar, facts are commonly structured as subject, verb, object. But verbs don't always work the way we want them to for representing knowledge. Instead of recording that:
Fred (subject) has (verb) 4 legs (object)
I want to say that
Fred (subject) has legs (verb) 4 (object)
Hence we’ll talk about predicates rather than verbs.
In an RDF knowledge graph, facts are triples: subject, predicate, object.
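As a minimal sketch (Python is used here purely for illustration), each fact can be held as a three-part tuple. The `Triple` class and `facts` list are hypothetical names, not part of any RDF library or TrustGraph API, and plain words stand in for entity references for now.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str  # "object" is a Python builtin, so the field is named obj


# The three cat facts, using plain words rather than unique references.
facts = [
    Triple("Fred", "lives with", "Hope"),
    Triple("Fred", "is a", "cat"),
    Triple("Fred", "has legs", "4"),
]

for fact in facts:
    print(f"{fact.subject} | {fact.predicate} | {fact.obj}")
```

Because a `NamedTuple` is still a tuple, each fact can also be unpacked as `s, p, o = fact`.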
Entity references
Everything described in a knowledge graph is an entity, and every entity needs a unique reference. Fred is the name of my cat, but it is also the name of several people on my street, so a unique reference is needed to make it clear which Fred is meant.
In RDF, that reference is written as a URL (strictly, RDF uses IRIs, and every URL is a valid IRI). It is common to include a domain belonging to the organisation creating the graph, plus enough path components to make the URL clear and unique. So my cat Fred can be http://example.org/animal/fred. Because it is a URL, it looks like something you could type into a browser, but it doesn't have to point to an actual web page or resource. It can, but it isn't necessary.
To record the three cat facts above, I need some more references:
- My cat Hope needs a reference too, so she can be http://example.org/animal/hope.
- The concept of cat needs a reference, as it is a 'type' of animal: http://example.org/type/cat.
I'm using a bit of structure in my URLs to help organise them. This isn't required, since they just need to be unique, but introducing some order helps everyone who has to manage the data.
Predicate references
The predicate part of a triple is also an entity, so it is referenced by a URL too. The concept of 'lives with' can have the URL http://example.org/property/lives-with.
The concept of ‘has legs’ needs a reference also, so it can have a URL of http://example.org/property/has-legs.
References and literals
References are always URLs. There is another form of data in the graph: the literal. This is used for strings and numbers. When describing that Fred has 4 legs, the number 4 is going to appear in a statement as a literal.
To recap, a triple has three parts:
- The subject is always a URL.
- The predicate is always a URL.
- The object can be a URL or a literal.
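These rules are simple enough to check mechanically. The sketch below assumes references are Python strings beginning with http:// or https:// and wraps literals in a small, hypothetical `Literal` class; real RDF libraries have their own term types, so treat this only as an illustration of the rule.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking a value (string, number) as a literal."""
    value: object


Term = Union[str, Literal]


def is_reference(term: Term) -> bool:
    """Treat a str starting with http:// or https:// as a URL reference."""
    return isinstance(term, str) and term.startswith(("http://", "https://"))


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Subject and predicate must be references; the object may be either."""
    return is_reference(s) and is_reference(p) and (is_reference(o) or isinstance(o, Literal))


print(is_valid_triple("http://example.org/animal/fred",
                      "http://example.org/property/has-legs",
                      Literal(4)))   # True
print(is_valid_triple(Literal("Fred"),
                      "http://example.org/property/has-legs",
                      Literal(4)))   # False: a literal cannot be a subject
```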
Some triples
So, consolidating this, our 3 facts look like this:
| Subject | Predicate | Object |
|---|---|---|
http://example.org/animal/fred | http://example.org/property/lives-with | http://example.org/animal/hope |
http://example.org/animal/fred | ??? | http://example.org/type/cat |
http://example.org/animal/fred | http://example.org/property/has-legs | 4 |
There's one predicate we haven't defined yet: "is a", as in the fact "Fred is a cat". It links an entity to its type. Because this is such a fundamental concept, the W3C RDF standard already defines a predicate for it: http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
| Subject | Predicate | Object |
|---|---|---|
http://example.org/animal/fred | http://example.org/property/lives-with | http://example.org/animal/hope |
http://example.org/animal/fred | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://example.org/type/cat |
http://example.org/animal/fred | http://example.org/property/has-legs | 4 |
The prefix
Before going any further: the syntax is starting to get verbose and repetitive. Prefixes are used to replace the long, repeated parts of URLs. For example:
| Prefix | URL part |
|---|---|
animal | http://example.org/animal/ |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
prop | http://example.org/property/ |
type | http://example.org/type/ |
Using the prefixes, our fact table looks like this:
| Subject | Predicate | Object |
|---|---|---|
animal:fred | prop:lives-with | animal:hope |
animal:fred | rdf:type | type:cat |
animal:fred | prop:has-legs | 4 |
This is a much more compact and readable representation of the data.
Prefixes are just shorthand: animal:fred and http://example.org/animal/fred mean the same thing.
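Expanding a prefixed name is just string concatenation: look up the prefix, then append the local part. The helper names `expand` and `compact` below are hypothetical, and the prefix map uses the animal namespace http://example.org/animal/.

```python
PREFIXES = {
    "animal": "http://example.org/animal/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "prop": "http://example.org/property/",
    "type": "http://example.org/type/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Turn a prefixed name such as 'animal:fred' into a full URL."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


def compact(url: str, prefixes: dict = PREFIXES) -> str:
    """Shorten a full URL with the longest matching namespace, if any."""
    matches = [(p, ns) for p, ns in prefixes.items() if url.startswith(ns)]
    if not matches:
        return url  # no known namespace: leave the URL as it is
    prefix, ns = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{url[len(ns):]}"


print(expand("animal:fred"))  # http://example.org/animal/fred
print(compact("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))  # rdf:type
```

Picking the longest matching namespace matters when one namespace happens to be a prefix of another.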
Defining types
We've already used some entities that aren't fully defined, such as prop:has-legs. It has a meaning, but our graph doesn't explain that meaning. In classic database design, this would mean defining a schema. In a knowledge graph, that 'schema' information can go into the graph itself, using the RDF Schema standard. We've already met the rdf:type predicate: entities that are predicates or types can be linked with it to standard vocabulary entities that mark them as properties or classes. This needs a new prefix:
| Prefix | URL part |
|---|---|
rdfs | http://www.w3.org/2000/01/rdf-schema# |
The RDF Schema entity rdfs:Class is used for types. For predicates, the standard entity is rdf:Property; it belongs to the core RDF namespace, which is why it takes the rdf: prefix rather than rdfs:. So now we have:
| Subject | Predicate | Object |
|---|---|---|
prop:lives-with | rdf:type | rdf:Property |
prop:has-legs | rdf:type | rdf:Property |
type:cat | rdf:type | rdfs:Class |
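Because the schema is just more triples, asking "which entities are properties?" is the same kind of lookup as asking "which entities are cats?". A minimal sketch over prefixed-name strings, with a hypothetical `instances_of` helper and the standard rdf:Property term as the class of properties:

```python
# Schema and instance triples as prefixed-name strings.
triples = [
    ("prop:lives-with", "rdf:type", "rdf:Property"),
    ("prop:has-legs", "rdf:type", "rdf:Property"),
    ("type:cat", "rdf:type", "rdfs:Class"),
    ("animal:fred", "rdf:type", "type:cat"),
]


def instances_of(triples, cls):
    """Return every subject linked to cls by rdf:type, in input order."""
    return [s for s, p, o in triples if p == "rdf:type" and o == cls]


print(instances_of(triples, "rdf:Property"))  # ['prop:lives-with', 'prop:has-legs']
print(instances_of(triples, "rdfs:Class"))    # ['type:cat']
print(instances_of(triples, "type:cat"))      # ['animal:fred']
```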
Types can also be organised into a hierarchy. If it were of interest, the types around type:cat could be elaborated using the RDF Schema predicate rdfs:subClassOf: type:cat could be a subclass of type:mammal, type:mammal a subclass of type:animal, and each of these would itself have rdf:type rdfs:Class. For this dataset, I'm not particularly interested in animal classification, so I'll skip that complexity.
Labels
The rdfs:label predicate associates a human-readable label with any entity, including the instances, predicates and types we have defined.
| Subject | Predicate | Object |
|---|---|---|
type:cat | rdfs:label | cat |
animal:fred | rdfs:label | Fred |
animal:hope | rdfs:label | Hope |
prop:lives-with | rdfs:label | lives with |
prop:has-legs | rdfs:label | has legs |
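Labels make output readable: a tool can show the rdfs:label of an entity where one exists and fall back to the raw reference otherwise. A sketch with hypothetical `label` and `describe` helpers:

```python
labels = [
    ("type:cat", "rdfs:label", "cat"),
    ("animal:fred", "rdfs:label", "Fred"),
    ("animal:hope", "rdfs:label", "Hope"),
    ("prop:lives-with", "rdfs:label", "lives with"),
    ("prop:has-legs", "rdfs:label", "has legs"),
]


def label(entity, triples):
    """Return the first rdfs:label of entity, or the reference itself if it has none."""
    for s, p, o in triples:
        if s == entity and p == "rdfs:label":
            return o
    return entity


def describe(s, p, o, triples):
    """Render a triple as text, using labels where they exist."""
    return " ".join(label(term, triples) for term in (s, p, o))


print(describe("animal:fred", "prop:lives-with", "animal:hope", labels))  # Fred lives with Hope
print(describe("animal:fred", "rdf:type", "type:cat", labels))            # Fred rdf:type cat
```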
Other useful predicates
There are two other useful predicates which will be introduced now:
- http://dbpedia.org/ontology/thumbnail links an entity to a small thumbnail image. The object should be the URL of an image.
- http://purl.org/dc/elements/1.1/relation links an entity to a web page or further information. The object should be a URL.
Putting it all together
| Subject | Predicate | Object |
|---|---|---|
animal:fred | rdfs:label | Fred |
animal:fred | rdf:type | type:cat |
animal:fred | prop:lives-with | animal:hope |
animal:fred | prop:has-legs | 4 |
animal:hope | rdfs:label | Hope |
animal:hope | rdf:type | type:cat |
animal:hope | prop:lives-with | animal:fred |
animal:hope | prop:has-legs | 4 |
prop:lives-with | rdf:type | rdf:Property |
prop:lives-with | rdfs:label | lives with |
prop:has-legs | rdf:type | rdf:Property |
prop:has-legs | rdfs:label | has legs |
type:cat | rdf:type | rdfs:Class |
type:cat | rdfs:label | cat |
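With everything expressed as triples, many questions become pattern matches in which some positions are fixed and others are left open. The sketch below keeps the cat triples from the table in a Python list (the leg count as an int, to keep it distinct from string labels) and treats None as a wildcard. The `match` helper is hypothetical and is not SPARQL, though SPARQL's triple patterns follow the same idea.

```python
from typing import Optional

# The cat triples from the combined table; the leg count is an int literal.
graph = [
    ("animal:fred", "rdfs:label", "Fred"),
    ("animal:fred", "rdf:type", "type:cat"),
    ("animal:fred", "prop:lives-with", "animal:hope"),
    ("animal:fred", "prop:has-legs", 4),
    ("animal:hope", "rdfs:label", "Hope"),
    ("animal:hope", "rdf:type", "type:cat"),
    ("animal:hope", "prop:lives-with", "animal:fred"),
    ("animal:hope", "prop:has-legs", 4),
]


def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: object = None):
    """Return the triples that fit the pattern; None in any position matches anything."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Who does Fred live with?
print([o for _, _, o in match(graph, s="animal:fred", p="prop:lives-with")])  # ['animal:hope']
# Which entities are cats?
print([s for s, _, _ in match(graph, p="rdf:type", o="type:cat")])  # ['animal:fred', 'animal:hope']
```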
N-Triples
The simplest file format for RDF triples is N-Triples. Each line of the file describes one triple: the subject, predicate and object are written out in full, separated by spaces, and the line ends with a full stop. URIs are enclosed in angle brackets and literals in double quotes; a literal can also carry a datatype, as in "4"^^<http://www.w3.org/2001/XMLSchema#integer>. Here's a snippet:
<http://example.org/animal/fred> <http://example.org/property/has-legs> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/animal/fred> <http://example.org/property/lives-with> <http://example.org/animal/hope> .
<http://example.org/animal/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/type/cat> .
<http://example.org/animal/fred> <http://www.w3.org/2000/01/rdf-schema#label> "Fred" .
<http://example.org/animal/hope> <http://example.org/property/has-legs> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/animal/hope> <http://example.org/property/lives-with> <http://example.org/animal/fred> .
<http://example.org/animal/hope> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/type/cat> .
<http://example.org/animal/hope> <http://www.w3.org/2000/01/rdf-schema#label> "Hope" .
<http://example.org/property/has-legs> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://example.org/property/has-legs> <http://www.w3.org/2000/01/rdf-schema#label> "has legs" .
<http://example.org/property/lives-with> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://example.org/property/lives-with> <http://www.w3.org/2000/01/rdf-schema#label> "lives with" .
<http://example.org/type/cat> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<http://example.org/type/cat> <http://www.w3.org/2000/01/rdf-schema#label> "cat" .
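Because the format is one triple per line, a writer is short. This sketch assumes terms are Python values where a str beginning with http:// or https:// is a URI and any other str is a plain literal (a simplification: a real writer would use explicit term types), and ints become xsd:integer typed literals as in the snippet above. It escapes backslashes, double quotes and line breaks, which N-Triples does not allow unescaped inside a quoted literal. The function names are hypothetical.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def term_to_nt(term) -> str:
    """Serialise one term in N-Triples syntax."""
    if isinstance(term, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(term, int):
        # Integers become typed literals, e.g. "4"^^<...#integer>.
        return f'"{term}"^^<{XSD_INTEGER}>'
    if isinstance(term, str) and term.startswith(("http://", "https://")):
        return f"<{term}>"  # URI in angle brackets
    if isinstance(term, str):
        # Plain literal in double quotes, with the mandatory escapes applied.
        escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    raise TypeError(f"unsupported term: {term!r}")


def to_ntriples(triples) -> str:
    """Write one line per triple: subject, predicate, object, then ' .'."""
    return "\n".join(" ".join(term_to_nt(t) for t in triple) + " ." for triple in triples)


print(to_ntriples([
    ("http://example.org/animal/fred", "http://www.w3.org/2000/01/rdf-schema#label", "Fred"),
    ("http://example.org/animal/fred", "http://example.org/property/has-legs", 4),
]))
```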
Turtle
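N-Triples is very verbose and awkward to edit by hand, with a lot of repetition. Turtle is a more compact format. It allows prefixes to be defined and used, and it allows statements about the same subject to be grouped so the subject doesn't have to be repeated: a semicolon separates predicate-object pairs that share a subject, and a full stop ends the group. The keyword a is Turtle shorthand for rdf:type, and a bare number such as 4 is read as an integer literal.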
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix animal: <http://example.org/animal/> .
@prefix prop: <http://example.org/property/> .
@prefix type: <http://example.org/type/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
animal:fred
    prop:has-legs 4 ;
    prop:lives-with animal:hope ;
    a type:cat ;
    rdfs:label "Fred" .
animal:hope
    prop:has-legs 4 ;
    prop:lives-with animal:fred ;
    a type:cat ;
    rdfs:label "Hope" .
prop:has-legs
    a rdf:Property ;
    rdfs:label "has legs" .
prop:lives-with
    a rdf:Property ;
    rdfs:label "lives with" .
type:cat
    a rdfs:Class ;
    rdfs:label "cat" .
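The grouping that makes Turtle shorter is mechanical: collect each subject's predicate-object pairs, join them with " ;", and end the group with " .". The sketch below is a simplified, hypothetical writer: it assumes terms are already prefixed names, marks string literals with a small `Lit` wrapper, writes ints bare (Turtle reads them as integers), and uses the `a` shorthand only in predicate position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A string literal; bare str values are treated as prefixed names."""
    value: str


def fmt(term) -> str:
    """Format an object term: bare ints, quoted literals, names as-is."""
    if isinstance(term, int):
        return str(term)
    if isinstance(term, Lit):
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return term


def to_turtle(triples, prefixes) -> str:
    """Write @prefix lines, then one block per subject in first-seen order."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))  # dicts keep insertion order
    for s, pairs in by_subject.items():
        lines.append("")
        lines.append(s)
        body = [f"    {'a' if p == 'rdf:type' else p} {fmt(o)}" for p, o in pairs]
        lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


prefixes = {
    "animal": "http://example.org/animal/",
    "prop": "http://example.org/property/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "type": "http://example.org/type/",
}
triples = [
    ("animal:fred", "prop:has-legs", 4),
    ("animal:fred", "prop:lives-with", "animal:hope"),
    ("animal:fred", "rdf:type", "type:cat"),
    ("animal:fred", "rdfs:label", Lit("Fred")),
]
print(to_turtle(triples, prefixes))
```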
TrustGraph’s graph store
TrustGraph’s graph store extends beyond basic triples with two additional capabilities:
Named graphs (quads)
TrustGraph supports named graphs, which means that every triple is associated with a graph identifier. This makes each statement a quad: subject, predicate, object, and graph. Named graphs allow triples to be organised into distinct collections - for example, separating knowledge graph content from retrieval provenance data, or isolating data by user and collection.
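A quad simply adds a fourth position. The stdlib sketch below is not TrustGraph's API or storage format; it only shows the idea of filtering a quad store down to the triples of one named graph, using hypothetical graph identifiers.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: object
    g: str  # identifier of the named graph the triple belongs to


# Hypothetical graph identifiers, for illustration only.
KNOWLEDGE = "urn:example:graph:knowledge"
PROVENANCE = "urn:example:graph:provenance"

store = [
    Quad("animal:fred", "prop:lives-with", "animal:hope", KNOWLEDGE),
    Quad("animal:fred", "rdf:type", "type:cat", KNOWLEDGE),
    Quad("urn:example:doc:cat-notes", "rdfs:label", "Notes about my cats", PROVENANCE),
]


def triples_in(store, graph):
    """Select one named graph and drop the graph position, leaving plain triples."""
    return [(q.s, q.p, q.o) for q in store if q.g == graph]


print(triples_in(store, KNOWLEDGE))
```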
Reification
TrustGraph supports RDF reification, which allows edges to point at other edges. In RDF terms, a triple can itself be treated as a resource and become the subject or object of further triples. This is used extensively in TrustGraph for provenance tracking - for instance, recording which source document a particular knowledge graph edge was derived from, or attaching confidence scores and extraction metadata to individual facts.
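One long-standing way to do this in RDF is the reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object), which describes a triple using four further triples. The sketch below uses that vocabulary; it is not necessarily how TrustGraph stores reified edges, and the statement identifier, source document, `ex:` predicates and helper names are all hypothetical.

```python
def reify(stmt_id, s, p, o):
    """Describe the triple (s, p, o) as a resource named stmt_id."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]


stmt = "urn:example:stmt:1"  # hypothetical identifier for the edge
graph = reify(stmt, "animal:fred", "prop:lives-with", "animal:hope")
# The statement is now an ordinary resource, so further triples can describe it.
# ex:sourceDocument and ex:confidence are hypothetical predicates.
graph += [
    (stmt, "ex:sourceDocument", "urn:example:doc:cat-notes"),
    (stmt, "ex:confidence", 0.9),
]


def facts_from(graph, doc):
    """Recover the (s, p, o) of every reified statement sourced from doc."""
    found = []
    for sid, pred, obj in graph:
        if pred == "ex:sourceDocument" and obj == doc:
            parts = {pp: oo for ss, pp, oo in graph if ss == sid}
            found.append((parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"]))
    return found


print(facts_from(graph, "urn:example:doc:cat-notes"))
# [('animal:fred', 'prop:lives-with', 'animal:hope')]
```

Describing a statement this way does not by itself assert it; a store would normally hold the plain triple as well.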
Knowledge graph vs. context graph
Knowledge graph
A structured representation of facts about a domain - entities and the relationships between them. It captures what is known.
Context graph
A knowledge graph enriched with the context surrounding that knowledge - where it came from (provenance) and how it has been interpreted and reasoned about (explainability). A context graph recognises that a fact in isolation is less valuable than a fact you can trace back to its source and whose role in previous reasoning you can examine. Crucially, because provenance and explainability are represented as graph data alongside the knowledge itself, they become subjects of further reasoning - you can ask questions not just about what you know, but about how you know it.
The distinction: a knowledge graph captures knowledge; a context graph captures knowledge in context.
Going further
The RDF technologies and standards are large and very powerful for knowledge management, and we've only scratched the surface here. But this is enough to get started with our knowledge graph.