An Introduction to Knowledge Graphs

A comprehensive guide to Knowledge Graphs and their applications

35 min read

In an era defined by data, making sense of complex relationships and vast information repositories has become critical for organizations. Knowledge graphs are emerging as a powerful tool to tackle these challenges. By organizing data into a network of interconnected entities and relationships, knowledge graphs provide a structured way to represent information and extract insights.

In traditional data systems, information is often stored in siloed databases, making it difficult to extract meaningful insights. For instance:

  • Disconnected Data: Systems that store data independently struggle to provide a unified view.
  • Poor Relationship Representation: Conventional databases often fall short in modeling complex interconnections between entities.
  • Search Limitations: Searching across structured and unstructured data seamlessly is challenging.

In his presentation, Mike Bergman describes the nature of the world as messy, complicated, interconnected, diverse and ever-changing. As a result, our knowledge of this world is never complete, exists in structured, semi-structured and unstructured formats, and can be found everywhere. This knowledge is contextual and must be coherent. This is the world we live in and the world we are trying to model with knowledge graphs.

A knowledge graph addresses these problems by integrating data from various sources into a unified graph structure. It represents entities (nodes) that connect to each other through relationships (edges) in a format that is both human-readable and machine-interpretable. This framework acts as a database, enabling complex queries by understanding the context and connections between various pieces of information. Knowledge graphs enhance AI applications by improving information retrieval and reasoning capabilities across multiple data sources, powering tasks like semantic search, recommendation systems, and more.

What is a Knowledge Graph (KG)?

The term knowledge graph has been used frequently in research and business, usually in close association with Semantic Web technologies, linked data, large-scale data analytics and cloud computing. The term "knowledge graph" is often mistakenly thought to have originated in 2012, when Google adopted it to describe its structured entity-attribute information, prominently featured on its search results pages. While Google's use of the term has significantly boosted its visibility and marketing appeal, the concept dates back much further. The phrase "knowledge graph" itself can be traced to the 1970s, and the underlying ideas go back even earlier.

A knowledge graph, also known as a semantic network, represents a network of real-world entities—such as objects, events, situations or concepts—and illustrates the relationship between them. This information is usually stored in a graph database and visualized as a graph structure, prompting the term knowledge "graph" 1.

There have been many efforts to clearly define the term, but to put it simply: a knowledge graph is a network-based representation of knowledge that organizes data from multiple sources and captures information about entities of interest and the relationships between them. Knowledge graphs are:

  • Graphs: unlike knowledge bases, the content of KGs is organised as a graph, where the nodes (entities of interest and their types), the relationships between them, and their attributes are equally important. This makes it easy to integrate new datasets and formats, and supports exploration by navigating from one part of the graph to another through links.
  • Semantic: the meaning of the data is encoded for programmatic use in an ontology, which describes the types of entities in the graph and their characteristics, and can be represented as a schema sub-graph. This means the graph is both a place to organise and store data, and a place to reason about it and derive new information.

Knowledge graphs consist of:

  • Nodes: Representing entities like people, places, things, or abstract concepts
  • Edges: Connections between nodes showing relationships
  • Labels: Attributes that define the relationships and reasoning rules

At its core, a knowledge graph is a data structure that connects data in a semantic way, allowing both humans and machines to understand the context and meaning of the information. Live examples include the Google Knowledge Graph, which powers search results with contextual insights, and LinkedIn's Economic Graph, which models professional connections and job market trends.

  • "A knowledge graph (i) mainly describes real-world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains." (Paulheim 1)
  • "Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities." (Journal of Web Semantics 2)
  • "Knowledge graphs could be envisaged as a network of all kinds of things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets." (Semantic Web Company 3)
  • "We define a Knowledge Graph as an RDF graph. An RDF graph consists of a set of triples where each triple $(s, p, o)$ is an ordered set of the following terms: a subject $s \in U \cup B$, a predicate $p \in U$, and an object $o \in U \cup B \cup L$. An RDF term is either a URI $u \in U$, a blank node $b \in B$, or a literal $l \in L$." (Färber et al. 4)
  • "[...] systems exist, [...], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph." (Pujara et al. 5)

Table 1: Selected definitions of knowledge graph - Towards a Definition of Knowledge Graphs 6

Different Types of Information Management Systems

To appreciate the uniqueness of knowledge graphs, it’s helpful to understand how they compare to other types of information management systems:

  • Relational databases: store data in a row-based table structure which connects related data elements. An RDBMS includes functions that maintain the security, accuracy, integrity and consistency of the data.
  • Data warehouses: de-normalised data stores that allow for analytical activities like counts, aggregations, etc.
  • Key-value stores: a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash table.
  • Column-oriented databases: store data tables by column rather than by row. Benefits include more efficient access to data when only querying a subset of columns (by eliminating the need to read columns that are not relevant), and more options for data compression.
  • Graph databases: use graph structures for semantic queries, with nodes, edges, and properties to represent and store data.
  • Document stores: data storage systems designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.
  • Time-series databases: optimized for handling time series data, i.e., data points indexed in time order.
  • Multi-model databases: support multiple data models against a single, integrated backend.

Under the hood

The key difference between a graph and relational database is that relational databases work with sets while graph databases work with paths. This manifests itself in unexpected and unhelpful ways for a Relational Database Management System (RDBMS) user.

For example, when trying to emulate path operations (e.g. friends of friends) by recursively joining tables in a relational database, query latency grows unpredictably and massively, as does memory usage, not to mention that it tortures SQL to express those kinds of operations. More data means slower queries in a set-based database, even if you can delay the pain through judicious indexing.
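
To make the contrast concrete, here is a minimal, hypothetical Python sketch (the data and names are invented): the relational style recomputes "friends of friends" by joining a table of pairs with itself, scanning all rows per hop, while the graph style simply follows adjacency lists outward from the start node, touching only its neighbourhood.

```python
# Hypothetical illustration: "friends of friends" in a set/join style vs. a traversal style.

# Relational-style data: a table of (person, friend) rows.
friend_rows = [
    ("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "erin"),
]

def friends_of_friends_join(rows, person):
    """Emulate a self-join: pair up rows where the first row's friend equals the second row's person."""
    return {
        (a, c)
        for (a, b1) in rows          # first copy of the table
        for (b2, c) in rows          # second copy of the table
        if a == person and b1 == b2 and c != a
    }

# Graph-style data: adjacency lists, i.e. relationships stored next to each node.
adjacency = {
    "alice": ["bob"],
    "bob": ["carol", "dave"],
    "carol": ["erin"],
    "dave": [],
    "erin": [],
}

def friends_of_friends_traversal(adj, person):
    """Follow edges two hops out; cost depends on the neighbourhood size, not the total data size."""
    return {fof for friend in adj.get(person, []) for fof in adj.get(friend, []) if fof != person}

print(friends_of_friends_join(friend_rows, "alice"))     # {('alice', 'carol'), ('alice', 'dave')}
print(friends_of_friends_traversal(adjacency, "alice"))  # {'carol', 'dave'}
```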

Most graph databases don't suffer this kind of join pain because they express relationships at a fundamental level. That is, relationships physically exist on disk and they are named, directed, and can be themselves decorated with properties (the property graph model). This means if you chose to, you could look at the relationships on disk and see how they "join" entities. Relationships are therefore first-class entities in a graph database and are semantically far stronger than those implied relationships reified at runtime in a relational store.

tldr;

  1. Graph databases are much faster than relational databases for connected data - a strength of the underlying model. A consequence of this is that query latency in a graph database is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored, thus defusing the join bomb.
  2. Graph databases make modelling and querying much more pleasant, meaning faster development.

How to determine if Knowledge Graphs are what you need?

1. Is your Data Highly-Connected?

Graph solutions are focused on highly-connected data that comes with an intrinsic need for relationship analysis. If the connections within the data are not the primary focus and the data is of a transactional nature, then a graph database is probably not the best fit.

2. Is Retrieving the Data more Important than Storing it?

Graph databases are optimized for data retrieval and you should go with the graph database if you intend to retrieve data often. If your focus is on writing to the database and you’re not concerned with analyzing the data, then a graph database wouldn’t be an appropriate solution. A good rule of thumb is, if you don’t intend to use JOIN operations in your queries, then a graph is not a must-have.

3. Does your Data Model Change Often?

If your data model is inconsistent and demands frequent changes, then using a graph database might be the way to go. Because graph databases are more about the data itself than the schema structure, they allow a degree of flexibility.

On the other hand, there are often benefits in having a predefined and consistent table that’s easy to understand. Developers are comfortable with and used to relational databases, and that fact cannot be downplayed.

For example, if you are storing personal information such as names, dates of birth, locations… and don’t expect many new fields or a change in data types, relational databases are the go-to solution. On the other hand, a graph database could be useful if:

  • Additional attributes could be added at some point,
  • Not all entities will have all the attributes in the table and
  • The attribute types are not strictly defined.

Graphs as data structures

A Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to as nodes and the edges are lines or arcs that connect any two nodes in the graph. More formally, a knowledge graph as a directed labeled graph is a 4-tuple $G = (N, E, L, f)$, where $N$ is a set of nodes, $E \subseteq N \times N$ is a set of edges, $L$ is a set of labels, and $f: E \to L$ is an assignment function from edges to labels. An assignment of a label $B$ to an edge $E = (A, C)$ can be viewed as a triple $(A, B, C)$.
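
As a quick, hypothetical illustration of the definition above (the entities are made up), the following Python snippet encodes a small directed labeled graph as the sets N, E, L and the assignment function f, and shows how each labeled edge can be read as a triple.

```python
# A tiny directed labeled graph G = (N, E, L, f).
N = {"Ada", "London", "UK"}                          # nodes
E = {("Ada", "London"), ("London", "UK")}            # edges, a subset of N x N
L = {"bornIn", "locatedIn"}                          # labels
f = {                                                # assignment function f: E -> L
    ("Ada", "London"): "bornIn",
    ("London", "UK"): "locatedIn",
}

# Each labeled edge (A, C) with label B can be viewed as the triple (A, B, C).
triples = {(a, f[(a, c)], c) for (a, c) in E}
print(triples)  # {('Ada', 'bornIn', 'London'), ('London', 'locatedIn', 'UK')}
```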

Types of Graphs

  • Null Graph: a graph with no edges.
  • Trivial Graph: a graph with only a single vertex; it is the smallest possible graph.
  • Undirected Graph: a graph in which edges have no direction, i.e., each edge is an unordered pair of nodes.
  • Directed Graph: a graph in which each edge has a direction, i.e., each edge is an ordered pair of nodes.
  • Labeled Graph: a graph whose edges are labelled (relationships can carry properties).
  • Connected Graph: a graph in which every node can be reached from any other node.
  • Disconnected Graph: a graph in which at least one node is not reachable from some other node.
  • Regular Graph: a graph in which every vertex has the same degree K, called a K-regular graph.
  • Complete Graph: a graph in which there is an edge between every pair of nodes.
  • Cycle Graph: a graph that forms a single cycle; every vertex has degree 2.
  • Cyclic Graph: a graph containing at least one cycle.
  • Directed Acyclic Graph: a directed graph that does not contain any cycle.
  • Bipartite Graph: a graph whose vertices can be divided into two sets such that no edge connects two vertices within the same set.
  • Weighted Graph: a graph in which each edge is assigned a weight. Weighted graphs can be further classified as directed weighted graphs and undirected weighted graphs.

You can also mix these types, e.g., directed cyclic graphs, directed labelled cyclic graphs, directed labelled cyclic multigraphs, etc.

🌲 Trees are restricted types of graphs, with a few extra rules. Every tree is a graph, but not every graph is a tree. Linked lists, trees, and heaps are all special cases of graphs.
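
If you want to check such properties programmatically, a graph library can help; the sketch below assumes the Python networkx package as the tooling and tests a few of the graph types listed above.

```python
import networkx as nx

# Directed graph with labeled edges (edge attributes play the role of labels).
dg = nx.DiGraph()
dg.add_edge("Ada", "London", label="bornIn")
dg.add_edge("London", "UK", label="locatedIn")
print(nx.is_directed_acyclic_graph(dg))        # True: no cycles, so this is a DAG

# Undirected cycle graph on 4 nodes: every vertex has degree 2.
cycle = nx.cycle_graph(4)
print(nx.is_connected(cycle))                  # True: connected graph
print(all(d == 2 for _, d in cycle.degree()))  # True: 2-regular

# A complete bipartite graph is, by construction, bipartite.
print(nx.is_bipartite(nx.complete_bipartite_graph(2, 3)))  # True
```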

Graphs as data models

Directed edge-labelled graphs

A directed edge-labelled graph (or multi-relational graph 7 8 9) is a set of nodes connected by directed, labelled edges. In knowledge graphs, nodes represent entities, and edges represent relationships between them.

Key Features

  • Flexible Data Representation: Graphs allow for integrating new data sources more flexibly than relational databases, which require predefined schemas. Unlike hierarchical data models (e.g., XML, JSON), graphs allow cycles and avoid rigid hierarchical structuring.
  • Bidirectional Edges: for clarity, a bidirectional relationship can be represented as two directed edges.
  • Incomplete Data: Missing information can simply be omitted, such as when the graph lacks start/end dates for an event.

A standardised data model based on directed edge-labelled graphs is the Resource Description Framework (RDF), which has been recommended by the W3C for representing knowledge graphs on the web. The RDF model defines different types of nodes, including IRIs, which allow for global identification of entities on the Web; literals, which allow for representing strings (with or without language tags) and other datatype values (integers, dates, etc.); and blank nodes, which are anonymous nodes that are not assigned an identifier.

Everything in an RDF graph is called a resource. “Edge” and “Node” are just the roles played by a resource in a given statement. Fundamentally, in RDF there is no difference between resources playing an edge role and resources playing a node role. An edge in one statement can be a node in another. We will give examples of this in the diagrams that follow that will make this core idea clearer.

There is a standard query language for RDF graphs called SPARQL. It is both a full-featured query language and an HTTP protocol, making it possible to send query requests to endpoints over HTTP. A key part of the standard is the definition of RDF serializations. The most commonly used serialization format is called Turtle. There is also a JSON serialization called JSON-LD as well as an XML serialization (RDF/XML). All RDF databases are able to export and import graph content in standard serializations, making it easy and seamless to interchange data.

A directed edge-labelled graph is a tuple $G = (V, E, L)$, where $V \subseteq \text{Con}$ is a set of nodes, $L \subseteq \text{Con}$ is a set of edge labels, and $E \subseteq V \times L \times V$ is a set of edges.
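
As a small, hypothetical illustration of the RDF model, its serializations, and SPARQL (using the Python rdflib library as an assumed tool choice and made-up example IRIs), the sketch below builds a three-triple graph, serializes it to Turtle, and answers a SPARQL query.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for the example

g = Graph()
g.add((EX.Ada, RDF.type, EX.Person))                             # IRI nodes
g.add((EX.Ada, RDFS.label, Literal("Ada Lovelace", lang="en")))  # language-tagged literal
g.add((EX.Ada, EX.bornIn, EX.London))                            # edge between two IRIs

print(g.serialize(format="turtle"))  # Turtle serialization of the graph

# SPARQL: list every property and value attached to Ada.
query = """
SELECT ?p ?o WHERE { <http://example.org/Ada> ?p ?o . }
"""
for p, o in g.query(query):
    print(p, o)
```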

Heterogeneous Graphs

A heterogeneous graph 10 11 12 (or heterogeneous information network 13 14) is a directed graph where each node and edge is assigned one type. Heterogeneous graphs are similar to directed edge-labelled graphs, with edge labels corresponding to edge types, but they also include node types as part of the graph model.

An edge is called homogeneous if it connects two nodes of the same type and heterogeneous if it connects nodes of different types. Heterogeneous graphs allow partitioning nodes by their type, which is useful for machine learning tasks 10 11 12.

In contrast, directed edge-labelled graphs support a more flexible model where nodes can have zero or multiple types.

A heterogeneous graph is a tuple $G = (V, E, L, l)$, where $V \subseteq \text{Con}$ is a set of nodes, $L \subseteq \text{Con}$ is a set of edge and node labels, $E \subseteq V \times L \times V$ is a set of edges, and $l : V \rightarrow L$ maps each node to a label.
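
A minimal sketch of the heterogeneous-graph definition above, with a node-typing function l and a check of whether each edge is homogeneous or heterogeneous (the entities are invented for illustration):

```python
# Heterogeneous graph G = (V, E, L, l): every node is assigned exactly one type via l.
V = {"Ada", "London", "UK"}
L = {"Person", "City", "Country", "bornIn", "locatedIn"}  # shared label set for node and edge types
l = {"Ada": "Person", "London": "City", "UK": "Country"}  # node-typing function l: V -> L
E = {("Ada", "bornIn", "London"), ("London", "locatedIn", "UK")}

for s, p, o in E:
    kind = "homogeneous" if l[s] == l[o] else "heterogeneous"
    print(f"{s} -[{p}]-> {o}: {kind} edge")
```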

Property Graphs

While there are core commonalities in property graph implementations, there is no true standard property graph data model. Each implementation of a Property Graph is, therefore, somewhat different. The following discusses the characteristics that are common for any property graph database.

Generally, the property graph data model consists of three elements:

  • Nodes: The entities in the graph. Nodes can be tagged with zero to many text labels representing their type. Nodes are also called vertices.
  • Edges: The directed links between nodes. Edges are also called relationships. The “from node” of a relationship is called the source node. The “to node” is called the target node. Each edge has a type. While edges are directed, they can be navigated and queried in either direction.
  • Properties: The key-value pairs associated with a node or with an edge.

Property values can have data types. Supported data types depend on the vendor. For example, Neo4j data types are similar, but not identical, to Java language data types.

A property graph is a tuple $G = (V, E, L, P, U, e, l, p)$, where: $V \subseteq \text{Con}$ is a set of node IDs, $E \subseteq \text{Con}$ is a set of edge IDs, $L \subseteq \text{Con}$ is a set of labels, $P \subseteq \text{Con}$ is a set of properties, $U \subseteq \text{Con}$ is a set of values, $e : E \rightarrow V \times V$ maps an edge ID to a pair of node IDs, $l : V \cup E \rightarrow 2^{L}$ maps a node or edge ID to a set of labels, and $p : V \cup E \rightarrow 2^{P \times U}$ maps a node or edge ID to a set of property–value pairs.
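
To ground the definition above, here is a small, hypothetical Python sketch of a property graph: nodes and edges have internal IDs, zero or more labels, and key-value properties, mirroring the functions e, l, and p.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """A node or an edge: a set of labels plus key-value properties (the l and p mappings)."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

# Node IDs mapped to their labels and properties.
nodes = {
    "n1": Element({"Person"}, {"name": "Ada", "born": 1815}),
    "n2": Element({"City"}, {"name": "London"}),
}

# e: edge ID -> (source node ID, target node ID); edge data carries labels and properties.
edges = {"e1": ("n1", "n2")}
edge_data = {"e1": Element({"BORN_IN"}, {"certainty": 0.9})}

src, dst = edges["e1"]
print(f"{nodes[src].properties['name']} -[{edge_data['e1'].labels}]-> {nodes[dst].properties['name']}")
```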

A key part of any data model is having a query language available for working with it. After all, users need to have a way to access and manipulate the data in the graph. No industry-standard query language exists for property graphs. Instead, each database offers its own unique query language that is incompatible with the others:

  • Neo4j offers Cypher, also known as CQL—its own query language that, to some extent, took SQL as an inspiration.
  • TigerGraph offers GSQL—its own query language that also took SQL as an inspiration.
  • MS SQL Graph has their own extension to SQL to support graph query.
  • Some vendors, in addition to their own query language, also implement some subset of Cypher. For example, SAP Hana offers its own extensions to SQL and its own GraphScript language plus they support a subset of Cypher.

There is also Apache TinkerPop, an open-source graph computing framework that is integrated with a number of property graph databases. It offers the Gremlin language, which is more of an API language than a query language.

A key requirement for working with any data model is the ability to reference nodes, properties and relationships (edges). In the case of property graphs, internally, nodes and edges have IDs. IDs are assigned by a database and are internal to a database. Referencing is done by using text strings—node labels, relationship types, and property names.

RDF vs. Property Graph

  • Expressivity: RDF allows arbitrarily complex descriptions via links to other nodes, though it has no properties on edges out of the box; with RDF* the model becomes much more expressive than property graphs. Property graphs have limited expressivity: beyond the basic directed cyclic labeled graph, they add properties (key-value pairs) on nodes and edges.
  • Formal semantics: RDF ✅ (standard schema and model semantics foster reuse and inference); property graphs ❌ (no formal model representation).
  • Standardisation: RDF is driven by W3C working groups and standardisation processes; property graphs come from different competing vendors.
  • Query language: RDF has SPARQL, a W3C standard; property graphs have Cypher, PGQL, G-CORE, GQL → no single standard.
  • Serialisation format: RDF ✅ (multiple serialisation formats); property graphs ❌ (no standard serialisation format).
  • Schema language: RDF ✅ (RDFS, OWL, SHACL shapes); property graphs ❌ (none).
  • Design goal: RDF targets Linked Data (publishing and linking data with formal semantics and no central control); property graphs target graph representation for analytics.
  • Processing strengths: RDF suits set analysis operations (as in SQL but with schema abstraction and flexibility); property graphs suit graph traversal (plenty of graph analytics and ML libraries).

tldr; The main advantages of RDF

  • The RDF data model provides a richer, semantically consistent foundation than property graphs.

  • Text values can also have language tags to support internationalisation of data. For example, instead of a single value of rdfs:label for New York City we could have multiple language-tagged values such as:

    “New York City”@en

    “Nueva York”@es

  • A key differentiator is that the underlying model (schema) is represented in the same way as the data. Just to serve as a primer, rdf:type is a predicate used to connect a resource with a class it belongs to, and rdfs:label is used to provide a display name for a resource. The uniformity of the data model makes RDF graphs more easily evolvable and gives them more flexibility compared to property graphs.

  • Enrichment Through Composition: with the inherent composability of RDF graphs, when two nodes have the same URI they are automatically merged. This means that you can load different files and their content will be joined together, forming a larger and more interesting graph (see the sketch after this list).

  • Having data in a standard format allows for easy integration with the wealth of Open Data available, e.g., DBpedia, GeoNames, OpenCorporates, etc.

  • No vendor lock-in: it is all open source and based on W3C standards.
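
To illustrate the Enrichment Through Composition point above, here is a minimal sketch (again assuming the Python rdflib library and made-up example IRIs) that parses two independent Turtle snippets into one graph; because both snippets use the same IRI for ex:Ada, their statements merge around a single node.

```python
from rdflib import Graph

doc_a = """
@prefix ex: <http://example.org/> .
ex:Ada a ex:Person ;
       ex:bornIn ex:London .
"""

doc_b = """
@prefix ex: <http://example.org/> .
ex:Ada ex:wrote ex:Note_G .
ex:London ex:locatedIn ex:UK .
"""

g = Graph()
g.parse(data=doc_a, format="turtle")  # load the first dataset
g.parse(data=doc_b, format="turtle")  # load the second; shared IRIs merge automatically

for s, p, o in g:                     # statements from both files now live in one graph
    print(s, p, o)
print(len(g))                         # 4 triples in the combined graph
```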

So, in the end, what is a Knowledge Graph?

A Knowledge Graph is a connected data structure of data and associated metadata applied to model, integrate and access information assets. The knowledge graph represents real-world entities, facts, concepts, and events as well as the relationships between them. Knowledge graphs yield a more accurate and comprehensive representation of data.

Knowledge Graphs (KGs) have emerged as a compelling abstraction for organising the world’s structured knowledge, and as a way to integrate information extracted from multiple data sources. Knowledge graphs have started to play a central role in representing the information extracted using natural language processing and computer vision. Domain knowledge expressed in KGs is being input into machine learning models to produce better predictions.

The heart of the knowledge graph is a knowledge model – a collection of interlinked descriptions of concepts, entities, relationships and events where:

  • Descriptions have formal semantics that allow both people and computers to process them in an efficient and unambiguous manner;
  • Descriptions contribute to one another, forming a network, where each entity represents part of the description of the entities related to it;
  • Diverse data is connected and described by semantic metadata according to the knowledge model.

Knowledge graphs combine characteristics of several data management paradigms:

  • Database, because the data can be explored via structured queries;
  • Graph, because they can be analysed as any other network data structure;
  • Knowledge base, because they bear formal semantics, which can be used to interpret the data and infer new facts.

💬 “By 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the enterprise” 2

Knowledge graphs have a number of benefits over conventional relational databases and document stores. Specifically:

  • A unified, single source of truth
  • Flexible and highly adaptable data structure
  • Can represent knowledge in any domain
  • Wide range of tooling for data model definition and control
  • Ability to link and enrich data
  • Huge open source library of linked data
  • The perfect playground for virtually all ML tasks

Knowledge graphs, represented in RDF, provide the best framework for data integration, unification, linking and reuse, because they combine:

  • Expressivity: The standards in the Semantic Web stack – RDF, RDFS and OWL – allow for a fluent representation of various types of data and content: data schema, taxonomies and vocabularies, all sorts of metadata, reference and master data. The RDF* extension makes it easy to model provenance and other structured metadata.
  • Performance: All the specifications have been thought out, and proven in practice, to allow for efficient management of graphs of billions of facts and properties.
  • Interoperability: There is a range of specifications for data serialization, access (the SPARQL Protocol for endpoints), management (the SPARQL Graph Store protocol) and federation. The use of globally unique identifiers facilitates data integration and publishing.
  • Standardization: All the above is standardized through the W3C community process, to make sure that the requirements of different actors are satisfied – all the way from logicians to enterprise data management professionals and system operations teams.

What is NOT a Knowledge Graph?

Not every graph is a knowledge graph. For instance, a set of statistical data, e.g. the GDP data for countries, represented in RDF is not a KG. A graph representation of data is often useful, but it might be unnecessary to capture the semantic knowledge of the data. It might be sufficient for an application to just have a string ‘Italy’ associated with the string ‘GDP’ and a number ‘1.95 trillion’ without needing to define what countries are or what the ‘Gross Domestic Product’ of a country is. It’s the connections and the graph that make the KG, not the language used to represent the data.

Not every knowledge base is a knowledge graph. A key feature of a KG is that entity descriptions should be interlinked to one another. The definition of one entity includes another entity. This linking is how the graph forms. (e.g. A is B. B is C. C has D. A has D). Knowledge bases without formal structure and semantics, e.g. Q&A “knowledge base” about a software product, also do not represent a KG. It is possible to have an expert system that has a collection of data organized in a format that is not a graph but uses automated deductive processes such as a set of ‘if-then’ rules to facilitate analysis.

Why Knowledge Graphs are very exciting for ML?

Bringing knowledge graphs and machine learning together will systematically improve the accuracy of the systems and extend the range of machine learning capabilities. We are particularly interested in their applications in:

Data Insufficiency

Having a sufficient amount of data to train a machine learning model is very important. In the case of sparse data, a knowledge graph can be used to augment the training data, e.g., by replacing an entity name in the original training data with the name of another entity of a similar type. This way, a large number of both positive and negative examples can be generated from the knowledge graph.
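
As a rough, hypothetical sketch of this idea (all names, types, and sentences below are invented): given a small type index that could be derived from a knowledge graph, extra training examples are generated by swapping an entity for another entity of the same type.

```python
import random

# Tiny type index that could be derived from a knowledge graph.
entities_by_type = {
    "City": ["Paris", "Berlin", "Madrid"],
    "Person": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
}

def augment(sentence: str, entity: str, entity_type: str, n: int = 2, seed: int = 0):
    """Create n new examples by replacing the entity with other entities of the same type."""
    rng = random.Random(seed)
    candidates = [e for e in entities_by_type[entity_type] if e != entity]
    return [sentence.replace(entity, rng.choice(candidates)) for _ in range(n)]

print(augment("Ada Lovelace was born in London.", "Ada Lovelace", "Person"))
```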

Zero-Shot Learning

Today, a key challenge for a machine learning model is that, without proper training data, it cannot distinguish between two data points. In machine learning, this is known as the zero-shot learning problem. This is where knowledge graphs can play a very big role: the induction from the machine learning model can be complemented with deduction from the knowledge graph, e.g., for types of situations that did not appear in the training data.

Explainability

One of the major problems in the machine learning industry is explaining the predictions made by machine learning systems. One issue is that the representations driving those predictions are implicit. Knowledge graphs can alleviate this problem by mapping explanations to nodes in the graph and summarizing the decision-making process.

Appendix

Graph DBMS vendors

For each vendor product, the list below gives: native or multimodel architecture, supported models, deployment platforms, query languages, supported graph algorithms/libraries, license model, and pricing model (nodes, users, consumption).

  • Amazon Neptune: native; models: property, RDF; deployment: cloud; query languages: TinkerPop Gremlin, SPARQL; algorithms/libraries: TinkerPop; license: open source, managed service; pricing: on-demand instances, storage, I/O, backups, data transfer.
  • Cambridge Semantics AnzoGraph DB: native; models: RDF (and property graph); deployment: on-premises, multicloud, hybrid; query languages: OpenCypher, SPARQL; algorithms/libraries: built-in; license: freemium, subscription; pricing: vCPU cores.
  • DataStax Enterprise and Astra: multimodel; models: property; deployment: on-premises, multicloud; query languages: TinkerPop Gremlin, GraphQL; algorithms/libraries: TinkerPop; license: open core; pricing: nodes and consumption.
  • Dgraph: native; models: GraphQL, JSON; deployment: on-premises, cloud; query languages: DQL, GraphQL; algorithms/libraries: built-in; license: open source, Apache 2.0; pricing: CPUs per node.
  • Franz AllegroGraph: multimodel; models: document, graph (JSON-LD), RDF; deployment: on-premises, multicloud, hybrid; query languages: SPARQL, SPARQL*, FedShard-parallel SPARQL, GraphQL, Prolog/Datalog, Lisp, JIG/Gremlin, domain-specific languages; algorithms/libraries: built-in; license: closed source; pricing: CPU cores.
  • MarkLogic: multimodel; models: triples; deployment: on-premises, multicloud, hybrid; query languages: JavaScript, Optic, Search, SPARQL, XQuery; algorithms/libraries: built-in; license: closed source (free developer version); pricing: cores, consumption (free developer version).
  • Microsoft Azure Cosmos DB: multimodel; models: property; deployment: cloud; query languages: Gremlin; algorithms/libraries: built-in; license: open source, Apache 2.0; pricing: throughput capacity, serverless consumption.
  • Neo4j: native; models: property; deployment: on-premises, multicloud, hybrid; query languages: Cypher (openCypher), GraphQL, RDF/SPARQL, SQL; algorithms/libraries: built-in; license: open core, managed service; pricing: SaaS: RAM, consumption; on-premises: machines/cores/RAM.
  • Ontotext GraphDB: native; models: RDF, OWL/RDFS; deployment: on-premises, multicloud; query languages: GraphQL, SPARQL, SPARQL*, SQL; algorithms/libraries: built-in; license: perpetual, subscription, limited free version; pricing: per CPU.
  • OpenLink Virtuoso: multimodel; models: relational, RDF; deployment: on-premises, multicloud, hybrid; query languages: SPARQL, SQL; algorithms/libraries: built-in; license: closed source; pricing: concurrent users and CPU affinity, per node.
  • Oracle: multimodel; models: property, RDF; deployment: on-premises, multicloud, hybrid; query languages: PGQL, SPARQL; algorithms/libraries: built-in; license: perpetual, subscription; pricing: perpetual: user/server/enterprise; subscription: consumption.
  • Redis Labs RedisGraph: multimodel; models: property; deployment: on-premises, multicloud, hybrid; query languages: Cypher; algorithms/libraries: LAGraph; license: perpetual, subscription; pricing: consumption (RGUs based on memory and throughput).
  • SAP HANA: multimodel; models: property; query languages: GraphScript (proprietary), openCypher, SQL, SQLScript; algorithms/libraries: built-in; license: perpetual, subscription; pricing: perpetual: users; subscription: consumption.
  • Stardog: native; models: RDF; deployment: on-premises, multicloud, hybrid; query languages: GraphQL, SPARQL, SQL; algorithms/libraries: built-in; license: subscription; pricing: nodes, consumption.
  • TIBCO Graph Database: native; models: property; deployment: on-premises, multicloud, hybrid; query languages: Gremlin; algorithms/libraries: built-in; license: freemium, perpetual, subscription; pricing: on-premises: cores, connection; subscription: consumption.
  • TigerGraph: native; models: labeled property; deployment: on-premises, multicloud; query languages: GSQL; algorithms/libraries: built-in; license: freemium, perpetual, subscription; pricing: on-premises: available RAM for data storage; cloud: vCPU, RAM, disk size, I/O.

Vendor Profiles

Amazon Web Services (AWS)

Amazon Web Services, based in Seattle, introduced Amazon Neptune in 2018 as a cloud-only managed service and claims thousands of active customers. Neptune is a native graph DBMS supporting property graphs and the W3C's RDF model. ACID transactions, in-memory execution, up to 15 read replicas and high availability are part of the service. Instances are priced by the hour and billed per second with no long-term commitments, with added charges for storage, IOs, backup, and data transfer in and out. The service supports graphs of up to 64TB, encryption-at-rest with customer-managed keys and cross-region snapshot sharing.

Neptune can be queried with W3C's SPARQL as well as Apache TinkerPop Gremlin to build graph applications and implement custom graph algorithms. Open-source tools are available on GitHub under Apache 2.0 and MIT licenses. AWS is an active contributor to the Apache TinkerPop open-source project.

The graph-notebook project, an open-source (Apache 2.0) Jupyter Notebook developed by AWS, allows customers to visualize the results of queries run against the graph database and to get started with sample graph applications. Neptune ML supports making predictions over graph data using graph neural networks (GNNs) from the Deep Graph Library (DGL).

Cambridge Semantics

Cambridge Semantics is based in Boston and has offered Anzo, a knowledge graph platform that includes the AnzoGraph DB engine, since 2015, with freemium and enterprise software subscriptions on-premises and via cloud marketplaces in multicloud and hybrid deployments. Pricing for both the platform and the engine is based on the number of vCPU cores used by the graph engine.

AnzoGraph DB version 2.2 is a native graph engine supporting RDF as well as labeled property graphs for property graph use cases. Inferencing is performed for RDFS and OWL 2 RL ontologies using in-memory materialization of triples. It utilizes an MPP OLAP engine serving use cases where calculations and analytics need to be performed across the whole of a knowledge graph. It also supports querying via SPARQL and OpenCypher. A library of analytics functions is provided with the product, along with integrations for a range of ML frameworks.

The Anzo Knowledge Graph Platform adds capabilities for knowledge graph management, metadata management, visual schema and query design tools, data ingestion, and integration with analytics and business intelligence (BI) tools.

DataStax

DataStax is based in Santa Clara, California, and added graph capabilities to DataStax Enterprise (DSE) in 2015. The capabilities are also available in DataStax’s cloud DBMS offering, Astra. A nonrelational multimodel DBMS, it supports property graphs using Apache TinkerPop Gremlin as a query language. With its history as the leading commercializer of Apache Cassandra, DataStax offers an open-core version of that DBMS, with added features in the enterprise version including graph support. Version 6.8 offers graph data models implemented natively within Cassandra.

DSE supports developers writing queries in Cassandra Query Language (CQL), Graph/Gremlin, and Spark SQL, and offers a collaborative notebook interface as well as visual exploration of DSE graphs without requiring Gremlin skills. DataStax Enterprise also offers support for Gremlin algorithms and analytics integrations. This enables the multimodel aspect of the product to support combinations of technologies across a variety of data collections for analytics use cases and ML. Given Cassandra’s broad adoption for operational use cases, this provides DataStax market differentiation.

Dgraph

Dgraph Labs is based in Palo Alto, California, and is a recent entrant to the graph DBMS market with the Dgraph product in 2016. The platform is entirely written in the Go language, with features aimed at the application development community, which is increasingly using GraphQL as an API layer on top of graph data structures, and at enterprise customers that use GraphQL as a layer to collect data from multiple back ends and present a unified service or API. There is a community edition available for download, together with hosted public and private cloud solutions on AWS and other clouds, with pricing based on CPU nodes.

Dgraph is a native graph DBMS that handles JSON data as well as triples. Querying the database, however, is done using GraphQL and DQL (Dgraph Query Language). The platform is designed for real-time transactional workloads, utilizing relationship-based sharding to optimize queries and traversals across distributed clusters. Its native support for graph algorithms includes recursive graph traversal and k-shortest-path algorithms, with further capabilities planned in upcoming releases. The product fits into the GraphQL ecosystem, including tools for querying and visualization.

Franz

Franz is based in Lafayette, California, and entered the graph database market in 2006 with AllegroGraph, a multimodel triplestore supporting documents and graphs that implements RDF*, SPARQL*, and related W3C standards. The platform can be deployed in on-premises and cloud environments, with pricing based on CPU cores.

An RDFS++ runtime reasoner allows usage of the RDFS modeling language and a subset of terms from the Web Ontology Language (OWL) at query time. OWL 2 RL support and inference are also available using static materialization.

A unique feature of AllegroGraph since its first release is the use of Prolog as a mechanism to extend or customize the model and reasoning capabilities. AllegroGraph can federate queries in parallel across multiple distributed triplestores using its proprietary FedShard technology. AllegroGraph’s Triple Attributes Security uniquely addresses high-security data environments through role-based, cell-level data access.

Graph traversal, graph analytics, graph algorithms, and ML are all included natively within the product or through extensions with third-party libraries, such as Python graph and ML libraries. Graph visualization and exploratory analysis are supported by Gruff, a no-code tool natively integrated into the platform.

MarkLogic

MarkLogic is based in San Carlos, California, and entered the graph DBMS market in 2013 with a document-based multimodel product with a focus on knowledge graph use cases. Unlike most other multimodel offerings, it also has a native triplestore, permitting it to optimally store data not already inside documents.

The MarkLogic Semantics capability is built into the core product and sold as a license option. It enables direct querying in SPARQL and reasoning support for RDFS and OWL ontologies. Custom rules can be defined using MarkLogic’s own rule language, which is based on the SPARQL CONSTRUCT operator.

MarkLogic can be deployed on-premises, in the cloud, or in hybrid deployments, and has APIs for SPARQL, JavaScript, XQuery, Search, SQL, REST, Java, Node.js, and Optic queries. It offers a free developer version and pricing by cores and/or consumption. It provides tools focused on data curation and access — queries can run against both data and metadata within the same query, making it well-suited to building data hubs. It also provides integrations for ML.

Microsoft

Microsoft, based in Redmond, Washington, provides graph capabilities in several offerings, including Azure Cosmos DB and SQL Server. Azure Cosmos DB is a nonrelational multimodel DBMS deployed as a managed service in the cloud. It provides a property graph model and supports Gremlin as a query language.

Visual design and query tools from the Gremlin ecosystem are recommended. Algorithms are available through Gremlin recipes. Microsoft provides a Spark connector to enable ML.

Neo4j

Neo4j was founded in 2007 and is based in San Mateo, California. It is a native graph store supporting the property graph model. An open-source version of the platform is available, as well as an enterprise version that can be deployed across on-premises and cloud environments. Its managed SaaS service is called Aura. Pricing for self-managed installations is based on the number of machines, cores, and RAM.

Neo4j is the creator of the Cypher query language, which has been adopted by other graph databases as openCypher. It also provides APIs and libraries for graph traversals. Connectors for BI tools, streaming, Spark, and ingestion from various databases and file formats are also available. Neo4j is a member of and an active participant in the development of the GQL standard.

Neo4j for Graph Data Science includes over 50 algorithms, graph embeddings, and support for supervised and unsupervised ML. Visualization, schema design, and data source connection tools are also available.

Ontotext

Ontotext is based in Bulgaria and provides GraphDB, a native triplestore supporting RDF* and SPARQL*. Reasoning and inference for RDFS, OWL Lite, and OWL 2 RL/QL are supported via materialization of triples at load time. Virtualization over RDBMS systems is supported, enabling queries against relational databases.

The Ontotext Platform adds capabilities for schema creation, text processing pipelines, and platform deployment via Kubernetes. Developers can use GraphQL interfaces to overcome the need for writing SPARQL queries.

OpenLink Software

OpenLink Software offers Virtuoso, a multimodel DBMS supporting both relational and RDF data. Virtuoso enables data virtualization and hosts the Linked Open Data (LOD) Cloud cache, a collection of datasets accessible as a knowledge graph on the web. The platform supports inference on subclass and subproperty constructs, enabling graph traversal.

Oracle

Oracle entered the graph DBMS market in 2009 with graph capabilities in its multimodel Oracle DBMS. It supports both property graphs and RDF. The Oracle DBMS includes a library of graph algorithms and supports ML with frameworks like pandas and NumPy.

Redis Labs

Redis Labs offers RedisGraph, a graph database module for Redis that supports the Cypher query language and integrates with tooling for ML applications.

SAP

SAP HANA supports property graphs with querying via GraphScript, openCypher, and SQL. It includes a library of graph algorithms and integrates with ML tooling.

Stardog

Stardog provides the Stardog Enterprise Knowledge Graph Platform, a triplestore supporting RDF* and SPARQL*. It supports in-database ML and virtual graphs to represent data without duplication.

TIBCO Software

TIBCO Software offers TIBCO Graph Database, supporting property graphs and querying with Gremlin. It integrates with other TIBCO products.

TigerGraph

TigerGraph supports labeled property graphs with its GSQL query language. It features prebuilt schemas and tools for ML and analytics.


References

  1. Heiko Paulheim (2016). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web.
  2. Gerhard Goos and Juris Hartmanis and Jan Leeuwen and David Hutchison and Jeff Z. Pan and Huajun Chen and Hong Gee Kim and Juan-Zi Li and Zhe Wu and Ian Horrocks and Riichiro Mizoguchi and Zhaohui Wu (2016). The Semantic Web. Lecture Notes in Computer Science.
  3. Andreas Blumauer (2016). From Taxonomies over Ontologies to Knowledge Graphs.
  4. Michael Färber and Achim Rettinger (2015). A Statistical Comparison of Current Knowledge Bases. International Conference on Semantic Systems.
  5. Jay Pujara and Hui Miao and Lise Getoor and William W. Cohen (2013). Knowledge Graph Identification. International Workshop on the Semantic Web.
  6. Lisa Ehrlinger and Wolfram Wöß (2016). Towards a Definition of Knowledge Graphs. International Conference on Semantic Systems.
  7. Maximilian Nickel and Volker Tresp (2013). Tensor factorization for multi-relational learning. Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III.
  8. Antoine Bordes and Nicolas Usunier and Alberto Garcı́a-Durán and Jason Weston and Oksana Yakhnenko (2013). Translating Embeddings for Modeling Multi-relational Data. Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States.
  9. Ivana Balazevic and Carl Allen and Timothy M. Hospedales (2019). Multi-relational Poincaré Graph Embeddings. Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada.
  10. Rana Hussein and Dingqi Yang and Philippe Cudré-Mauroux (2018). Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018.
  11. Xiao Wang and Houye Ji and Chuan Shi and Bai Wang and Yanfang Ye and Peng Cui and Philip S. Yu (2019). Heterogeneous Graph Attention Network. The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019.
  12. Luwei Yang and Zhibo Xiao and Wen Jiang and Yi Wei and Yi Hu and Hao Wang (2020). Dynamic Heterogeneous Graph Embedding Using Hierarchical Attentions. Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part II.
  13. Yizhou Sun and Jiawei Han and Xifeng Yan and Philip S. Yu and Tianyi Wu (2011). Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment.
  14. Yizhou Sun and Jiawei Han (2012). Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool.
The opinions and views expressed on this blog are solely my own and do not reflect the opinions, views, or positions of my employer or any affiliated organizations. All content provided on this blog is for informational purposes only.