Implementing Knowledge Graphs with MongoDB

Yes, You Can Use MongoDB as a Graph Database!

This blog covers how the unique relational capabilities of Knowledge Graphs can be implemented using MongoDB.

In our blog What is a Knowledge Graph Anyway?, we covered the concept of a Knowledge Graph and how it is useful for enhancing search applications where contextual relationships between different entities are important in the working dataset. In our experience, Neo4J is a common tool for working with this kind of graph. However, there are cases where it is not an option because of environment constraints or technology restrictions.

MongoDB is a commonly used non-relational database in search environments and often serves as a fundamental component in many projects. There are instances where MongoDB might be the only option available, leading to questions about its suitability for certain tasks. Surprisingly, MongoDB can be effectively employed as a Knowledge Graph engine to simulate the relationships between entities, a capability typically associated with Neo4J. In this blog post, we will illustrate this concept through a simple project. We’ll demonstrate how MongoDB and Neo4J can yield identical results when managing related entities.

Example Data

A graph representing entities such as “Person,” “University,” “Degree,” “City” and “Country” will be used to test different kinds of associations. Relationships between entities in this example will be represented in MongoDB as fields in uppercase (BORN_IN, CHILD, etc.). In Neo4J they are actual relations named with the same format.

Each “Person” is related to other people through SPOUSE or CHILD. A person has a “Degree”, which is offered at a “University”, which is in a “City”, which in turn is in a “Country”. A person is also related to their birth city, and through it to a “Country”. A simple, unified view of these entity relationships would look like this in a graph:

The full graph of the example data set, with all the entities and relationships, is more complex and looks like this:

A disadvantage of using MongoDB for graph data is that there is no easy way to visualize it as in these examples, which were generated using the default view in Neo4J.

In the following sections, we are going to provide examples of graph insertion and lookup queries in Neo4J and MongoDB.  We will use common color schemes in JSON files where “blue” is for properties, “green” for string values and “orange” for numbers/bools.

Creating Entity Relationships

When using Neo4J, the insertion query of a new entity would be as follows:

CREATE (p:Person { id: $id, name: $name, born_in: $born_in, degree: $degree, birthday: $birthday, hobbies: $hobbies })

Each parameter value comes from the input data, given as JSON:

{'id': 1, 'name': 'John Smith', 'born_in': 'New York City', 'degree': 'Computer Science', 'birthday': '01/15/1985', 'hobbies': ['Photography', 'Cooking', 'Gardening']}

For MongoDB, using the Python library, it is as simple as taking the previous JSON input data and referencing the name of the database (knowledge_graph) and the name of the collection (Person) to insert it:

client["knowledge_graph"]["Person"].insert_one(input_json)

As for entity relationships, MongoDB has no exactly matching concept. A relationship is represented by adding a field to the source record whose value identifies the related entity. In graph databases like Neo4J, relationships are first-class and more straightforward to establish. For example, in Neo4J a relationship between two people would be inserted as:

MATCH (e1:Person {name:'John Smith'}), (e2:Person {name:'Jane Smith'})
CREATE (e1)-[:SPOUSE]->(e2)

To achieve the same behavior for MongoDB, it would be:

client["knowledge_graph"]["Person"].update_one(
    {"name": "John Smith"},
    {"$set": {"SPOUSE": "Jane Smith"}}
)

For each of the entities and relationships in the simple example project described above, we have to go through the same insertion and update logic. This will help model all of the needed information in MongoDB (as reflected in the graph image) to do graph searches around the different cases with complex scenarios where multiple relationships are involved.
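The repeated update logic can be sketched as a small helper that turns one relationship into the filter and update documents passed to update_one. This is a minimal sketch under assumptions: the function names are hypothetical, and using $addToSet for relationships that can hold several values (such as CHILD) is our assumption rather than something the example project prescribes. "Emily Smith" is a made-up sample name.

```python
def relationship_update(source_name, relation, target_name, multi_valued=False):
    """Build the (filter, update) pair for one relationship.

    Single-valued relationships (e.g. SPOUSE) use $set, as in the example above.
    Multi-valued relationships (e.g. CHILD) use $addToSet so that repeated calls
    accumulate targets in an array (an assumption of this sketch).
    """
    operator = "$addToSet" if multi_valued else "$set"
    return {"name": source_name}, {operator: {relation: target_name}}


def apply_relationships(collection, edges):
    """Apply each (source, relation, target, multi_valued) edge to a collection.

    `collection` is expected to behave like a PyMongo collection,
    i.e. to expose update_one(filter, update).
    """
    for source_name, relation, target_name, multi_valued in edges:
        query_filter, update = relationship_update(
            source_name, relation, target_name, multi_valued
        )
        collection.update_one(query_filter, update)
```

With the client from the earlier snippets, a call could look like apply_relationships(client["knowledge_graph"]["Person"], [("John Smith", "SPOUSE", "Jane Smith", False)]).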

How Does Implementing Knowledge Graphs in MongoDB Work?

The $graphLookup operation in MongoDB is used to perform graph-like searches on collections that have a hierarchical or interconnected structure. It allows you to recursively traverse a collection, following references between documents, and retrieve related data in a hierarchical manner.

To use $graphLookup, you run an aggregation on a source collection and name a target collection in the from field, along with the fields that establish the relationships between documents (connectFromField and connectToField). Starting from the value given in startWith, MongoDB performs a recursive search: it looks for documents in the target collection whose connectToField matches, then repeats the search from the connectFromField values of the documents it found. The operation continues until an optional maximum depth (maxDepth) is reached or there are no more matches. The result of $graphLookup is the set of original documents, each with an added array field (named by as) that holds the related documents found during the traversal. This allows you to retrieve a complete picture of the hierarchical or interconnected data in a single query.
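To make the traversal concrete, here is a simplified in-memory model of what $graphLookup does for a single input document. It is an illustration only, not MongoDB's implementation: the function name graph_lookup is hypothetical, start_with is a plain field name rather than an aggregation expression, and the sample people other than John Smith are made up. The model expands matches one level at a time.

```python
def graph_lookup(doc, from_docs, start_with, connect_from, connect_to, max_depth=None):
    """Return the documents reachable from `doc`, in the order they are found."""

    def as_list(value):
        # Treat a missing field as "no values" and a scalar as a one-item list.
        if value is None:
            return []
        return list(value) if isinstance(value, (list, tuple)) else [value]

    frontier = as_list(doc.get(start_with))  # values to match at this level
    seen = set()                             # identities of documents already returned
    results = []
    depth = 0
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = []
        for candidate in from_docs:
            if id(candidate) in seen:
                continue
            # A document matches when its connect_to value is in the frontier.
            if any(v in frontier for v in as_list(candidate.get(connect_to))):
                seen.add(id(candidate))
                results.append(candidate)
                # Its connect_from values seed the next level of the search.
                next_frontier.extend(as_list(candidate.get(connect_from)))
        frontier = next_frontier
        depth += 1
    return results


people = [
    {"name": "John Smith", "CHILD": ["Emily Smith"]},
    {"name": "Emily Smith", "CHILD": ["Tom Smith"]},
    {"name": "Tom Smith"},
]
descendants = graph_lookup(people[0], people, "CHILD", "CHILD", "name")
direct_children = graph_lookup(people[0], people, "CHILD", "CHILD", "name", max_depth=0)
```

Here descendants contains both the child and the grandchild, while max_depth=0 limits the search to a single lookup and returns only the direct child.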

Example Queries

To compare how the same results are obtained in both technologies, three different cases will be analyzed.

1)   Retrieve the birth city and the city population from a “Person” entity filtered by its name.

Neo4J Query:

MATCH (person:Person)-[:BORN_IN]->(city:City)
WHERE person.name = "$INPUT"
RETURN city.name AS cityName, city.population AS cityPopulation

MongoDB Query Equivalence:

[
    {
        "$match": {
            "name": "$INPUT"
        }
    },
    {
        "$graphLookup": {
            "from": "City",
            "startWith": "$BORN_IN",
            "connectFromField": "BORN_IN",
            "connectToField": "name",
            "as": "city"
        }
    },
    {
        "$project": {
            "_id": 0,
            "cityName": {
                "$first": "$city.name"
            },
            "cityPopulation": {
                "$first": "$city.population"
            }
        }
    }
]
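In the pipeline above, "$INPUT" is a placeholder for the person's name and has to be replaced before the query is run. One way to do that from Python is to build the pipeline in a function. This is a minimal sketch; the function name birth_city_pipeline is hypothetical.

```python
def birth_city_pipeline(person_name):
    """Build the aggregation pipeline for query 1 for a given person name."""
    return [
        # Select the person by name.
        {"$match": {"name": person_name}},
        # Follow the BORN_IN field to the matching City document.
        {
            "$graphLookup": {
                "from": "City",
                "startWith": "$BORN_IN",
                "connectFromField": "BORN_IN",
                "connectToField": "name",
                "as": "city",
            }
        },
        # Keep only the name and population of the first matched city.
        {
            "$project": {
                "_id": 0,
                "cityName": {"$first": "$city.name"},
                "cityPopulation": {"$first": "$city.population"},
            }
        },
    ]
```

With the client from the earlier snippets, the query could then be run as client["knowledge_graph"]["Person"].aggregate(birth_city_pipeline("John Smith")).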

2)   Retrieve the name of all “Person” entities that studied in a given “Country”

Neo4J Query:

MATCH (person:Person)
WHERE (person)-[:HAS_DEGREE]->(:Degree)-[:OFFERED_AT]->(:University)-[:IS_IN]->(:City)-[:IS_IN]->(:Country {name: '$INPUT'})
RETURN person.name AS personName

MongoDB Query Equivalence:

[
   {
       "$graphLookup": {
           "from": "Degree",
           "startWith": "$HAS_DEGREE",
           "connectFromField": "HAS_DEGREE",
           "connectToField": "name",
           "as": "HAS_DEGREE"
       }
   },
   {
       "$graphLookup": {
           "from": "University",
           "startWith": "$HAS_DEGREE.OFFERED_AT",
           "connectFromField": "HAS_DEGREE.OFFERED_AT",
           "connectToField": "name",
           "as": "university"
       }
   },
   {
       "$graphLookup": {
           "from": "City",
           "startWith": "$university.IS_IN",
           "connectFromField": "university.IS_IN",
           "connectToField": "name",
           "as": "city"
       }
   },
   {
       "$graphLookup": {
           "from": "Country",
           "startWith": "$city.IS_IN",
           "connectFromField": "city.IS_IN",
           "connectToField": "name",
           "as": "country"
       }
   },
   {
       "$match": {
           "country.name": "$INPUT"
       }
   },
   {
       "$project": {
           "_id": 0,
           "personName": "$name"
       }
   }
]
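The four $graphLookup stages above follow the same pattern, so they can be generated by a small helper when the pipeline is built from Python. This is a sketch under assumptions: the names hop and people_by_study_country_pipeline are hypothetical, and every hop is assumed to connect to the name field of the target collection, as in the example data.

```python
def hop(from_collection, source_path, as_field):
    """Build one $graphLookup stage that follows `source_path` into `from_collection`."""
    return {
        "$graphLookup": {
            "from": from_collection,
            "startWith": "$" + source_path,
            "connectFromField": source_path,
            "connectToField": "name",
            "as": as_field,
        }
    }


def people_by_study_country_pipeline(country_name):
    """Build the pipeline for query 2: people whose university is in `country_name`."""
    return [
        hop("Degree", "HAS_DEGREE", "HAS_DEGREE"),
        hop("University", "HAS_DEGREE.OFFERED_AT", "university"),
        hop("City", "university.IS_IN", "city"),
        hop("Country", "city.IS_IN", "country"),
        # Keep only people whose chain of relationships ends in the requested country.
        {"$match": {"country.name": country_name}},
        {"$project": {"_id": 0, "personName": "$name"}},
    ]
```

The generated stages are the same documents as in the hand-written pipeline, with the country name supplied as an argument instead of the "$INPUT" placeholder.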

3)   Retrieve a “Person” entity with the values of all its relationships

Neo4J Query:

MATCH (person:Person)-[:HAS_DEGREE]->(degree:Degree)-[:OFFERED_AT]->(university:University)-[:IS_IN]->(city:City)-[:IS_IN]->(universityCountry:Country)
WHERE person.name = '$INPUT'
MATCH (person)-[:BORN_IN]->(birthCity:City)-[:IS_IN]->(birthCountry:Country)
OPTIONAL MATCH (person)-[:SPOUSE]->(spouse:Person)
OPTIONAL MATCH (person)-[:CHILD]->(child:Person)
RETURN person.name AS Name, person.birthday AS Birthday, birthCity.name AS BirthCity, birthCountry.name AS BirthCountry, degree.name AS Degree, universityCountry.name AS UniversityCountry, spouse.name AS Spouse, collect({name: child.name, hobbies: child.hobbies}) AS Children, person.hobbies AS Hobbies

MongoDB Query Equivalence:

[
    {
        "$match": {
            "name": "$INPUT"
        }
    },
    {
        "$graphLookup": {
            "from": "City",
            "startWith": "$BORN_IN",
            "connectFromField": "BORN_IN",
            "connectToField": "name",
            "as": "birthCity"
        }
    },
    {
        "$graphLookup": {
            "from": "Country",
            "startWith": "$birthCity.IS_IN",
            "connectFromField": "birthCity.IS_IN",
            "connectToField": "name",
            "as": "birthCountry"
        }
    },
    {
        "$graphLookup": {
            "from": "Degree",
            "startWith": "$HAS_DEGREE",
            "connectFromField": "HAS_DEGREE",
            "connectToField": "name",
            "as": "HAS_DEGREE"
        }
    },
    {
        "$graphLookup": {
            "from": "University",
            "startWith": "$HAS_DEGREE.OFFERED_AT",
            "connectFromField": "HAS_DEGREE.OFFERED_AT",
            "connectToField": "name",
            "as": "university"
        }
    },
    {
        "$graphLookup": {
            "from": "City",
            "startWith": "$university.IS_IN",
            "connectFromField": "university.IS_IN",
            "connectToField": "name",
            "as": "universityCity"
        }
    },
    {
        "$graphLookup": {
            "from": "Country",
            "startWith": "$universityCity.IS_IN",
            "connectFromField": "universityCity.IS_IN",
            "connectToField": "name",
            "as": "universityCountry"
        }
    },
    {
        "$graphLookup": {
            "from": "Person",
            "startWith": "$CHILD",
            "connectFromField": "CHILD",
            "connectToField": "name",
            "as": "Children"
        }
    },
    {
        "$project": {
            "_id": 0,
            "Name": "$name",
            "Birthday": "$birthday",
            "BirthCity": "$born_in",
            "BirthCountry": {
                "$first": "$birthCountry.name"
            },
            "Degree": "$degree",
            "UniversityCountry": {
                "$first": "$universityCountry.name"
            },
            "Spouse": "$SPOUSE",
            "Children": {
                "$map": {
                    "input": "$Children",
                    "as": "child",
                    "in": {
                        "hobbies": "$$child.hobbies",
                        "name": "$$child.name"
                    }
                }
            },
            "Hobbies": "$hobbies"
        }
    }
]

Wrapping Up

In this blog we covered how to set up data in MongoDB to represent relations between different entities and how to perform complex graph queries that return the same results as in Neo4J. This helps in scenarios where MongoDB is the only technology available but a Knowledge Graph is still needed to enhance search use cases.

MongoDB offers further advice on how and why to implement a graph data search application either with just MongoDB, or in conjunction with a traditional graph database.

“Graph databases are a great choice when you need to analyze relationships between entities to look for patterns, generate intelligent recommendations, model networks, or compute graph algorithms. Graph databases tend to be less performant for non-traversal queries. Most applications require such queries, so they require a general-purpose database. For applications that have use cases that would benefit from both a graph database as well as a general-purpose database, developers have two options:

  • Couple a general-purpose database (like a document or relational database) with a graph database.
  • Use a general-purpose database (like MongoDB) that has graphing capabilities.”

If the subject of this blog aligns with one of your projects, we would be happy to provide advice and more sample code related to this example.

As always, please CONTACT US if you have any comments or questions, or to request a free consultation to discuss your ongoing search and AI projects.

– Fabian
