Memgraph
Memgraph is an open-source graph database, compatible with Neo4j. The database uses the Cypher graph query language. Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
This notebook shows how to use LLMs to provide a natural language interface to a Memgraph database.
Setting up
To complete this tutorial, you will need Docker and Python 3.x installed.
Ensure you have a running Memgraph instance. To quickly run Memgraph Platform (Memgraph database + MAGE library + Memgraph Lab) for the first time, do the following:
On Linux/MacOS:
curl https://install.memgraph.com | sh
On Windows:
iwr https://windows.memgraph.com | iex
Both commands run a script that downloads a Docker Compose file to your system, then builds and starts the memgraph-mage and memgraph-lab Docker services in two separate containers.
Read more about the installation process in the Memgraph documentation.
Now you can start playing with Memgraph!
Begin by installing and importing all the necessary packages. We'll use the pip package manager with the --user flag, which installs packages into your user directory and so avoids the need for administrator permissions. If you've installed Python 3.4 or later, pip is included by default. You can install all the required packages using the following command:
pip install langchain langchain-openai neo4j gqlalchemy --user
You can either run the provided code blocks in this notebook or use a separate Python file to experiment with Memgraph and LangChain.
import os
from gqlalchemy import Memgraph
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import MemgraphGraph
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
We use the Python library GQLAlchemy to connect our Python script to the Memgraph database. You can also connect to a running Memgraph instance with the Neo4j driver, since it is compatible with Memgraph. To execute queries with GQLAlchemy, set up a Memgraph instance as follows:
memgraph = Memgraph(host="127.0.0.1", port=7687)
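If later queries fail with a connection error, it can help to first confirm that something is listening on the Bolt port. The following is a minimal sketch using only the standard library; the helper name bolt_port_is_open is hypothetical, and the host and port simply mirror the values used above. It checks TCP reachability only, not that the listener is actually Memgraph.

```python
import socket


def bolt_port_is_open(host: str = "127.0.0.1", port: int = 7687, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if bolt_port_is_open():
        print("Something is listening on 127.0.0.1:7687")
    else:
        print("Nothing is listening on 127.0.0.1:7687; is the Docker container running?")
```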
Populating the database
You can populate your new, empty database using the Cypher query language. Don't worry if you don't grasp every line just yet; you can learn Cypher from the documentation. Running the following script executes a seeding query on the database, giving us data about a video game, including its publisher, available platforms, and genres. This data will serve as the basis for our work.
# Creating and executing the seeding query
query = """
MERGE (g:Game {name: "Baldur's Gate 3"})
WITH g, ["PlayStation 5", "Mac OS", "Windows", "Xbox Series X/S"] AS platforms,
        ["Adventure", "Role-Playing Game", "Strategy"] AS genres
FOREACH (platform IN platforms |
    MERGE (p:Platform {name: platform})
    MERGE (g)-[:AVAILABLE_ON]->(p)
)
FOREACH (genre IN genres |
    MERGE (gn:Genre {name: genre})
    MERGE (g)-[:HAS_GENRE]->(gn)
)
MERGE (p:Publisher {name: "Larian Studios"})
MERGE (g)-[:PUBLISHED_BY]->(p);
"""
memgraph.execute(query)
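To seed more than one game, the literal values in the query above could be replaced with Cypher parameters. The sketch below only builds the query text and the parameter dictionary; build_seed and SEED_QUERY are hypothetical names. Passing the dictionary as a second argument to memgraph.execute is an assumption about GQLAlchemy's interface and is left as a comment.

```python
from typing import Any, Dict, List, Tuple

# Same shape as the seeding query above, with literals replaced by $parameters.
SEED_QUERY = """
MERGE (g:Game {name: $game})
WITH g, $platforms AS platforms, $genres AS genres
FOREACH (platform IN platforms |
    MERGE (p:Platform {name: platform})
    MERGE (g)-[:AVAILABLE_ON]->(p)
)
FOREACH (genre IN genres |
    MERGE (gn:Genre {name: genre})
    MERGE (g)-[:HAS_GENRE]->(gn)
)
MERGE (p:Publisher {name: $publisher})
MERGE (g)-[:PUBLISHED_BY]->(p);
"""


def build_seed(
    game: str, platforms: List[str], genres: List[str], publisher: str
) -> Tuple[str, Dict[str, Any]]:
    """Return the parameterized query and the matching parameter dictionary."""
    params = {
        "game": game,
        "platforms": list(platforms),
        "genres": list(genres),
        "publisher": publisher,
    }
    return SEED_QUERY, params


query, params = build_seed(
    "Baldur's Gate 3",
    ["PlayStation 5", "Mac OS", "Windows", "Xbox Series X/S"],
    ["Adventure", "Role-Playing Game", "Strategy"],
    "Larian Studios",
)
# Assumption: execute() accepts the parameters as a second argument.
# memgraph.execute(query, params)
```

Keeping values out of the query text also avoids quoting problems with names that contain apostrophes, such as "Baldur's Gate 3".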
Refresh graph schema
You're all set to instantiate the Memgraph-LangChain graph using the following script. This interface lets us query the database through LangChain and automatically builds the graph schema that the LLM needs to generate Cypher queries.
graph = MemgraphGraph(url="bolt://localhost:7687", username="", password="")
If necessary, you can manually refresh the graph schema as follows.
graph.refresh_schema()
To familiarize yourself with the data and verify the updated graph schema, you can print it using the following statement.
print(graph.schema)
Node properties are the following:
Node name: 'Game', Node properties: [{'property': 'name', 'type': 'str'}]
Node name: 'Platform', Node properties: [{'property': 'name', 'type': 'str'}]
Node name: 'Genre', Node properties: [{'property': 'name', 'type': 'str'}]
Node name: 'Publisher', Node properties: [{'property': 'name', 'type': 'str'}]
Relationship properties are the following:
The relationships are the following:
['(:Game)-[:AVAILABLE_ON]->(:Platform)']
['(:Game)-[:HAS_GENRE]->(:Genre)']
['(:Game)-[:PUBLISHED_BY]->(:Publisher)']
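To check the schema programmatically rather than by eye, the relationship patterns can be pulled out of the printed text. This is a minimal sketch that assumes the schema string looks exactly like the output above; the format may differ between LangChain versions, and parse_relationships is a hypothetical helper, not part of LangChain.

```python
import re
from typing import List, Tuple

# Matches patterns such as (:Game)-[:AVAILABLE_ON]->(:Platform)
REL_PATTERN = re.compile(r"\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)")


def parse_relationships(schema_text: str) -> List[Tuple[str, str, str]]:
    """Return (source label, relationship type, target label) triples found in the text."""
    return REL_PATTERN.findall(schema_text)


# Sample copied from the schema output above; in the notebook, pass graph.schema instead.
sample = """The relationships are the following:
['(:Game)-[:AVAILABLE_ON]->(:Platform)']
['(:Game)-[:HAS_GENRE]->(:Genre)']
['(:Game)-[:PUBLISHED_BY]->(:Publisher)']"""

for source, rel_type, target in parse_relationships(sample):
    print(f"{source} -[{rel_type}]-> {target}")
```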
Querying the database
To interact with the OpenAI API, you must configure your API key as an environment variable using the Python os package. This ensures proper authorization for your requests. More information on obtaining an API key is available from OpenAI.
os.environ["OPENAI_API_KEY"] = "your-key-here"
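If you would rather not hard-code the key in the notebook, one option is to export it in your shell and have the script fail early when it is missing. The helper below is a hypothetical sketch using only the standard library; require_env is not part of LangChain or the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY is configured")
    except RuntimeError as error:
        print(error)
```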
Create the graph chain using the following script; it is used to answer questions based on your graph data. While it defaults to GPT-3.5-turbo, you might also consider experimenting with other models such as GPT-4 for notably improved Cypher queries and results. We use the OpenAI chat model with the key you configured earlier, and set the temperature to zero for predictable, consistent answers. We also pass our Memgraph-LangChain graph and set the verbose parameter, which defaults to False, to True to receive more detailed messages about query generation.
chain = GraphCypherQAChain.from_llm(
    # model_name is a ChatOpenAI setting, so it is passed to the LLM constructor
    ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)
Now you can start asking questions!
response = chain.run("Which platforms is Baldur's Gate 3 available on?")
print(response)
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (g:Game {name: 'Baldur\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform)
RETURN p.name
Full Context:
[{'p.name': 'PlayStation 5'}, {'p.name': 'Mac OS'}, {'p.name': 'Windows'}, {'p.name': 'Xbox Series X/S'}]
> Finished chain.
Baldur's Gate 3 is available on PlayStation 5, Mac OS, Windows, and Xbox Series X/S.
response = chain.run("Is Baldur's Gate 3 available on Windows?")
print(response)
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Game {name: 'Baldur\'s Gate 3'})-[:AVAILABLE_ON]->(:Platform {name: 'Windows'})
RETURN true
Full Context:
[{'true': True}]
> Finished chain.
Yes, Baldur's Gate 3 is available on Windows.
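The verbose output above shows three stages: a generated Cypher query, the rows returned as context, and a final answer. The following conceptual sketch mirrors that flow with stand-in functions. It is not LangChain's implementation; every name in it is hypothetical, and the stand-ins return fixed values instead of calling an LLM or a database.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def answer_question(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], Rows],
    summarize: Callable[[str, Rows], str],
) -> str:
    """Run the three stages in order: question -> Cypher -> rows -> answer."""
    cypher = generate_cypher(question, schema)  # stage 1: an LLM would write the query
    context = run_query(cypher)                 # stage 2: the database would return rows
    return summarize(question, context)         # stage 3: an LLM would phrase the answer


# Stand-ins with fixed results, used only to make the flow runnable here.
def fake_generate_cypher(question: str, schema: str) -> str:
    return "MATCH (:Game)-[:AVAILABLE_ON]->(p:Platform) RETURN p.name"


def fake_run_query(cypher: str) -> Rows:
    return [{"p.name": "Windows"}, {"p.name": "Mac OS"}]


def fake_summarize(question: str, context: Rows) -> str:
    names = ", ".join(row["p.name"] for row in context)
    return f"Available on: {names}"


result = answer_question(
    "Which platforms is the game available on?",
    "(:Game)-[:AVAILABLE_ON]->(:Platform)",
    fake_generate_cypher,
    fake_run_query,
    fake_summarize,
)
print(result)
```

Separating the stages this way shows why the schema matters: the query-writing step only sees the question and the schema text, so a stale schema can lead to queries that match nothing.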