Neo4j for Python Users and the Broken Pipe Error

Big Data Broken Pipe Data Analytics graph database Machine Learning Neo4j python

Neo4j is probably the most popular graph database among developers, because it offers [1]:

  • Biggest and Most Active Graph Community
  • Highly Performant Read and Write Scalability
  • High Performance
  • Ease of use
  • Free licenses

As a data scientist, I enjoy every moment of working with Neo4j, both for the above reasons and for its applications in data science, such as [2]:

  • Recommendation systems
  • Fraud detection
  • Supply chain transparency and optimization

 

Despite the great promise Neo4j holds for data scientists, many of them are still unfamiliar with it. Python, on the other hand, is a well-known tool in the data-science toolkit, which is why I would like to address using Neo4j from Python for data-science purposes.

The aim of this article is twofold:

  • A short introduction on how to use Neo4j within Python
  • Addressing the ‘Broken Pipe’ error which can happen when you use Neo4j with Python.

 

This article is written in simple language. However, to follow this tutorial you need a basic knowledge and understanding of Python and of Cypher queries (Cypher is the query language of Neo4j).

Neo4j package

Neo4j has released its own package to help Python users communicate with Neo4j from Python. It is of great help to have the power of Python next to Neo4j. You can get this package from its page on PyPI (it is typically installed with pip install neo4j): https://pypi.org/project/neo4j/

Although there are other third-party Python packages for Neo4j, I highly recommend using the official package, as it is better maintained than the rest and more popular among programmers.

Cypher Query Language

Just like many other databases, Neo4j has its own query language, which in my opinion is very intuitive and easy to learn. For example, this is how you create a node in Cypher:

CREATE (w:MyNodes {Name : 'John',  Title : 'President',  Age : 22}) RETURN id(w);

This article does not aim to cover Cypher queries. You are encouraged to check other tutorials for that purpose.

 

Running Cypher Queries in Python

There are many scenarios where you would like to run Cypher queries within Python. Think of a case where data must be extracted from a complex XML file and loaded into Neo4j. Python and its packages are ideal for parsing XML files, so you can combine the power of Python with Neo4j. Additionally, Python offers data scientists many packages for machine learning and data analytics.

Here is an example of running a Cypher query in Python:

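from neo4j import GraphDatabase

# Connection details: replace the placeholders with your own values.
uri = "bolt://localhost:7687"
user = "xxxx"
password = "xxxx"

driver = GraphDatabase.driver(uri, auth=(user, password))
# The with block closes the session automatically when it exits.
with driver.session() as session:
    session.run("CREATE (w:MyNodes {Name : 'John',  Title : 'President',  Age : 22}) RETURN id(w)")
# Release the driver's connections once all work is done.
driver.close()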

The above example is the simplest way of running a Cypher query within Python. Running it creates a new node in your Neo4j database, which you can then check in your Neo4j instance.
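When the values come from your own data rather than being typed into the query, it is usually safer to pass them as query parameters than to paste them into the Cypher text. The following is a minimal sketch of that idea, not a definitive implementation. The names create_person, CREATE_PERSON and node_id are hypothetical and chosen only for this illustration, and the connection details are the same placeholders as in the example above.

```python
# Cypher text with $placeholders; the values are sent separately as parameters.
CREATE_PERSON = (
    "CREATE (w:MyNodes {Name: $name, Title: $title, Age: $age}) "
    "RETURN id(w) AS node_id"
)


def create_person(session, name, title, age):
    """Create one node and return the id that Neo4j reports for it."""
    # The second argument of session.run() is a dictionary of query parameters.
    result = session.run(
        CREATE_PERSON, {"name": name, "title": title, "age": age}
    )
    # single() returns the one record produced by the RETURN clause.
    return result.single()["node_id"]


def main():
    # Imported here so that create_person() can be tried without the driver installed.
    from neo4j import GraphDatabase

    # Placeholder connection details (assumptions, replace with your own).
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("xxxx", "xxxx"))
    try:
        with driver.session() as session:
            print(create_person(session, "John", "President", 22))
    finally:
        driver.close()
```

Calling main() against a running Neo4j instance should create the node and print its id. Because the values never become part of the query text, quotes inside a name cannot break the statement.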

When things get complex: the “IOError: [Errno 32] Broken pipe” error

In my experience, when you try to write a large amount of data (more than 4000 nodes + 4000 relationships), the communication between Python and Neo4j degrades. You will probably get the Broken Pipe error, which prevents you from uploading more nodes and relationships into your Neo4j database.

I tried many things to solve this issue, including tuning the ‘driver’ parameters and trying to make Python ignore the error; however, none of them worked for me. The only workaround I came up with is the following:

  • Instead of running the queries from Python, save them to a text file (called a Cypher query file). It can have any extension, such as .txt, but the usual extensions are .cypher and .cql.

In our example the file is called example.txt and it contains the following two lines (two Cypher queries):

CREATE (w:MyNodes {Name : 'John',  Title : 'President',  Age : 22}) RETURN id(w);
CREATE (w:MyNodes {Name : 'Alan',  Title : 'Manager',  Age : 33}) RETURN id(w);

As Python programmers know, creating such a file is not difficult. In our example this part is fixed:

CREATE (w:MyNodes {Name : 'xxx',  Title : 'xxx',  Age : xxx}) RETURN id(w);

The variable parts (marked as xxx) are filled in while parsing our XML file (in our hypothetical scenario).

Please note that the semicolon at the end of each query is compulsory in our example, because cypher-shell uses it to tell where one statement ends and the next begins.
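Generating the query file could look roughly like the sketch below. It is only an illustration under stated assumptions: the XML layout (a people element with person entries carrying name, title and age attributes) is invented for this example, and the helper names cypher_string, person_to_query, xml_to_queries and write_query_file are hypothetical. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element and attribute names are made up for this sketch.
SAMPLE_XML = """<people>
  <person name="John" title="President" age="22"/>
  <person name="Alan" title="Manager" age="33"/>
</people>"""


def cypher_string(value):
    """Quote a Python string as a Cypher string literal."""
    # Escape backslashes first, then single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def person_to_query(name, title, age):
    """Fill the fixed CREATE template with one person's values."""
    return (
        "CREATE (w:MyNodes {Name : %s,  Title : %s,  Age : %d}) RETURN id(w);"
        % (cypher_string(name), cypher_string(title), int(age))
    )


def xml_to_queries(xml_text):
    """Return one CREATE statement per <person> element."""
    root = ET.fromstring(xml_text)
    return [
        person_to_query(p.get("name"), p.get("title"), p.get("age"))
        for p in root.iter("person")
    ]


def write_query_file(queries, path):
    """Write one semicolon-terminated statement per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for query in queries:
            handle.write(query + "\n")


queries = xml_to_queries(SAMPLE_XML)
```

Calling write_query_file(queries, "example.txt") then writes the two CREATE statements shown earlier, one per line. The escaping step matters because text values are pasted into the query here, so a name such as O'Brien would otherwise end the string literal early.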

  • Go to your terminal and run the following command:

cat example.txt | cypher-shell -u USERNAME -p PASSWORD -a ADDRESS_OF_NEO4J

(For many users, ADDRESS_OF_NEO4J is “bolt://localhost:7687”.)

Once you run the above command, your database is updated with the new data. In my last use case, I had almost 200k queries (nodes + relationships). When I ran them from Python, I constantly got a “Broken Pipe” error once the count reached almost 8k (nodes + relationships), and this approach is what solved the issue for me.
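If you prefer to stay inside a Python script, the same terminal command can be started with the standard subprocess module. This is a hedged sketch rather than a tested recipe: it assumes cypher-shell is installed and on your PATH, it uses only the -u, -p and -a options from the command above, and the function names cypher_shell_command and load_query_file are hypothetical.

```python
import subprocess


def cypher_shell_command(username, password, address):
    """Build the argument list for the cypher-shell call shown above."""
    return ["cypher-shell", "-u", username, "-p", password, "-a", address]


def load_query_file(path, username, password, address):
    """Feed a Cypher query file to cypher-shell on standard input."""
    with open(path, "rb") as query_file:
        # Passing the open file as stdin plays the role of "cat file | ...".
        # check=True raises CalledProcessError if cypher-shell exits with an error.
        subprocess.run(
            cypher_shell_command(username, password, address),
            stdin=query_file,
            check=True,
        )
```

A call such as load_query_file("example.txt", "USERNAME", "PASSWORD", "bolt://localhost:7687") would then behave like the piped command, with the queries still being executed by cypher-shell rather than by the Python driver.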

 

Conclusion

In this article I tried to familiarize you with using Neo4j within Python. To the best of my knowledge, it is the only resource on the internet (at the time of writing this tutorial) that addresses the “Broken Pipe” error and its solution. I hope you can make the best use of these two powerful tools and enjoy working with Neo4j within Python.

References

  1. https://neo4j.com/top-ten-reasons/
  2. https://neo4j.com/blog/5-noteworthy-use-cases-graph-technology-analytics/