
Performance of Java API versus Python with Cypher for Neo4J

I am working with an application that uses a Neo4J graph containing about 10 million nodes. One of the main tasks that I run daily is the batch import of new/updated nodes into the graph, on the order of about 1-2 million. After experimenting with Python scripts in combination with the Cypher query language, I decided to give the embedded graph with Java API a try in order to get better performance results.

What I found is about a 5x improvement using the native Java API. I am using Neo4j 2.1.4, which I believe is the latest. I have read in other posts that the embedded graph is a bit faster, but that this should/could be changing in the near future. Has anyone observed similar results who could validate my findings?

I have included snippets below just to give a general sense of methods used - code has been greatly simplified.

sample from cypher/python:

cnode = self.graph_db.create(node(hash = obj.hash,
    name = obj.title,
    date_created = str(datetime.datetime.now()),
    date_updated = str(datetime.datetime.now())
))

sample from embedded graph using java:

final Node n = Graph.graphDb.createNode();
for (final Label label : labels){
    n.addLabel(label);
}
for (Map.Entry<String, Object> entry : properties.entrySet()) {
    n.setProperty(entry.getKey(), entry.getValue());
}

Thank you for your insight!

What you're actually doing here is comparing the speeds of two different APIs and merely using two different languages to do that. Therefore, you're not comparing like for like. The Java core API and the REST API used by Python (and other languages) have different idioms, such as explicit vs implicit transactions. Additionally, network latency associated with the REST API will make a great difference, especially if you are using one HTTP call per node created.

So to get a more meaningful performance comparison, make sure you are comparing like for like: use Java via the REST API perhaps or use Cypher for both tests.

Hint 1: you will get better performance in general over REST by batching up a number of requests into a single API call.
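To illustrate Hint 1, here is a minimal sketch of grouping records into batches so that each batch goes out in one call instead of one call per node. The names chunked, import_in_batches and send_batch are hypothetical and not part of any Neo4j client; send_batch stands in for whatever single request your client library offers for creating many nodes at once.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_batches(records, send_batch, batch_size=1000):
    """Call `send_batch` once per batch and return the number of calls made.

    `send_batch` is a hypothetical callable: it should wrap the one request
    that creates all nodes in the batch (assumption, not a real API).
    """
    calls = 0
    for batch in chunked(records, batch_size):
        send_batch(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    sent = []
    # 2,500 records in batches of 1,000 means 3 calls instead of 2,500.
    print(import_in_batches(range(2500), sent.append, batch_size=1000))
```

The right batch size depends on your data and server settings, so treat 1000 as a starting point to measure against rather than a recommendation.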

Hint 2: the REST API will never be as fast as the core API as the latter is native and the former has many more layers to go through.

Without proper performance measurements, it's hard to tell where the time goes. Generally, Python scripts are slower than Java but the language is faster to write code in, so you trade development speed for execution speed.
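As a rough starting point for measuring, the sketch below times a single step in isolation using only the standard library. time_call is a hypothetical helper name; in practice you would pass in your own import step (for example, the function that creates one batch of nodes) and time the network call and the local processing separately.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder workload; replace with the import step you want to measure.
    elapsed = time_call(sum, range(1_000_000))
    print(f"best of 3: {elapsed:.4f} s")
```

Taking the best of several runs reduces noise from caches and other processes, but it will not show you where the time goes inside a step; a profiler is the next tool for that.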

For example: Your code above takes one hour to run in Python and 12 minutes in Java. Writing the Python version took you 1 day, the Java version took you 3 days. That means you need to run the code at least 2 days (2,880 minutes) / (60 - 12) minutes saved per run = 60 times to break even.
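The break-even arithmetic above can be written out as a small calculation. This is only a sketch: break_even_runs is a hypothetical helper, and it assumes, as the example does, that a day of development counts as a full 24 hours (2 days = 2,880 minutes).

```python
def break_even_runs(extra_dev_minutes, slow_run_minutes, fast_run_minutes):
    """Number of runs after which the faster version has repaid its extra development time."""
    saved_per_run = slow_run_minutes - fast_run_minutes
    if saved_per_run <= 0:
        raise ValueError("the fast version must actually be faster")
    return extra_dev_minutes / saved_per_run


if __name__ == "__main__":
    extra_dev = (3 - 1) * 24 * 60  # 2 extra days of development, in minutes
    print(break_even_runs(extra_dev, slow_run_minutes=60, fast_run_minutes=12))  # 60.0
```

If you count 8-hour working days instead, the extra development time is 960 minutes and break even comes after 20 runs, so the conclusion depends heavily on how you value your time.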

The example, of course, only makes sense as long as you can afford to wait the extra 48 minutes for Python to do its job. If your system is down for the duration of the import, then 60 vs 12 minutes makes a huge difference, unless you can run it during the night when no one cares.

If you go by the "benchmarks game" comparison of Java with Python 3 ( http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=java&lang2=python3&data=u32 ), then a 5x improvement for the Java version is certainly plausible.
