Does relationship creation order affect query performance in Neo4j?

I'm using a batch inserter to create a database with about 1 billion nodes and 10 billion relationships. I've read in multiple places that it is preferable to sort the relationships in order min(from, to) (which I didn't do), but I haven't grasped why this practice is optimal. I originally thought this only aided insertion speed, but when I turned the database on, traversal was very slow. I realize there can be many reasons for that, especially with a database this size, but I want to be able to rule out the way I'm storing relationships.
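For reference, the ordering in question can be sketched as below. This is only an illustration under assumptions: `sort_relationships` is a hypothetical helper, relationships are plain `(from_id, to_id)` tuples, and breaking ties by `max(from, to)` is a choice made here, not something the advice specifies. The presumed rationale, which is not confirmed anywhere in this thread, is that relationship records are written in creation order, so relationships touching the same node would end up near each other in the store file.

```python
def sort_relationships(relationships):
    """Order (from_id, to_id) pairs by min(from, to).

    Ties are broken by max(from, to); that tie-break is an assumption
    made for this sketch, not part of the original advice.
    """
    return sorted(
        relationships,
        key=lambda rel: (min(rel[0], rel[1]), max(rel[0], rel[1])),
    )


# Small illustrative input; the node ids are made up.
rels = [(7, 2), (1, 9), (5, 1), (2, 3)]
ordered = sort_relationships(rels)
print(ordered)
```

With 10 billion relationships an in-memory `sorted` call like this would not fit in RAM, so the same key would have to be applied through an external (on-disk) sort before feeding the batch inserter.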

Main question: does it kill traversal speed to insert relationships in a very "random" order because of where they will be stored on disk? I'm thinking that maybe when it tries to traverse nodes, the relationships are too fragmented. I hope someone can enlighten me about whether this would be the case.

UPDATES:

  • Use-case is pretty much the basic Neo4j friends of friends example using Cypher via the REST API for querying.

  • Each node (person) is unique and has a bunch of "knows" relationships for the people they know. Although I have a billion nodes, all of the 10 billion relationships come from about 30 million of the nodes. So for any starting node I use in my query, it has an average of about 330 relationships coming from it.

  • In my initial tests, even getting 4 non-ordered friends of friends results was incredibly slow (100+ seconds on average). Of course, after the cache was warmed up for each query, it was fairly quick, but the graph is pretty random and I can't have the whole relationship store in memory.
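A rough back-of-the-envelope check of why the relationship store cannot be held in memory, using the figures above. The 33 bytes per relationship record is an assumption about the Neo4j 1.x store format and should be verified against the documentation for the version in use; the variable names are made up for this sketch.

```python
# Figures taken from the question.
NODES_WITH_RELS = 30_000_000        # nodes that have outgoing relationships
RELATIONSHIPS = 10_000_000_000      # total relationships
RAM_BYTES = 128 * 1024**3           # 128 GB of RAM on the server

# ASSUMPTION: fixed-size relationship records of 33 bytes (Neo4j 1.x).
REL_RECORD_BYTES = 33

avg_degree = RELATIONSHIPS / NODES_WITH_RELS
store_bytes = RELATIONSHIPS * REL_RECORD_BYTES

print(f"average outgoing relationships per node: {avg_degree:.0f}")
print(f"estimated relationship store size: {store_bytes / 1e9:.0f} GB")
print(f"fits in RAM: {store_bytes <= RAM_BYTES}")
```

Under that assumption the relationship store alone comes to roughly 330 GB, well over the 128 GB available, so a cold friends-of-friends query over randomly placed records would mostly hit the disk.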

Some of my system details, if that's needed:

  • Neo4j 1.9.RC1

  • Running on a Linux server with 128 GB of RAM and 8 cores

  • Non-SSD hard drive

I have not worked with Neo4j on such a large scale, but as far as I know the insertion order won't make much difference to the speed. Could you provide any links that state the order of insertion matters?

What matters in this case is whether the relationships are cached or not. Until the cache is fairly well populated, performance will be on the slower side. You should also set an appropriate cache size as soon as the index is created.

You should read this link regarding Neo4j performance.

If you haven't already, read the Neo4j documentation on batch insertion and these SO questions for help with bulk inserts.
