How to configure Neo4j to make it faster?

I am using Neo4j to run some experiments on a social network (SNS) workload. I created a random graph consisting of 1 million users and 100 thousand items, where each user has about 100 friends and 100 favourite items. So there are about 1 million nodes and 200 million relationships in the graph, and the graph files take up 4.8GB. Every node has only an id property, and I have created an index on it. I used the Java API to set up a small cluster of three VMs to serve this graph. Each VM has 16GB of RAM and an 8-core Intel Xeon CPU at 2.00GHz. Below is part of the configuration:

config.put("neostore.nodestore.db.mapped_memory", "150M");
config.put("neostore.relationshipstore.db.mapped_memory", "5G");
config.put("neostore.propertystore.db.mapped_memory", "100M");
config.put("neostore.propertystore.db.strings.mapped_memory", "130M");
config.put("neostore.propertystore.db.arrays.mapped_memory", "130M");
config.put("node_auto_indexing", "true");
config.put("use_memory_mapped_buffers", "true");
config.put("neostore.propertystore.db.index.keys.mapped_memory", "150M");
config.put("neostore.propertystore.db.index.mapped_memory", "150M");

I use the gcr cache_type, and I warm up the cache by simply traversing the whole graph:

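Node start;  // declared so the loop below compiles; the value itself is not used
for ( Node n : GlobalGraphOperations.at(db).getAllNodes() ) {
    // Touch the node's properties so they are loaded into the cache.
    n.getPropertyKeys();
    // Touch every relationship of the node and its start node.
    for ( Relationship relationship : n.getRelationships() ) {
        start = relationship.getStartNode();
    }
}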

The Cypher query:

start user=node:users({key}={value}) match user-[:FRIEND]->(friend)-[:LIKES]->(item) return item, collect(friend), count(0) order by count(0) desc limit 32;

This finds the favourite items of a user's friends, ordered by how many friends like each item. I run the jar with the command:

java -d64 -server -XX:+UseConcMarkSweepGC -XX:+UseNUMA -Xms10752m -Xmx10752m -Xmn2688m -jar Neo4J-1.0-SNAPSHOT.jar

My experiment results so far:

(1) Single thread: each query takes about 70ms on average.
(2) 8 threads: each query takes about 160ms on average, and many queries take more than 500ms. The throughput is about 50 requests per second.

I want to improve the performance, but I don't know how. It seems the RAM is not enough to hold all the data, is that right? I have also tried the soft and strong cache_type settings, and with those the RAM fills up quickly during warm-up.
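As a rough sanity check on the memory question, the sizes from the configuration and the JVM flags above can simply be added up. The sketch below does that arithmetic. It assumes that memory-mapped buffers are allocated outside the Java heap when use_memory_mapped_buffers is true, and it ignores the operating system and other JVM overhead; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MemoryBudget {
    // Mapped-memory settings copied from the configuration above, in MB
    // (5G is taken as 5 * 1024 MB).
    static Map<String, Long> mappedMb() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("neostore.nodestore.db.mapped_memory", 150L);
        m.put("neostore.relationshipstore.db.mapped_memory", 5L * 1024);
        m.put("neostore.propertystore.db.mapped_memory", 100L);
        m.put("neostore.propertystore.db.strings.mapped_memory", 130L);
        m.put("neostore.propertystore.db.arrays.mapped_memory", 130L);
        m.put("neostore.propertystore.db.index.keys.mapped_memory", 150L);
        m.put("neostore.propertystore.db.index.mapped_memory", 150L);
        return m;
    }

    // Sum of all mapped-memory settings, in MB.
    static long totalMappedMb() {
        long total = 0;
        for (long mb : mappedMb().values()) {
            total += mb;
        }
        return total;
    }

    // How far heap plus mapped memory exceeds physical RAM, in MB
    // (a negative result would mean there is headroom).
    static long overcommitMb(long heapMb, long ramMb) {
        return heapMb + totalMappedMb() - ramMb;
    }

    public static void main(String[] args) {
        long heapMb = 10752;      // -Xmx10752m from the command line
        long ramMb = 16L * 1024;  // 16GB per VM
        System.out.println("mapped memory: " + totalMappedMb() + " MB");
        System.out.println("heap + mapped - RAM: " + overcommitMb(heapMb, ramMb) + " MB");
    }
}
```

Under these assumptions the heap and the mapped memory together already ask for slightly more than the 16GB each VM has, before anything else on the machine is counted.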

Please help me and teach me how to improve it. Thanks a lot.

If the heap size / available RAM is too small to hold the full dataset in the object cache, you can go with the Enterprise edition. By putting a load balancer in front of your n Neo4j instances that routes all requests for a certain part of the graph to the same instance, you are effectively sharding the object cache. Jim Webber has blogged about this approach: http://jim.webber.name/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/
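To make the routing idea concrete, here is a minimal sketch of what the load balancer's decision could look like. It assumes that every request carries the id of the user it is about and that a plain modulo over the instance list is good enough; the instance names and the class itself are hypothetical, and a real deployment would more likely configure this in the load balancer than write it by hand.

```java
import java.util.Arrays;
import java.util.List;

class CacheShardRouter {
    private final List<String> instances;

    CacheShardRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        this.instances = instances;
    }

    // Always sends the same user id to the same instance, so that user's
    // neighbourhood tends to stay warm in that instance's object cache.
    String instanceFor(long userId) {
        int idx = (int) Math.floorMod(userId, (long) instances.size());
        return instances.get(idx);
    }

    public static void main(String[] args) {
        // Hypothetical names for the three VMs in the cluster.
        CacheShardRouter router =
                new CacheShardRouter(Arrays.asList("neo4j-a", "neo4j-b", "neo4j-c"));
        for (long userId : new long[] {0L, 7L, 42L}) {
            System.out.println("user " + userId + " -> " + router.instanceFor(userId));
        }
    }
}
```

Every instance still holds the full graph on disk; the routing only affects which part of it each instance keeps hot in memory.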

For performance-critical queries, it might be worth rewriting the Cypher query as an equivalent traversal with the Traversal API, or even going down to the core API.
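The computation a hand-written version has to perform is small: walk the user's friends, walk each friend's liked items, group the friends by item, and keep the 32 items with the most friends. The sketch below shows only that logic, with plain in-memory maps standing in for the graph so that it runs without Neo4j. In an embedded database the two map lookups would presumably become iterations over a node's FRIEND and LIKES relationships; that mapping is an assumption and is not shown here. All names are hypothetical.

```java
import java.util.*;

class FriendItemRanker {
    // friends: user id -> ids of the user's friends (stand-in for FRIEND relationships)
    // likes:   user id -> ids of the items the user likes (stand-in for LIKES relationships)
    // Returns up to `limit` entries of (item id -> friends who like it),
    // ordered by number of friends, highest first.
    static List<Map.Entry<Long, List<Long>>> topItems(
            long user,
            Map<Long, List<Long>> friends,
            Map<Long, List<Long>> likes,
            int limit) {
        // Group friends by liked item: the equivalent of collect(friend) per item.
        Map<Long, List<Long>> friendsByItem = new HashMap<>();
        for (Long friend : friends.getOrDefault(user, Collections.<Long>emptyList())) {
            for (Long item : likes.getOrDefault(friend, Collections.<Long>emptyList())) {
                friendsByItem.computeIfAbsent(item, k -> new ArrayList<>()).add(friend);
            }
        }
        // Order by group size, descending: the equivalent of order by count(0) desc.
        List<Map.Entry<Long, List<Long>>> ranked = new ArrayList<>(friendsByItem.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));
        // Keep only the first `limit` results: the equivalent of limit 32.
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> friends = new HashMap<>();
        friends.put(1L, Arrays.asList(2L, 3L, 4L));
        Map<Long, List<Long>> likes = new HashMap<>();
        likes.put(2L, Arrays.asList(10L, 11L));
        likes.put(3L, Arrays.asList(10L));
        likes.put(4L, Arrays.asList(10L, 11L, 12L));
        for (Map.Entry<Long, List<Long>> e : topItems(1L, friends, likes, 32)) {
            System.out.println("item " + e.getKey() + " liked by " + e.getValue());
        }
    }
}
```

Items with the same number of friends come back in no particular order, which matches the original query, since it does not specify a tie-breaker either.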
