
Performance in Neo4j

I have a database with 2,217,731 nodes and 3,127,475 relationships. The nodes represent different pieces of equipment, and the relationships between them have types such as "CONNECTED_TO", "IS_INSIDE", etc.

I am trying to traverse the graph to find specific nodes. In Cypher it would look like

    MATCH (n:Equipment)<-[IS_INSIDE*]-()<-[CONNECTED_TO*]-(m:Cable) where n.name = "name" RETURN m

I am running this through the Java Core API, which as far as I know should be the fastest way to query Neo4j and should take seconds. However, it runs for tens of minutes.

I am using neo4j-2.0.0 and Java version "1.7.0_45", with a maximum Java heap size of 7 GB.

Neo4j properties:

    Map<String, String> config = new HashMap<>();

    config.put( "neostore.nodestore.db.mapped_memory", "1800M" );
    config.put( "neostore.relationshipstore.db.mapped_memory", "3G" );
    config.put( "neostore.propertystore.db.mapped_memory", "100M" );
    config.put( "neostore.propertystore.db.strings.mapped_memory", "150M" );
    config.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );

    inserter = BatchInserters.inserter("target/graphDb", config);

I am new to Neo4j and do not know how to tune it to achieve better performance.

If you have to traverse the whole graph, this will be slow. If this is a common query, consider creating an index on Equipment.name, which is possible in Neo4j 2.0.0. The query will then just look up matching names in the index (basically a hash table) and check the pattern around the matching nodes, which will be very fast. See http://blog.neo4j.org/2013/12/neo4j-20-ga-graphs-for-everyone.html

Please create an index on the name property of Equipment nodes:

    CREATE INDEX ON :Equipment(name)

Then please try the following optimized query.

    MATCH (n:Equipment { name: "name" }),
          (n)<-[:IS_INSIDE*]-(x),
          (x)<-[:CONNECTED_TO*]-(m:Cable)
    RETURN m

Note that this matches the same pattern as the one you specified, with two changes. First, the relationship types are written with a leading colon ([:IS_INSIDE*]); without the colon, IS_INSIDE is only an identifier bound to the matched relationships, so the pattern follows relationships of any type. Second, the pattern is split into separate parts, which leads the query planner to first match the n:Equipment node on the name property instead of doing a graph-global match. Starting from that reduced set of n:Equipment nodes, the remaining parts expand the variable-length IS_INSIDE and CONNECTED_TO patterns far more efficiently.
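Since the question uses Java, the anchor-then-expand idea can be sketched on a toy in-memory graph. This is only an illustration under stated assumptions, not the Neo4j Core API: the class AnchoredTraversalSketch, its methods, and the sample node names are all hypothetical, and IS_INSIDE and CONNECTED_TO are treated as relationship types. It starts from a single anchor node, follows incoming IS_INSIDE relationships one or more hops, then follows incoming CONNECTED_TO relationships from the nodes found, and keeps the ones marked as cables. Unlike Cypher, which returns one row per matching path, it collects each cable once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory graph; a hypothetical stand-in for illustration, not the Neo4j API. */
final class AnchoredTraversalSketch {
    // relationship type -> (end node -> start nodes of the relationships pointing at it)
    private final Map<String, Map<String, List<String>>> incoming = new HashMap<>();
    private final Set<String> cables = new HashSet<>();

    void markCable(String node) {
        cables.add(node);
    }

    /** Records the relationship (from)-[:type]->(to). */
    void connect(String from, String type, String to) {
        Map<String, List<String>> byTarget = incoming.get(type);
        if (byTarget == null) {
            byTarget = new HashMap<>();
            incoming.put(type, byTarget);
        }
        List<String> sources = byTarget.get(to);
        if (sources == null) {
            sources = new ArrayList<>();
            byTarget.put(to, sources);
        }
        sources.add(from);
    }

    /** Nodes reachable from the start nodes over one or more incoming relationships of one type. */
    Set<String> expandIncoming(Collection<String> starts, String type) {
        Set<String> reached = new LinkedHashSet<>();
        Map<String, List<String>> byTarget = incoming.get(type);
        if (byTarget == null) {
            return reached;
        }
        Deque<String> queue = new ArrayDeque<>(starts);
        while (!queue.isEmpty()) {
            List<String> sources = byTarget.get(queue.poll());
            if (sources == null) {
                continue;
            }
            for (String source : sources) {
                if (reached.add(source)) { // expand each node only once
                    queue.add(source);
                }
            }
        }
        return reached;
    }

    /** Distinct cables for (anchor)<-[:IS_INSIDE*]-()<-[:CONNECTED_TO*]-(m:Cable). */
    Set<String> cablesFor(String anchor) {
        Set<String> inside = expandIncoming(Collections.singleton(anchor), "IS_INSIDE");
        Set<String> connected = expandIncoming(inside, "CONNECTED_TO");
        connected.retainAll(cables); // keep only nodes marked as Cable
        return connected;
    }

    /** Small made-up data set used for the demonstration. */
    static AnchoredTraversalSketch sample() {
        AnchoredTraversalSketch g = new AnchoredTraversalSketch();
        g.connect("shelf", "IS_INSIDE", "rack");
        g.connect("card", "IS_INSIDE", "shelf");
        g.connect("port", "CONNECTED_TO", "shelf");
        g.connect("cable1", "CONNECTED_TO", "card");
        g.connect("cable2", "CONNECTED_TO", "cable1");
        g.connect("cable3", "CONNECTED_TO", "otherRack");
        g.markCable("cable1");
        g.markCable("cable2");
        g.markCable("cable3");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(sample().cablesFor("rack"));
    }
}
```

The work done here is proportional to the neighbourhood of the anchor node rather than to the size of the whole graph, which is the effect the index on Equipment.name is meant to give the Cypher query.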

The first thing to realize is that in a graph database, performance depends largely on the model you have built and on the cardinality of the different node types (Equipment and Cable in your case). Running your query with PROFILE or EXPLAIN often gives useful insight into its performance, in terms of the number of database hits it makes and the time it takes. Prefer the query with the lower number of DB hits.

With that, let us first look at the query you are using:

    MATCH (n:Equipment)<-[IS_INSIDE*]-()<-[CONNECTED_TO*]-(m:Cable)
    where n.name = "name"
    RETURN m

A couple of pointers on this would be:

1) When you look for the nodes that are inside a particular Equipment node, you do not specify a label for them. Try using:

    MATCH (n:Equipment)<-[IS_INSIDE*]-(:Equipment)

instead of

    MATCH (n:Equipment)<-[IS_INSIDE*]-()

With the unlabelled node, the search from your Equipment named "name" considers both Equipment and Cable nodes. With the alternative I mentioned, it is restricted to Equipment nodes, assuming that only Equipment nodes can be inside other Equipment.

2) As others mentioned, building indexes on the Equipment and Cable nodes will help. Build an index on the Equipment.name property. You can also index the Cable properties you query on, which may further improve performance.

Can you also share how many Equipment nodes and how many Cable nodes there are? I am also assuming that you have already ensured that the Equipment nodes and Cable nodes are distinct. It is okay to have more relationships, but your model can often benefit from having fewer nodes to match.


 