Neo4j - slow cypher query - big graph with hierarchies

Question

Using Neo4j 2.1.4. I have a graph with 'IS A' relationships (and other types of relationships) between nodes. I have some hierarchies inside the graph (IS A relationships) and I need to know the descendants (IS A relationship) of one hierarchy that has a particular-known relationship with some descendant of second hierarchy. If that particular-known relationship exists, I return the descendant/s of the first hierarchy.

INPUTS: 'ID_parentnode_hierarchy_01', 'ID_relationship', 'ID_parentnode_hierarchy_02'.
OUTPUT: Descendants (IS A relationship) of 'ID_parentnode_hierarchy_01' that has 'ID_relationship' with some descendant of 'ID_parentnode_hierarchy_02'.

Note: The graph has 500.000 nodes and 2 million relationships.

I am using this cypher query but it is very slow (aprox. 40s in a 4GB RAM and 3GHz Pentium Dual Core 64 bit PC). It is possible to build a faster query?

MATCH (parentnode_hierarchy_01: Node{nodeid : {ID_parentnode_hierarchy_01}})
WITH parentnode_hierarchy_01 
MATCH (parentnode_hierarchy_01) <- [:REL* {reltype: {isA}}] - (descendants01: Node)
WITH descendants01
MATCH (descendants01) - [:REL {reltype: {ID_relationship}}] -> (descendants02: Node)
WITH descendants02, descendants01
MATCH (parentnode_hierarchy_02: Node {nodeid: {ID_parentnode_hierarchy_02} }) 
<- [:REL* {reltype: {isA}}] - (descendants02)
RETURN DISTINCT descendants01;

Thank you very much.

Answer 1

Well, I can slightly clean up your query - this might help us understand the issues better. I doubt this one will run faster, but using the cleaned up version we can discuss what's going on: (mostly eliminating unneeded uses of MATCH / WITH )

MATCH (parent:Node {nodeid: {ID_parentnode_hierarchy_01}})<-[:REL* {reltype:{isA}}]-
      (descendants01:Node)-[:REL {reltype:{ID_relationship}}]->(descendants02:Node),

      (parent2:Node {nodeid: {ID_parentnode_hierarchy_02}})<-[:REL* {reltype:{isA}}]-
      (descendants02)
RETURN distinct descendants01;

This looks like you're searching two (probably large) trees, starting from the root, for two nodes somewhere in the tree that are linked by an {ID_relationship} .

Unless you can provide some query hints about which node in the tree might have an ID_relationship or something like that, at worst, this looks like you could end up comparing every two nodes in the two trees. So this looks like it could take n * k time, where n is the number of nodes in the first tree, k the number of nodes in the second tree.

Here are some strategy things to think about - which you should use depends on your data:

Is there some depth in the tree where these links are likely to be found? Can you put a range on the depth of [:REL* {reltype:{isA}}] ?
What other criteria can you add to descendants01 and descendants02 ? Is there anything that can help make the query more selective so that you're not comparing every node in one tree to every node in the other?

Another strategy you might try is this: (this might be a horrible idea, but it's worth trying) -- basically look for a path from one root to the other, over any number of undirected edges of either isa type, or the other. Your data model has :REL relationships with a reltype attribute. This is probably an antipattern; instead of a reltype attribute, why is the relationship type not just that? This prevents the query that I want to write, below:

MATCH p=shortestPath((p1:Node {nodeid: {first_parent_id}})-[:isA|ID_relationship*]-(p2:Node {nodeid: {second_parent_id}}))
return p;

This would return the path from one "root" to the other, via the bridge you want. You could then use path functions to extract whatever nodes you wanted. Note that this query isn't possible currently because of your data model.

Neo4j - slow cypher query - big graph with hierarchies

Question

1 answers

solution1
1 2014-10-20 12:53:28

Neo4j - slow cypher query - big graph with hierarchies

Question

1 answers

solution1 1 2014-10-20 12:53:28

solution1
1 2014-10-20 12:53:28