
Neo4j - slow Cypher query - big graph with hierarchies

I am using Neo4j 2.1.4. I have a graph with 'IS A' relationships (and other types of relationships) between nodes. The graph contains several hierarchies built from 'IS A' relationships. Given two hierarchies, I need to find the descendants (via 'IS A') of the first hierarchy that have a particular, known relationship with some descendant of the second hierarchy. If that relationship exists, I return the matching descendant(s) of the first hierarchy.

INPUTS: 'ID_parentnode_hierarchy_01', 'ID_relationship', 'ID_parentnode_hierarchy_02'.
OUTPUT: Descendants (IS A relationship) of 'ID_parentnode_hierarchy_01' that have 'ID_relationship' with some descendant of 'ID_parentnode_hierarchy_02'.

Note: The graph has 500,000 nodes and 2 million relationships.

I am using this Cypher query, but it is very slow (approx. 40 s on a 64-bit PC with 4 GB of RAM and a 3 GHz Pentium Dual Core). Is it possible to build a faster query?

MATCH (parentnode_hierarchy_01: Node{nodeid : {ID_parentnode_hierarchy_01}})
WITH parentnode_hierarchy_01 
MATCH (parentnode_hierarchy_01) <- [:REL* {reltype: {isA}}] - (descendants01: Node)
WITH descendants01
MATCH (descendants01) - [:REL {reltype: {ID_relationship}}] -> (descendants02: Node)
WITH descendants02, descendants01
MATCH (parentnode_hierarchy_02: Node {nodeid: {ID_parentnode_hierarchy_02} }) 
<- [:REL* {reltype: {isA}}] - (descendants02)
RETURN DISTINCT descendants01;

Thank you very much.

Well, I can slightly clean up your query, which might help us understand the issues better. I doubt this version will run faster, but with it we can discuss what's going on. The cleanup mostly eliminates unneeded uses of MATCH / WITH:

MATCH (parent:Node {nodeid: {ID_parentnode_hierarchy_01}})<-[:REL* {reltype:{isA}}]-
      (descendants01:Node)-[:REL {reltype:{ID_relationship}}]->(descendants02:Node),

      (parent2:Node {nodeid: {ID_parentnode_hierarchy_02}})<-[:REL* {reltype:{isA}}]-
      (descendants02)
RETURN distinct descendants01;

It looks like you're searching two (probably large) trees, starting from their roots, for two nodes somewhere in the trees that are linked by an {ID_relationship}.

Unless you can provide some hint about which nodes in the trees might have an ID_relationship, in the worst case you could end up comparing every pair of nodes across the two trees. So this could take n * k time, where n is the number of nodes in the first tree and k is the number of nodes in the second tree.

Here are some strategies to think about; which one you should use depends on your data:

  1. Is there some depth in the tree where these links are likely to be found? Can you put a range on the depth of [:REL* {reltype:{isA}}] ?
  2. What other criteria can you add to descendants01 and descendants02 ? Is there anything that can make the query more selective, so that you're not comparing every node in one tree to every node in the other?
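
As a rough sketch of the first point, this is the cleaned-up query with both isA traversals capped. The bound 1..4 is purely an assumption for illustration; a descendant that sits deeper than the bound will not be matched, so choose it from the depth you actually expect in your hierarchies.

```cypher
// Same pattern as the cleaned-up query, but each isA traversal is bounded.
// ASSUMPTION: 1..4 is a placeholder depth range, not a value from the question.
MATCH (parent:Node {nodeid: {ID_parentnode_hierarchy_01}})<-[:REL*1..4 {reltype: {isA}}]-
      (descendants01:Node)-[:REL {reltype: {ID_relationship}}]->(descendants02:Node),
      (parent2:Node {nodeid: {ID_parentnode_hierarchy_02}})<-[:REL*1..4 {reltype: {isA}}]-
      (descendants02)
RETURN DISTINCT descendants01;
```

Whether this helps depends on how deep the trees are; if most nodes already lie within the bound, the traversal does nearly the same work as before.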

Another strategy you might try (this might be a horrible idea, but it's worth trying): look for a path from one root to the other over any number of undirected edges of either the isA type or the other type. Your data model has :REL relationships with a reltype attribute. This is probably an antipattern; instead of storing the kind of relationship in a reltype attribute, why not make it the relationship type itself? The current model prevents the query that I want to write, below:

MATCH p=shortestPath((p1:Node {nodeid: {first_parent_id}})-[:isA|ID_relationship*]-(p2:Node {nodeid: {second_parent_id}}))
return p;

This would return the path from one "root" to the other, via the bridge you want. You could then use path functions to extract whatever nodes you wanted. Note that this query isn't currently possible because of your data model.
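
For comparison, here is a sketch of how the original query might look if the data were remodeled so that the relationship type carries the meaning. It assumes hypothetical relationship types named isA and ID_relationship (the same placeholder names used in the shortestPath query), which do not exist in the current model. Relationship types cannot be passed as parameters, so the real type name would have to be written into the query text.

```cypher
// ASSUMPTION: the graph has been remodeled so that :isA and :ID_relationship
// are real relationship types instead of values of a reltype property on :REL.
MATCH (parent:Node {nodeid: {ID_parentnode_hierarchy_01}})<-[:isA*]-(descendants01:Node),
      (descendants01)-[:ID_relationship]->(descendants02:Node),
      (descendants02)-[:isA*]->(parent2:Node {nodeid: {ID_parentnode_hierarchy_02}})
RETURN DISTINCT descendants01;
```

With typed relationships, the traversal can follow only the relationships of the wanted type rather than reading the reltype property of every :REL relationship it meets, which is likely where such a model would save time. Unlike the shortestPath version, this keeps the direction of each step, so it returns the descendants of the first hierarchy rather than a single path.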
