Neo4j query for shortest path gets stuck (does not work) if I have two-way relationships in the graph and the nodes are interrelated

I built a graph with two-way relationships: if A knows B, then B knows A. Every node has a unique Id and Name along with other properties. My graph looks like this:

[image]

If I run a simple query, MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}),path = (p1)-[:NAVIGATE_TO*]-(p2) RETURN path, it does not return any response and consumes 100% of the machine's CPU and RAM.


UPDATED: After reading through other posts and the comments on this one, I simplified the model and the relationships. It now looks like this: [image]

Each relationship has a different weight. To simplify, consider horizontal and vertical connections to have weight 1 and diagonal connections weight 1.5. My database has more than 85,000 nodes and about 300,000 relationships.

The shortest-path query still does not return a result. It gets stuck processing and the CPU goes to 100%.

Let's consider what your query is doing:

MATCH (p1:SearchableNode {name: "Ishaan"}), 
      (p2:SearchableNode {name: "Garima"}),
      path = (p1)-[:NAVIGATE_TO*]-(p2) 
RETURN path

If you run this query in the console with EXPLAIN in front of it, the DB will show you its plan for answering it. When I did this, the query compiler warned me:

If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH

You have two issues with your query. First, you're assigning p1 and p2 independently of one another, possibly creating this cartesian product. Second, because all of the links in your graph go both ways and you're asking for an undirected connection, you're making the DB work twice as hard: it can traverse what you're asking for in either direction. To make matters worse, because all of the links go both ways, you have many cycles in your graph, so as Cypher explores the paths it can take, many of them will loop back around to where they started. This means the query engine will spend a lot of time chasing its own tail.
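To get a feel for how quickly the cycles hurt, here is a minimal Python sketch, not Neo4j code. It assumes a small k x k grid where every cell is linked to its up-to-8 neighbours (a stand-in for the graph in the question), and it counts node-simple paths only, which is a simplification of what a variable-length match actually enumerates. The function names are made up for this illustration.

```python
from collections import deque


def neighbors(cell, k):
    """Yield the up-to-8 neighbours of `cell` on a k x k grid (assumed layout)."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < k and 0 <= c + dc < k:
                yield (r + dr, c + dc)


def count_simple_paths(k):
    """Exhaustively count cycle-free paths between opposite corners.

    This mimics enumerating every path: the work grows with the
    number of paths, not with the number of nodes.
    """
    start, goal = (0, 0), (k - 1, k - 1)
    visited = {start}

    def dfs(cell):
        if cell == goal:
            return 1
        total = 0
        for nxt in neighbors(cell, k):
            if nxt not in visited:
                visited.add(nxt)
                total += dfs(nxt)
                visited.remove(nxt)
        return total

    return dfs(start)


def bfs_hops(k):
    """Breadth-first search between the same corners.

    Each cell is queued at most once, so the work is bounded by the
    size of the grid no matter how many cycles it contains.
    """
    start, goal = (0, 0), (k - 1, k - 1)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return dist[cell]
        for nxt in neighbors(cell, k):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for k in (2, 3):
        print(k, count_simple_paths(k), bfs_hops(k))
```

Keep k small when calling count_simple_paths: the count grows very fast with the grid size, which is the point of the exercise, while bfs_hops stays cheap.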

You can probably immediately improve the query by doing this:

MATCH p=shortestPath((p1:SearchableNode {name:"Ishaan"})-[:NAVIGATE_TO*]->(p2:SearchableNode {name:"Garima"}))
RETURN p;

Two modifications here. First, p1 and p2 are bound to each other immediately; you don't match them separately. Second, notice the [:NAVIGATE_TO*]-> part, with that last arrow ->: we're matching the relationship ONE WAY ONLY. Since you have so many reciprocal links in your graph, either direction would work fine, but whichever one you choose, you cut the work the DB has to do in half. :)

This may still not perform very well, because traversing that graph will still involve a lot of cycles, which will send the DB chasing its tail trying to find the best path. As a modeling choice, you usually shouldn't have relationships going both ways unless you need separate properties on each relationship. A relationship can be traversed in both directions, so it doesn't make sense to have two (one in each direction) unless the information each relationship captures is semantically different.

With query performance, you'll often find that you can do better by thinking about the query and reformulating it, but there's a major interplay between graph modeling and overall performance. With the graph set up with so many bi-directional links, there is only so much you can do to optimize path-finding.

I'm afraid you won't be able to do much here. Your graph is very specific: each node is related only to its closest neighbours. That's too bad, because Neo4j is fine for exploring a few relationships out from a starting point, not for traversing the whole graph with every query.

This means that once you are 2 nodes away, the number of paths to explore grows to:

8 relationships per node
distance 2
8 + 8^2

In general, the paths up to distance n add up to 8 + 8^2 + ... + 8^n, so the upper bound is

O(8^n) // in case all affected nodes have 8 connections

You say you have ~80 000 nodes. This means (correct me if I'm wrong) a longest distance of ~280 (from √80000). Let's suppose your nodes

(p1:SearchableNode {name: "Ishaan"}), 
(p2:SearchableNode {name: "Garima"}),

are only 140 hops apart. This gives a complexity of 8^140 ≈ 10^126; I'm not sure any computer in the world can handle this.

Sure, not all nodes have 8 connections, only those "in the middle"; our example graph would have ~500 000 relationships. You have ~300 000, which is maybe half as many, so let's suppose an average distance of 70 (out of 140, a very relaxed lower estimate) and nodes with 4 relationships on average (down from 8; 80 000 * 4 = 320 000). The overall complexity is then

O(4^70) ≈ 10^42

One 1 GHz CPU, at roughly 1 000 000 operations per second, would need:

10^42 / 1 000 000 per second = 10^36 seconds

Let's suppose we have a cluster of 100 10 GHz CPU servers, 1000 GHz in total. That's still 10^42 / 1 000 000 000 per second = 10^33 seconds.
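The orders of magnitude above are easy to re-check. The short Python sketch below only redoes the arithmetic with logarithms. The throughput figures (10^6 and 10^9 operations per second) are the rough assumptions used in this answer, not measurements, and the helper name is made up for the illustration.

```python
import math


def log10_power(base, exponent):
    """Return log10(base ** exponent) without building the huge number."""
    return exponent * math.log10(base)


# Side length of a square grid holding ~80 000 nodes.
grid_side = math.isqrt(80_000)

# Branching factor 8 over 140 hops, and 4 over 70 hops.
worst_case = log10_power(8, 140)
relaxed_case = log10_power(4, 70)

# Seconds needed at an assumed number of operations per second.
single_cpu_seconds = relaxed_case - math.log10(1_000_000)
cluster_seconds = relaxed_case - math.log10(1_000_000_000)

if __name__ == "__main__":
    print(f"grid side        ~ {grid_side}")
    print(f"8^140            ~ 10^{worst_case:.1f}")
    print(f"4^70             ~ 10^{relaxed_case:.1f}")
    print(f"single CPU       ~ 10^{single_cpu_seconds:.1f} seconds")
    print(f"1000 GHz cluster ~ 10^{cluster_seconds:.1f} seconds")
```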

I would suggest simply staying away from allShortestPaths and looking only for the first available path. Using Gremlin instead of Cypher, it is possible to implement your own algorithms with some heuristics, so you may actually be able to cut the time down to seconds or less.

Example: using one direction only brings it down to about 10^16 seconds.

An example heuristic: check the ids of the nodes. The larger the difference node2.id - node1.id, the larger the actual distance (assuming nodes were created in order, so that nodes with similar ids are close together). In that case you can either skip the query or jump a few relationships ahead with something like MATCH (n1)-[:RELATED*5]->(q)-[:RELATED*]->(n2), which should skip straight to nodes 5 hops away that are closer to n2, bringing the complexity down from 4^70 to 4^65. So if you can calculate the distance exactly from the node ids, you can even match ... [:RELATED*65] ..., which cuts the complexity to 4^5, and that's just a matter of milliseconds for a CPU.

It's possible my estimates here are completely wrong. It has been some time since I was in school, and it would be nice to have a mathematician (graph theory) confirm this.

MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}),path = (p1)-[:NAVIGATE_TO*]->(p2) RETURN path

Or:

MATCH (p1:SearchableNode {name: "Ishaan"}), (p2:SearchableNode {name: "Garima"}), (p1)-[path:NAVIGATE_TO*]->(p2) RETURN path
