How to find critical nodes in Neo4j?

So, assume a very simple graph:

(A)->(B)->(D)->(E) | (A)->(C)->(D)->(E)

Which will look something like <>- if you visualize it.

A critical node in a graph is one where, if you remove it, you now have 2 graphs (a.k.a. a single point of failure).

So in this example, E is not critical because it is a leaf, and B and C are not critical because A and D are still connected by the other node. D is critical, though, because removing it will orphan E from the rest of the graph.

Using Cypher, how do I find the critical node(s) (in this case, D)?


My first instinct is to take all paths and count how many times each node is touched, but that would be horribly inefficient and unreliable. My second instinct is something like WHERE NONE (n in path WHERE NOT n in OTHER_PATHS), but even if I could figure out how to make that work, I wouldn't know which node(s) in the path were critical.

I found this blog, but it seems to assume you already know something about the critical nodes.

We can approach this through its definition:

A critical node in a graph is one where, if you remove it, you now have (at least) 2 graphs.

Or put another way: if all nodes are initially reachable from each other, and removing a node changes the number of nodes reachable from any other node in the graph, then the removed node is a critical node.

The big obstacle with attempting this via Cypher alone is that Cypher variable-length path matches are designed to find all possible paths, so they are inefficient at finding all reachable nodes.

Using the APOC path expander procedures, we can change the uniqueness used during the traversal so that we only ever find a single path to each distinct node and dismiss all others. This cuts down the number of paths we need to explore and makes finding all reachable nodes in the graph much faster.

Using this, we can first collect all nodes in the graph. Then, for every node, we check whether blacklisting that node during expansion (effectively seeing what happens when we remove it) causes expansion from another node to find fewer nodes than the entire graph (minus 1, of course, for the node we "removed").

You will need an APOC version from the Summer 2018 releases or newer (>= 3.3.0.4 on the 3.3.x line, or >= 3.4.0.2 on the 3.4.x line) to use this approach, as the blacklistNodes feature we need was added in those releases.

Here's the general approach, assuming that we're considering all nodes and that all nodes in the graph are initially reachable from each other.

MATCH (n)
WITH collect(n) as allNodes
WITH allNodes, size(allNodes) - 1 as totalNodes, allNodes[..2] as startNodes
// using total as one less than the actual total since we're 'removing' a node.
// 2 potential start nodes so we always have one if the other is to be removed.
UNWIND allNodes as nodeToRemove
// we now have each node in the graph on its row, we'll try removing each one
WITH [node in startNodes WHERE node <> nodeToRemove][0] as startNode, nodeToRemove, totalNodes
CALL apoc.path.subgraphNodes(startNode, {blacklistNodes:[nodeToRemove]}) YIELD node
WITH totalNodes, nodeToRemove, count(node) as reachableNodes
WHERE totalNodes <> reachableNodes
RETURN nodeToRemove as criticalNode
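
To sanity-check the remove-and-recount idea without a database, here is a minimal sketch of the same logic in plain Python, run on the sample graph from the question. It is not part of the original answer; the function names (reachable, critical_nodes) and the edge-list representation are made up for illustration, and it assumes relationships are treated as undirected, in line with the "reachable from each other" wording.

```python
from collections import deque


def reachable(adjacency, start, removed):
    """Breadth-first search from start that never steps onto `removed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def critical_nodes(edges):
    """Return nodes whose removal leaves some remaining node unreachable.

    Assumes the graph is connected to begin with and that edges are
    undirected (both are assumptions of this sketch).
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    nodes = sorted(adjacency)
    critical = []
    for node_to_remove in nodes:
        remaining = [n for n in nodes if n != node_to_remove]
        if not remaining:
            continue
        # Any surviving node works as a start node.
        start = remaining[0]
        if len(reachable(adjacency, start, node_to_remove)) != len(remaining):
            critical.append(node_to_remove)
    return critical


# The graph from the question: (A)->(B)->(D)->(E) and (A)->(C)->(D)->(E)
sample_edges = [("A", "B"), ("B", "D"), ("D", "E"), ("A", "C"), ("C", "D")]
print(critical_nodes(sample_edges))
```

Like the query above, this runs one traversal per candidate node, so it is fine for small graphs but grows quadratically with graph size.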

First, you need to identify the types of nodes you have in your graph. If all the nodes are the same type, you can count all their relationships (or specific relationships); if a node has more than n relationships (maybe 2), it could be a critical node, otherwise it is not critical.

But if you have more than one type of node, you need to identify which kinds of nodes and relationships are more important, then query each kind of node and relationship, and finally count their relationships (all relationships or specific relationships) to all kinds of nodes or to specific nodes:

MATCH (n)-[r]->() RETURN n, COUNT(r)

And if the node is considered not critical, you can proceed to delete it.

I figured out how to do path-based filtering where a condition needs to be true on every possible path. You can use pattern recognition in a predicate to filter on all paths. (I use reasonable path length limits for my data set to limit runaway queries. If an alternate path exists in my graph, I expect a short detour, so adjust based on your expectations.)

MATCH (a)-[*..10]->(c)-[*..10]->(b) 
WHERE ALL(p in (a)-[*..20]->(b) WHERE c in NODES(p)) 
RETURN DISTINCT c

I also tried using allShortestPaths like this, but in my example set, the performance was actually worse. Your mileage may vary.

MATCH (a)-[*..10]->(c)-[*..10]->(b) 
WHERE ALL(p in allShortestPaths((a)-[*..20]->(b)) WHERE c in NODES(p)) 
RETURN DISTINCT c
