
Neo4j performance with cycles

I have a relatively large Neo4j graph with 7 million nodes and 5 million relationships.

When I try to find the subtree size for one node, Neo4j gets stuck traversing 600,000 nodes, only 130 of which are unique. This happens because of cycles. It looks like DISTINCT is applied only after the whole graph has been traversed to the maximum depth.

Is it possible to change this behaviour somehow?

The query is:

match (a1)-[o1*1..]->(a2) WHERE a1.id = '123' RETURN distinct a2

You can step through the subgraph one "layer" at a time, without reprocessing the same node multiple times, by using the APOC procedure apoc.periodic.commit. That procedure runs a query repeatedly until the query returns 0.
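To make the "until it returns 0" contract concrete, here is a minimal standalone sketch that is separate from the subgraph problem. It assumes APOC is installed; the `visited` property and the batch size of 1000 are hypothetical, chosen only for illustration. Each iteration marks one batch of nodes and returns how many it touched, so the loop ends once no unmarked nodes remain.

```cypher
// Sketch only: the `visited` property and batch size are illustrative assumptions.
CALL apoc.periodic.commit("
  MATCH (n:Foo) WHERE n.visited IS NULL
  WITH n LIMIT $limit
  SET n.visited = true
  RETURN count(*)
", {limit: 1000});
```

Each iteration runs in its own transaction, which is why state that must survive between iterations has to be stored in the graph itself.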

Here is an example of this technique. It:

  • Uses a temporary TempNode node to keep track of a couple of important values between iterations, one of which will eventually contain the distinct ids of the nodes in the subgraph. (As written, Step 1 seeds that list with the "root" node's id so that cycles leading back to the root are not re-expanded; drop that entry if you want to match your question's query, which leaves the root out.)
  • Assumes that all the nodes you care about share the same label, Foo, and that you have an index on Foo(id). The index speeds up the MATCH operations and is not strictly necessary.
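If the index on Foo(id) does not exist yet, it could be created along the following lines. The syntax depends on the Neo4j version, so treat this as a sketch and run only the form that matches your server; the index name `foo_id` is a hypothetical choice.

```cypher
// Neo4j 3.x syntax (unnamed index)
CREATE INDEX ON :Foo(id);

// Neo4j 4.x and later syntax; `foo_id` is an illustrative name
CREATE INDEX foo_id FOR (n:Foo) ON (n.id);
```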

Step 1: Create TempNode (using MERGE to reuse the existing node, if any)

WITH '123' AS rootId
MERGE (temp:TempNode)
SET temp.allIds = [rootId], temp.layerIds = [rootId];

Step 2: Perform iterations (to get all subgraph nodes)

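// Each iteration expands one layer: it follows outgoing relationships from the
// current layer's nodes and keeps only ids that have not been seen before.
// The loop stops when a layer adds no new ids (the query returns 0).
// LIMIT 1 does not change the result; newer APOC releases reject statements
// passed to apoc.periodic.commit that contain no LIMIT clause.
CALL apoc.periodic.commit("
  MATCH (temp:TempNode)
  UNWIND temp.layerIds AS id
  MATCH (n:Foo) WHERE n.id = id
  OPTIONAL MATCH (n)-->(next)
  WHERE NOT next.id IN temp.allIds
  WITH temp, COLLECT(DISTINCT next.id) AS layerIds
  SET temp.allIds = temp.allIds + layerIds, temp.layerIds = layerIds
  RETURN SIZE(layerIds) LIMIT 1
");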

Step 3: Use subgraph ids

MATCH (temp:TempNode)
// ... use temp.allIds, which contains the distinct ids in the subgraph ...
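For the subtree-size question specifically, one possible way to finish is sketched below. It relies on Step 1 having seeded temp.allIds with the root id, which is why one is subtracted; the alias `subtreeSize` is only an illustrative name. The second statement removes the temporary node so that a later run starts clean.

```cypher
// Number of distinct nodes reachable from the root, excluding the root itself
MATCH (temp:TempNode)
RETURN size(temp.allIds) - 1 AS subtreeSize;

// Optional cleanup once the ids are no longer needed
MATCH (temp:TempNode)
DELETE temp;
```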

