How to improve performance on variable length Neo4j Cypher query?

I'm querying Neo4j from a Java Spring Boot application, using neo4j-java-driver to connect over the Bolt port, but my query takes approximately 30 minutes to return results.

The query:

MATCH path=(:JAVA {snapshot: 3})-[*]->()
UNWIND nodes(path) as n
WITH DISTINCT n
SET n.scope = 'JAVA'
RETURN n.ID

I've tried searching online for optimization techniques as well as APOC functions, but nothing I've attempted so far has improved performance. The labels are indexed. snapshot is a property that is present on all nodes, and ID is a separate identifier that is needed for unrelated reasons.

Graph Information

  • 200K nodes
  • 355K relationships
  • 9073 nodes of type JAVA
  • 61K direct relationships outgoing from nodes of type JAVA
  • dbms.memory.heap.initial_size=3G
  • dbms.memory.heap.max_size=4G
  • dbms.memory.pagecache.size=1G

I'm essentially trying to traverse a program call chain where the start of the chain is a node of type JAVA. If any other node is reachable from a node of type JAVA, then I want to set its scope and return its ID. What I think is happening is that the graph is pretty dense with shared paths, and the query is traversing the same path more than once. I'm not sure whether I can prevent this or whether Neo4j handles that issue internally.

From Java I'm accessing the driver (it is instantiated when the application starts), executing the query, and collecting the IDs from the results.

try (final Session session = getDriver().session()) {
    session.run(new Statement("<The query>")).stream()
        .map(record -> Long.valueOf(record.get(0).asLong()))
        .collect(Collectors.toList());
...

EDIT: follow-up to questions in the comments, with more data. Count of distinct dependencies of nodes with the JAVA label:

MATCH (:JAVA {snapshot: 3})-[*]->(n) RETURN count(DISTINCT n)

returns 182,749

Profile of query plan: [image not available]

We can certainly test that analysis.

Keep in mind that UNWINDing the path nodes is definitely not efficient here. There will be tons of repeats, even if all of the end nodes of the paths are distinct, since any node present in a subpath will also be present in every path extending from that subpath.

A better version of your query would be:
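To get a feel for how quickly those repeats pile up, here is a rough Java sketch that counts the rows an UNWIND over nodes(path) would produce on a toy graph. The graph (a chain of "diamonds"), the class name UnwindRowsSketch, and both helper methods are made up for illustration; this is not how Neo4j evaluates the query, only a count of what the pattern logically yields on an acyclic graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: counts rows, does not talk to Neo4j.
final class UnwindRowsSketch {

    // Builds a chain of diamonds: top -> left, top -> right,
    // left -> nextTop, right -> nextTop. Yields 3 * stages + 1 nodes.
    static Map<Integer, List<Integer>> diamondChain(int stages) {
        Map<Integer, List<Integer>> g = new HashMap<>();
        for (int i = 0; i < stages; i++) {
            int top = 3 * i, left = top + 1, right = top + 2, nextTop = top + 3;
            g.computeIfAbsent(top, k -> new ArrayList<>()).add(left);
            g.get(top).add(right);
            g.computeIfAbsent(left, k -> new ArrayList<>()).add(nextTop);
            g.computeIfAbsent(right, k -> new ArrayList<>()).add(nextTop);
        }
        g.computeIfAbsent(3 * stages, k -> new ArrayList<>()); // final node, no outgoing edges
        return g;
    }

    // Total rows from unwinding nodes(path) over every path of length >= 1
    // that extends the current path ending at `node`. Assumes an acyclic
    // graph, so no relationship-uniqueness bookkeeping is needed.
    static long unwoundRows(Map<Integer, List<Integer>> g, int node, int nodesInPath) {
        long rows = 0;
        for (int next : g.get(node)) {
            rows += nodesInPath + 1;                       // the path ending at `next` emits all of its nodes
            rows += unwoundRows(g, next, nodesInPath + 1); // plus every longer path through `next`
        }
        return rows;
    }

    public static void main(String[] args) {
        for (int stages : new int[] {1, 2, 10}) {
            Map<Integer, List<Integer>> g = diamondChain(stages);
            System.out.println(stages + " stages: " + g.size() + " distinct nodes, "
                    + unwoundRows(g, 0, 1) + " unwound rows");
        }
    }
}
```

The number of distinct nodes grows linearly with the number of stages, while the number of unwound rows roughly doubles with each stage, which is the kind of gap DISTINCT then has to collapse.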

MATCH path=(:JAVA {snapshot: 3})-[*]->(n)
WITH DISTINCT n
SET n.scope = 'JAVA'
RETURN n.ID

But if there are multiple paths to the same node (that is, if you examine the PROFILE plan of that query and see a pretty big gap between the row counts before and after the DISTINCT operation), then this seems like a good case for using APOC path expanders, as we can configure them to use a traversal uniqueness behavior that should visit each distinct node only once across all expansions.

If your query is getting hung up because it's revisiting the same nodes and paths over and over, then this should help.

Try this:
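As a rough picture of what "visit each node only once" buys you, here is a minimal Java sketch of a breadth-first traversal with a global visited set. It is only an illustration of the idea under assumed names (VisitOnceSketch, reachable, and the in-memory adjacency map are all hypothetical), not APOC's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: an in-memory stand-in for the graph.
final class VisitOnceSketch {

    // Returns every node reachable from `start` (including `start`),
    // following outgoing edges and expanding each node at most once.
    static Set<Integer> reachable(Map<Integer, List<Integer>> outgoing, int start) {
        Set<Integer> visited = new LinkedHashSet<>();
        Deque<Integer> frontier = new ArrayDeque<>();
        visited.add(start);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int node = frontier.poll();
            for (int next : outgoing.getOrDefault(node, List.of())) {
                // add() returns false for nodes already seen, so shared
                // subpaths and cycles are never expanded a second time
                if (visited.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A diamond (0 -> 1 -> 3, 0 -> 2 -> 3) with a cycle back to the start.
        Map<Integer, List<Integer>> g = Map.of(
                0, List.of(1, 2),
                1, List.of(3),
                2, List.of(3),
                3, List.of(0));
        System.out.println(reachable(g, 0));
    }
}
```

Each relationship is looked at once, so the work is proportional to the size of the reachable subgraph rather than to the number of paths through it.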

MATCH (start:JAVA {snapshot: 3})
CALL apoc.path.subgraphNodes(start, {relationshipFilter:'>'}) YIELD node AS n
WITH start, n
WHERE n <> start // so we don't apply this to the start node itself (SKIP 1 would only drop the first row overall)
WITH DISTINCT n
SET n.scope = 'JAVA'
RETURN n.ID
