
Neo4j Cypher path finding slow in undirected graph

In a graph with 165k nodes and 266k relationships I'd like to run the following Cypher query:

START n=node:NodeIds('id:firstId'), t=node:NodeIds('id:secondId')   
MATCH (n)-[:RELATIONSHIP_TYPE*1..3]-(t)   
RETURN count(*)

where firstId and secondId are valid entries in the NodeIds Lucene index.

The query takes about 4 seconds to execute from the Neo4j console, and I'd like to understand why it is so slow and how it could be made faster.

The index lookup itself takes about 40 ms (i.e. a query that just returns the two nodes takes that long), so that can't be the issue.

I run Neo4j on a Windows 8 machine with the default settings, started from Neo4j.bat. I don't think hardware is the issue, as the query only causes a short 10% CPU spike and a barely visible spike in disk usage.

BTW, the first node has a degree of 40, the second a degree of 2, and the result is 1.

Any help would be appreciated.

Edit 1, memory config:

I was running Neo4j with the out-of-the-box config, started from Neo4j.bat, with the following memory defaults (if I'm not mistaken, these are the only memory-related settings):

wrapper.java.initmemory=16
wrapper.java.maxmemory=64

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Taking a shot in the dark, I raised these values to the following:

wrapper.java.initmemory=128
wrapper.java.maxmemory=1024

neostore.nodestore.db.mapped_memory=225M
neostore.relationshipstore.db.mapped_memory=250M
neostore.propertystore.db.mapped_memory=290M
neostore.propertystore.db.strings.mapped_memory=330M
neostore.propertystore.db.arrays.mapped_memory=330M

This did increase Neo4j's memory usage (I mean the memory usage of the java.exe instance running Neo4j), but without a noticeable increase in performance (the query takes roughly the same time, with probably a 200-300 ms increase occasionally). There are GBs of RAM free, so there's no hardware constraint.

Edit 2, profiler data: Running the profiler for the query in question yields the following results:

neo4j-sh (0)$ profile START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599') MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t) RETURN count(*);
==> +----------+
==> | count(*) |
==> +----------+
==> | 1        |
==> +----------+
==> 1 row
==> 0 ms
==> 
==> ColumnFilter(symKeys=["  INTERNAL_AGGREGATE-939275295"], returnItemNames=["count(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["(  INTERNAL_AGGREGATE-939275295,CountStar)"], _rows=1, _db_hits=0)
==>   ExtractPath(name="path", patterns=["  UNNAMED3=n-[:ASSOCIATIVY_CONNECTION*1..3]-t"], _rows=1, _db_hits=0)
==>     PatternMatch(g="(n)-['  UNNAMED3']-(t)", _rows=1, _db_hits=0)
==>       Nodes(name="t", _rows=1, _db_hits=1)
==>         Nodes(name="n", _rows=1, _db_hits=1)
==>           ParameterPipe(_rows=1, _db_hits=0) 

It says 0 ms, but I don't know what that is supposed to mean: the result is returned after multiple seconds, the same query executed in the Data Browser's console takes about 3.5 s (this is what it displays), and it takes roughly the same amount of time when fetched through the RESTful endpoint.

Edit 3, the real data set: Enough with the theory :-), this is the data set I'm really talking about: http://associativy.com/Media/Default/Associativy/Wiki.zip It's a graph generated from the interlinks between Wikipedia articles, created from Wikipedia dump files. It's just the beginning.

The real query I'm trying to run is actually the following one, which returns the nodes that make up the paths between two nodes:

START n=node:NodeIds('id:4000'), t=node:NodeIds('id:64599')   MATCH path = (n)-[:ASSOCIATIVY_CONNECTION*1..3]-(t)   RETURN nodes(path) AS Nodes

I showed the count query because I wanted the simplest query that shows the symptoms.

Edit 4:

I opened another question specifically for the path-returning query.

I agree with Wes, this should return in an instant.

Your upping of the config makes sense. This is in 2 different config files, right?

As you are running on Windows, MMIO is inside the Java heap, so I would raise this to:

wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096

How long is the returned path? Would it make sense in your domain to specify a direction?

Can you please run the following (adapt it to the returned path length)?

START n=node:NodeIds('id:4000'),
      t=node:NodeIds('id:64599')
MATCH path = (n)-[:ASSOCIATIVY_CONNECTION]-(a),
             (a)-[:ASSOCIATIVY_CONNECTION]-(b)-[:ASSOCIATIVY_CONNECTION]-(t)
RETURN count(*), count(distinct a), count(a), count(distinct b), count(b);

Are you running the 1.9 milestone release? The bidirectional matcher in 1.9 will probably do much better than 1.8.x.
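To give a rough feel for why matching from both end nodes can help, here is a minimal Python sketch. It is not Neo4j's matcher and says nothing about how 1.9 is actually implemented; it only compares a one-sided enumeration of edge-unique paths of up to 3 hops with a meet-in-the-middle variant that walks up to 2 hops from one end and 1 hop from the other. The function names, the toy edge-list input, and the assumption of a simple graph (no parallel relationships between the same two nodes) are all made up for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency map (node -> set of neighbours) for a simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def partial_paths(adj, start, max_len):
    """Enumerate all edge-unique walks from start with 0..max_len hops.

    Returns (paths, expansions). Each path is (end_node, used_edges, length);
    expansions counts how many relationships were followed in total.
    """
    paths = [(start, frozenset(), 0)]
    expansions = 0
    i = 0
    while i < len(paths):
        node, used, length = paths[i]
        i += 1
        if length == max_len:
            continue
        for nxt in adj[node]:
            edge = frozenset((node, nxt))
            if edge in used:  # a relationship may be used only once per path
                continue
            expansions += 1
            paths.append((nxt, used | {edge}, length + 1))
    return paths, expansions


def count_one_sided(adj, start, target, max_len=3):
    """Count paths of 1..max_len hops by expanding from start only."""
    paths, expansions = partial_paths(adj, start, max_len)
    found = sum(1 for end, _, length in paths if end == target and length >= 1)
    return found, expansions


def count_two_sided(adj, start, target, max_len=3):
    """Count the same paths by expanding from both ends and joining in the middle."""
    back_len = max_len // 2
    fwd_len = max_len - back_len
    fwd, fwd_exp = partial_paths(adj, start, fwd_len)
    bwd, bwd_exp = partial_paths(adj, target, back_len)

    by_end = defaultdict(list)
    for end, used, length in bwd:
        by_end[end].append((used, length))

    found = 0
    for end, used, length in fwd:
        for b_used, b_len in by_end.get(end, ()):
            total = length + b_len
            if total < 1 or total > max_len:
                continue
            # Use one fixed split per total length so no path is counted twice.
            if b_len != total // 2:
                continue
            if used & b_used:  # the two halves must not share a relationship
                continue
            found += 1
    return found, fwd_exp + bwd_exp
```

Both functions should return the same path count for the same inputs; the second value in each result is the number of relationships followed, which is the quantity that grows quickly when one end node has a high degree and the search only starts from that side.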
