Load Neo4J in memory on demand for heavy computations

How can I load Neo4J into memory on demand?

At different stages of my long-running jobs I persist nodes and relationships to Neo4J. So Neo4J should be on disk, since it may consume too much memory and I don't know when I'm going to run read queries against it.

But at some point (only once) I will want to run a pretty heavy read query against my Neo4J server, and it performs very poorly (hours). As a solution, I want to load all of Neo4J into RAM for better performance.

What is the best option for this? Should I use a RAM disk, or are there better solutions?

P.S.

A query with [r:LINK_REL_1*2] runs pretty fast, [r:LINK_REL_1*3] takes 17 seconds, and [r:LINK_REL_1*4] takes more than 5 minutes; I don't even know how long, since I have a 5-minute timeout. But I need the [r:LINK_REL_1*2..4] query to run in a reasonable time.

My heavy query, with PROFILE:

PROFILE
MATCH path = (start:COLUMN)-[r:LINK_REL_1*2]->(col:COLUMN) 
WHERE start.ENTITY_ID = '385' 
WITH path UNWIND NODES(path) AS col
WITH path, 
COLLECT(DISTINCT col.DATABASE_ID) as distinctDBs
WHERE LENGTH(path) + 1 = SIZE(distinctDBs)
RETURN path

Updated query, with PROFILE (it showed the same performance in tests):

PROFILE
MATCH (start:COLUMN)
WHERE start.ENTITY_ID = '385' 
MATCH path = (start)-[r:LINK_REL_1*2]->(col:COLUMN)
WITH path, REDUCE(dbs = [], col IN NODES(path) | 
  CASE WHEN col.DATABASE_ID in dbs 
       THEN dbs 
       ELSE dbs + col.DATABASE_ID END) as distinctDbs
WHERE LENGTH(path) + 1 = SIZE(distinctDbs)
RETURN path

APOC Procedures has apoc.warmup.run(), which may get much of Neo4j into cached memory. See if that makes a difference.
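As a minimal sketch, assuming the APOC plugin is installed and your version still ships apoc.warmup.run (newer APOC releases deprecate it), the warm-up could be triggered once, right before the heavy read query:

```cypher
// Assumption: the APOC plugin is installed and exposes apoc.warmup.run.
// Called without arguments, it touches the node and relationship stores
// so their pages are pulled into Neo4j's page cache.
CALL apoc.warmup.run();
```

The warm-up only helps if the page cache is large enough to hold the store files (dbms.memory.pagecache.size in neo4j.conf on Neo4j 3.x and 4.x). Otherwise pages are evicted again before the query reaches them.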

It looks like you're trying to create a query in which the path contains only :Persons from distinct countries. Is this right?

If so, I think we can find a better query that can do this without hanging.

First, let's go for the low-hanging fruit and see if avoiding the UNWIND makes a difference.

PROFILE or EXPLAIN the query below and see if any numbers look significantly different compared to the original query.

MATCH (start:PERSON)
WHERE start.ID = '385' 
MATCH path = (start)-[r:FRIENDSHIP_REL*2..5]->(person:PERSON)
WITH path, REDUCE(countries = [], person IN NODES(path) | 
  CASE WHEN person.COUNTRY_ID IN countries 
       THEN countries 
       ELSE countries + person.COUNTRY_ID END) as distinctCountries
WHERE LENGTH(path) + 1 = SIZE(distinctCountries)
RETURN path
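
A further variation, not part of the answer above and offered only as a sketch: a variable-length pattern enumerates every path first and filters afterwards, so one option is to write the hops out explicitly and reject a repeated country as soon as each hop is expanded. The example below covers the 3-hop case only, reuses the label, relationship type, and property names from the query above, and assumes every PERSON node has a non-null COUNTRY_ID. The aliases p1, p2, and p3 are hypothetical names introduced for illustration.

```cypher
// Hypothetical 3-hop variant with one filter per hop.
// Assumption: every PERSON node has a non-null COUNTRY_ID.
MATCH (start:PERSON)
WHERE start.ID = '385'
MATCH (start)-[:FRIENDSHIP_REL]->(p1:PERSON)
WHERE p1.COUNTRY_ID <> start.COUNTRY_ID
MATCH (p1)-[:FRIENDSHIP_REL]->(p2:PERSON)
WHERE NOT (p2.COUNTRY_ID IN [start.COUNTRY_ID, p1.COUNTRY_ID])
MATCH (p2)-[:FRIENDSHIP_REL]->(p3:PERSON)
WHERE NOT (p3.COUNTRY_ID IN [start.COUNTRY_ID, p1.COUNTRY_ID, p2.COUNTRY_ID])
// Returns the nodes of each matching chain rather than a path value.
RETURN start, p1, p2, p3
```

Each path length in the 2..5 range would need its own query of this shape. Whether the planner actually prunes earlier this way depends on the data and the Neo4j version, so compare the PROFILE output against the REDUCE version before relying on it.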
