Optimize Relationship Building between Many neo4j Nodes

I have a database containing two particular node types: GenomicRange and GeneModel. The GenomicRange node set contains ~80 million nodes, while GeneModel contains ~45,000 nodes.

The GenomicRange nodes contain a property posStart, which is stored as an integer. The GeneModel nodes contain two integer properties, geneStart and geneEnd. These coordinates refer to a chromosome, which is stored in a chromosome property present on both node types (e.g. 1 through 10).

What I would like to do is efficiently create relationships (e.g. [:RANGE_WITHIN]) between these two node types if (1) their chromosome properties match and (2) the posStart value of the GenomicRange node falls within the range defined by the geneStart and geneEnd properties of the GeneModel node.

The problem I am currently having is that my querying/building process is incredibly slow. Is there a way to optimize this code?

Thanks for your help!

```cypher
MATCH (model:GeneModel)
WITH model
MATCH (range:GenomicRange)
WHERE range.chromosome = model.chromosome AND range.posStart >= model.geneStart AND range.posStart <= model.geneEnd
CREATE (range)-[:RANGE_WITHIN]->(model)
```

A few suggestions:

Add indexes on the properties you are using for comparison, here: posStart, chromosome, geneEnd and geneStart.

`CREATE INDEX ON :GenomicRange(chromosome)`
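The `CREATE INDEX ON :Label(property)` form above is the older Neo4j syntax and no longer works in Neo4j 5. A sketch of the same idea for all four properties in the newer `CREATE INDEX ... FOR ... ON` syntax, assuming Neo4j 4.x or 5.x; the index names are made up for illustration:

```cypher
// Assumes Neo4j 4.x/5.x index syntax; the index names are hypothetical.
CREATE INDEX genomic_range_chromosome FOR (r:GenomicRange) ON (r.chromosome);
CREATE INDEX genomic_range_pos_start FOR (r:GenomicRange) ON (r.posStart);
CREATE INDEX gene_model_chromosome FOR (m:GeneModel) ON (m.chromosome);
CREATE INDEX gene_model_gene_start FOR (m:GeneModel) ON (m.geneStart);
CREATE INDEX gene_model_gene_end FOR (m:GeneModel) ON (m.geneEnd);
// Indexes are populated in the background; wait (up to 600 s) until they are online.
CALL db.awaitIndexes(600);
```

Since the query scans every GeneModel node anyway, the GenomicRange indexes are probably the ones that matter most here; prefixing the query with `PROFILE` shows whether the planner actually uses an index seek on posStart.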

Increase heap memory: creating indexes increases memory usage, so increase the heap size to up to 50% of your available memory. You can configure this in the neo4j.conf file.

Increase the page cache: the larger the page cache, the more data can be cached in memory, which helps avoid costly disk access.
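A sketch of what the heap and page cache settings might look like in neo4j.conf, for a hypothetical machine with 32 GB of RAM (the sizes are illustrative, not a recommendation). These are the Neo4j 4.x setting names; Neo4j 5.x uses the `server.memory.*` prefix instead of `dbms.memory.*`:

```properties
# Hypothetical sizing for a 32 GB machine; adjust to your own hardware.
# Setting initial and max heap to the same value avoids heap resizing at runtime.
dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
# Page cache holds graph data and indexes in memory to avoid disk reads.
dbms.memory.pagecache.size=12g
```

Leave some memory free for the operating system; heap plus page cache should stay below the total RAM.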

Read more about memory configuration here.

PS: If you still get an out-of-memory error after increasing the heap size, swap the GenomicRange and GeneModel MATCH clauses on lines 1 and 3 of the query, or use the APOC plugin to create the relationships periodically in batches.
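A minimal sketch of the APOC approach, assuming the APOC plugin is installed. `apoc.periodic.iterate` streams the GeneModel nodes from the first statement and runs the relationship-creating statement in separate transactions per batch, so the transaction state for all new relationships does not have to fit in the heap at once. The batch size of 100 is an arbitrary starting point, not a tuned value:

```cypher
// Assumes the APOC plugin is installed; batchSize is an untuned example value.
CALL apoc.periodic.iterate(
  // Outer statement: stream every GeneModel node.
  "MATCH (model:GeneModel) RETURN model",
  // Inner statement: runs for each streamed model, committed once per batch.
  "MATCH (range:GenomicRange)
   WHERE range.chromosome = model.chromosome
     AND range.posStart >= model.geneStart
     AND range.posStart <= model.geneEnd
   CREATE (range)-[:RANGE_WITHIN]->(model)",
  // parallel: false avoids lock contention when a GenomicRange falls within overlapping genes.
  {batchSize: 100, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Checking `errorMessages` after the run shows whether any batch failed, since failed batches do not abort the whole procedure.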
