Optimize Relationship Building between Many neo4j Nodes
I have a database containing two particular node types: GenomicRange and GeneModel. The GenomicRange node set contains ~80 million nodes, while GeneModel contains ~45,000 nodes.
The GenomicRange nodes contain a property posStart, which is stored as an integer. The GeneModel nodes contain two particular integer properties, geneStart and geneEnd. These coordinates refer to positions on a chromosome, identified by a chromosome property present in both node types (e.g. 1 through 10).
What I would like to do is efficiently create relationships (e.g. [:RANGE_WITHIN]) between these two node types if (1) their chromosome properties match and (2) the posStart value in GenomicRange falls within the range of the geneStart and geneEnd properties on the GeneModel node.
The problem I am currently having is that my querying/building process is incredibly slow. Is there a way to optimize this code?
Thanks for your help!
MATCH (model:GeneModel)
WITH model
MATCH (range:GenomicRange)
WHERE range.chromosome = model.chromosome AND range.posStart >= model.geneStart AND range.posStart <= model.geneEnd
CREATE (range)-[:RANGE_WITHIN]->(model)
A few suggestions:
Add indexes on the properties you are using for comparison. Here: posStart, chromosome, geneEnd, geneStart.
`CREATE INDEX ON :GenomicRange(chromosome)`
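The statement above covers only one of the listed properties. A sketch of the full set, including the one above, might look like the following. It assumes the older `CREATE INDEX ON` syntax used in this answer (Neo4j 3.x); newer versions use `CREATE INDEX FOR (n:Label) ON (n.property)` instead. Depending on the client, each statement may need to be run separately.

```cypher
// Indexes on the GenomicRange side (~80 million nodes)
CREATE INDEX ON :GenomicRange(chromosome);
CREATE INDEX ON :GenomicRange(posStart);

// Indexes on the GeneModel side (~45,000 nodes)
CREATE INDEX ON :GeneModel(chromosome);
CREATE INDEX ON :GeneModel(geneStart);
CREATE INDEX ON :GeneModel(geneEnd);
```

Index population on 80 million nodes can take a while, so it is probably worth waiting until the indexes are online before running the relationship query.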
Increase heap memory: creating indexes increases memory usage, so increase the heap size, up to 50% of your machine's memory. You can configure this in the neo4j.conf file.
Increase the page cache: the larger the page cache, the more data is kept in memory, which helps avoid costly disk access.
Read more about memory configuration in the Neo4j documentation.
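As a rough illustration of the two memory suggestions, a neo4j.conf fragment could look like this. The values are assumptions for a hypothetical machine with 32 GB of RAM, not recommendations, and the setting names are those of Neo4j 3.x/4.x (Neo4j 5 renames them to `server.memory.*`).

```properties
# neo4j.conf (illustrative values, assuming a 32 GB machine)

# Heap: initial and max set to the same value, here 50% of RAM
dbms.memory.heap.initial_size=16g
dbms.memory.heap.max_size=16g

# Page cache: memory used to cache the store files and indexes
dbms.memory.pagecache.size=10g
```

The remaining memory is left for the operating system. Neo4j needs to be restarted for these settings to take effect.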
P.S. If you still get an out-of-memory error after increasing the heap size, swap GenomicRange and GeneModel on lines 1 and 3 of the query, or use the APOC plugin to create the relationships in batches.
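A minimal sketch of the APOC approach, assuming the APOC plugin is installed and using `apoc.periodic.iterate`. The first statement streams the GeneModel nodes, and the second runs the matching and relationship creation from the question for each of them, committing in batches. The `batchSize` of 100 is an arbitrary value to tune, not a recommendation.

```cypher
CALL apoc.periodic.iterate(
  // Outer statement: stream the ~45,000 GeneModel nodes
  "MATCH (model:GeneModel) RETURN model",
  // Inner statement: executed per GeneModel, committed once per batch
  "MATCH (range:GenomicRange)
   WHERE range.chromosome = model.chromosome
     AND range.posStart >= model.geneStart
     AND range.posStart <= model.geneEnd
   CREATE (range)-[:RANGE_WITHIN]->(model)",
  // batchSize is an assumed value; parallel is off to avoid lock contention
  {batchSize: 100, parallel: false}
)
```

Because each batch is committed in its own transaction, the heap only has to hold the relationships created for one batch of gene models at a time rather than the whole result.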