Neo4j traversal running out of memory
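I have a Neo4j database containing about 120 million nodes. I'm using the traversal framework to traverse my graph and count the occurrences of certain nodes. This works like a charm. Unfortunately, when I run the code on the whole dataset, I run out of memory.
I have allocated 4 GB to the Java VM, and I think I am committing my transactions (calling tx.success() inside a try-with-resources statement), but my heap still fills up very quickly.
Below you can find my code. First, I fetch about 40 versions (these are the root nodes). Then, for each version, I look up all adjacent child nodes. For each of these children (files), I check whether a certain node exists anywhere in its subtree.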
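The per-file check described above ("does a certain node exist anywhere in this subtree?") can stop at the first hit instead of walking the entire subtree. The following is a minimal plain-Java sketch of that idea, not Neo4j code: it assumes the ISPARENTOF hierarchy is a tree (no cycles), and it models the graph as a hypothetical in-memory Map from each node to its children. Only the stack of pending nodes is held in memory, never the whole subtree.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SubtreeSearch {

    /**
     * Depth-first search below start (start itself excluded) that returns as soon
     * as a node matches target. Assumes the children map describes a tree.
     */
    static <T> boolean existsInSubtree(Map<T, List<T>> children, T start, Predicate<T> target) {
        Deque<T> stack = new ArrayDeque<>();
        for (T child : children.getOrDefault(start, List.of())) {
            stack.push(child);
        }
        while (!stack.isEmpty()) {
            T node = stack.pop();
            if (target.test(node)) {
                return true; // early exit: the rest of the subtree is never visited
            }
            for (T child : children.getOrDefault(node, List.of())) {
                stack.push(child);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical example tree: file -> {a, b}, a -> {a1, a2}, b -> {needle}
        Map<String, List<String>> tree = Map.of(
                "file", List.of("a", "b"),
                "a", List.of("a1", "a2"),
                "b", List.of("needle"));
        System.out.println(existsInSubtree(tree, "file", n -> n.equals("needle"))); // true
        System.out.println(existsInSubtree(tree, "a", n -> n.equals("needle")));    // false
    }
}
```

If the real hierarchy can contain shared children or cycles, a visited set would be needed on top of this sketch.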
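My understanding was that
try (Transaction tx = db.beginTx()) {
}
automatically closes my transaction, but my heap still fills up. This makes my queries run slowly from the second or third version onward, and eventually the program crashes. Have I misunderstood something? Or is there something else I can do?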
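try-with-resources does call close() on the resource when the block exits (in Neo4j 2.x, tx.success() only marks the transaction as successful; the commit happens in close()). What it cannot do is free objects your code still references after the block, such as a collection of nodes copied out of the transaction. A minimal plain-Java model of both halves; FakeTransaction is a hypothetical stand-in, not Neo4j's Transaction:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {

    /** Hypothetical stand-in for a transaction: it only records that close() ran. */
    static class FakeTransaction implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Fills a list inside the try block and returns it after the block has exited. */
    static List<Integer> collectInside(FakeTransaction tx, int n) {
        List<Integer> collected = new ArrayList<>();
        try (tx) { // Java 9+: an effectively final variable can be used as the resource
            for (int i = 0; i < n; i++) {
                collected.add(i);
            }
        } // tx.close() is called here automatically, also when an exception is thrown
        return collected; // still strongly reachable, so the GC cannot reclaim it
    }

    public static void main(String[] args) {
        FakeTransaction tx = new FakeTransaction();
        List<Integer> data = collectInside(tx, 1_000);
        System.out.println("closed = " + tx.closed);       // closed = true
        System.out.println("still held = " + data.size()); // still held = 1000
    }
}
```

In other words, closing the transaction and releasing memory are separate concerns: anything kept in a long-lived variable survives the close.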
Collection<Node> versions;
Collection<Node> files;
Collection<Node> nodes;
try (Transaction tx = db.beginTx())
{
    // Collect the ~40 version nodes directly attached to the root (node 0)
    versions = IteratorUtil.asCollection(db.traversalDescription()
            .breadthFirst()
            .relationships(ProjectRelations.HASVERSION, Direction.OUTGOING)
            .evaluator(Evaluators.toDepth(1))
            .evaluator(Evaluators.excludeStartPosition())
            .traverse(db.getNodeById(0))
            .nodes());
    tx.success();
}
for (Node v : versions) {
    int fors = 0;
    test = 0;
    try (Transaction tx = db.beginTx()) {
        // Eagerly copies every file node of this version into a collection
        files = IteratorUtil.asCollection(db.traversalDescription()
                .breadthFirst()
                .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                .evaluator(Evaluators.excludeStartPosition())
                .traverse(v)
                .nodes());
        tx.success();
    }
    for (Node f : files) {
        try (Transaction t = db.beginTx()) {
            int i = 0;
            // Walk the ISPARENTOF subtree below the file; e is a custom Evaluator defined elsewhere (not shown)
            for (Node node : db.traversalDescription()
                    .depthFirst()
                    .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                    .evaluator(Evaluators.excludeStartPosition())
                    .evaluator(e)
                    .traverse(f)
                    .nodes()) {
                // do some stuff
            }
            t.success();
        }
    }
    files.clear();
}
versions.clear();
Update:
I replaced everything with iterators, for example:
try (
    Transaction tx = db.beginTx();
    ResourceIterator<Node> files = db.traversalDescription()
            .breadthFirst()
            .relationships(ProjectRelations.FILES, Direction.OUTGOING)
            .evaluator(Evaluators.excludeStartPosition())
            .traverse(v)
            .nodes()
            .iterator()
) {
    int idx = 0;
    forloops = 0;
    long start = System.nanoTime();
    while (files.hasNext()) {
        Node f = files.next();
        // In Neo4j 2.x, beginTx() inside an open transaction joins the outer one
        try (Transaction t = db.beginTx();
             ResourceIterator<Node> blah = db.traversalDescription()
                     .depthFirst()
                     .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                     .evaluator(Evaluators.excludeStartPosition())
                     .evaluator(e)
                     .traverse(f)
                     .nodes()
                     .iterator()
        ) {
            int i = 0;
            while (blah.hasNext()) {
                Node tempNode = blah.next();
            }
        } // blah is closed here by try-with-resources
    }
} // files is closed here by try-with-resources
} // closes the enclosing for (Node v : versions) loop (not shown)
The problem is that the transaction keeps everything in memory until I exhaust the iterator or call close().
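To illustrate the rule described above (a resource is held until its iterator is exhausted or explicitly closed), here is a plain-Java model. CountingIterator and openIterators are hypothetical names, not Neo4j classes. It shows why an iterator abandoned early, for example after an early-exit check, should still be closed, which try-with-resources guarantees:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ClosingIterators {
    /** Number of iterators currently holding their (simulated) resource. */
    static int openIterators = 0;

    /** Hypothetical resource-holding iterator: releases on exhaustion or on close(). */
    static class CountingIterator implements Iterator<Integer>, AutoCloseable {
        private final int limit;
        private int next = 0;
        private boolean released = false;

        CountingIterator(int limit) {
            this.limit = limit;
            openIterators++; // resource acquired on creation
        }

        @Override
        public boolean hasNext() {
            boolean more = next < limit;
            if (!more) {
                close(); // exhausted: release automatically
            }
            return more;
        }

        @Override
        public Integer next() {
            if (next >= limit) {
                throw new NoSuchElementException();
            }
            return next++;
        }

        @Override
        public void close() {
            if (!released) { // idempotent, like closing twice should be
                released = true;
                openIterators--;
            }
        }
    }

    public static void main(String[] args) {
        // Abandoned after one element and never closed: the resource stays held
        CountingIterator leaked = new CountingIterator(10);
        leaked.next();
        System.out.println(openIterators); // 1

        // Same early exit, but try-with-resources guarantees close()
        try (CountingIterator it = new CountingIterator(10)) {
            it.next();
        }
        System.out.println(openIterators); // 1 (only the leaked one)

        leaked.close();
        System.out.println(openIterators); // 0
    }
}
```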
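Edit 2:
I now use iterators for everything, together with depth-first traversal. I also reduced the available heap from 4 GB to 1024 MB. So far it seems to be running (although I'm not sure it will ever finish), but very slowly. It peaks at around 980 MB but has not exceeded that threshold yet. Since my heap is nearly full the entire time, it slows down considerably. Any ideas how to improve this? Or is this the best I can get?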
try (Transaction tx = db.beginTx()) {
    versions = IteratorUtil.asCollection(db
            .traversalDescription()
            .depthFirst()
            .relationships(ProjectRelations.HASVERSION, Direction.OUTGOING)
            .evaluator(Evaluators.toDepth(1))
            .evaluator(Evaluators.excludeStartPosition())
            .traverse(root)
            .nodes()); // traverse() yields paths; .nodes() turns them into nodes
    tx.success();
}
int mb = 1024 * 1024;
Runtime runtime = Runtime.getRuntime();
try (Transaction tx = db.beginTx()) {
    int idx = 0;
    for (Relationship rel : root.getRelationships(ProjectRelations.HASVERSION, Direction.OUTGOING)) {
        idx++;
        System.out.println(idx);
        Node v = rel.getEndNode();
        try (ResourceIterator<Node> files = db.traversalDescription()
                .depthFirst()
                .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                .evaluator(Evaluators.excludeStartPosition())
                .uniqueness(Uniqueness.NONE)
                .traverse(v)
                .nodes()
                .iterator()) {
            long start = System.nanoTime();
            while (files.hasNext()) {
                Node f = files.next();
                // Close each subtree iterator as soon as it has been consumed
                try (ResourceIterator<Node> node = db.traversalDescription()
                        .depthFirst()
                        .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                        .evaluator(Evaluators.excludeStartPosition())
                        .evaluator(e)
                        .traverse(f)
                        .nodes()
                        .iterator()) {
                    while (node.hasNext()) {
                        node.next();
                    }
                }
            }
        }
        // Report heap usage after each version
        System.out.println("Used Memory:"
                + (runtime.totalMemory() - runtime.freeMemory()) / mb);
        System.out.println("Total Memory:" + runtime.totalMemory() / mb);
    }
}
db.shutdown();
Exception thrown:
Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at ch.qos.logback.core.pattern.FormattingConverter.write(FormattingConverter.java:40)
at ch.qos.logback.core.pattern.PatternLayoutBase.writeLoopOnConverters(PatternLayoutBase.java:119)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:168)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:59)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:134)
at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:188)
at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:206)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:212)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:272)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:259)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:441)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:395)
at ch.qos.logback.classic.Logger.warn(Logger.java:708)
at org.neo4j.kernel.logging.LogbackService$Slf4jToStringLoggerAdapter.warn(LogbackService.java:240)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.kernel.impl.core.RelationshipLoader.getMoreRelationships(RelationshipLoader.java:55)
at org.neo4j.kernel.impl.core.NodeManager.getMoreRelationships(NodeManager.java:779)
at org.neo4j.kernel.impl.core.NodeImpl.loadMoreRelationshipsFromNodeManager(NodeImpl.java:577)
at org.neo4j.kernel.impl.core.NodeImpl.getMoreRelationships(NodeImpl.java:466)
at org.neo4j.kernel.impl.core.NodeImpl.loadInitialRelationships(NodeImpl.java:394)
at org.neo4j.kernel.impl.core.NodeImpl.ensureRelationshipMapNotNull(NodeImpl.java:372)
at org.neo4j.kernel.impl.core.NodeImpl.getAllRelationshipsOfType(NodeImpl.java:219)
at org.neo4j.kernel.impl.core.NodeImpl.getRelationships(NodeImpl.java:325)
at org.neo4j.kernel.impl.core.NodeProxy.getRelationships(NodeProxy.java:154)
at org.neo4j.kernel.StandardExpander$RegularExpander.doExpand(StandardExpander.java:583)
at org.neo4j.kernel.StandardExpander$RelationshipExpansion.iterator(StandardExpander.java:195)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationshipsWithoutChecks(TraversalBranchImpl.java:115)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships(TraversalBranchImpl.java:104)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.initialize(TraversalBranchImpl.java:131)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.next(TraversalBranchImpl.java:151)
at org.neo4j.graphdb.traversal.PreorderDepthFirstSelector.next(PreorderDepthFirstSelector.java:49)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:68)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:35)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at org.neo4j.kernel.impl.traversal.DefaultTraverser$ResourcePathIterableWrapper$1.fetchNextOrNull(DefaultTraverser.java:140)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at main.QueryExecutor.main(QueryExecutor.java:173)
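It looks like you are eagerly consuming the entire iterator when you use IteratorUtil.asCollection() for the second traversal. I'm not sure how many nodes that yields in this case, but if there are a lot of them (i.e. millions), it could well cause out-of-memory problems.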
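The difference the answer points at can be shown without Neo4j: copying an iterator into a collection (as asCollection() does) needs memory proportional to the number of elements, while consuming it one element at a time needs only a counter. A plain-Java sketch; range() is a hypothetical lazy source standing in for a traversal's node iterator:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class LazyVsEager {

    /** Lazily produces 0..n-1 without storing them, like a traversal's node iterator. */
    static Iterator<Integer> range(int n) {
        return new Iterator<Integer>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public Integer next() {
                if (i >= n) {
                    throw new NoSuchElementException();
                }
                return i++;
            }
        };
    }

    /** Eager: materializes every element first, O(n) extra memory. */
    static long countEager(Iterator<Integer> it, Predicate<Integer> match) {
        List<Integer> all = new ArrayList<>();
        it.forEachRemaining(all::add);
        return all.stream().filter(match).count();
    }

    /** Streaming: inspects one element at a time, O(1) extra memory. */
    static long countStreaming(Iterator<Integer> it, Predicate<Integer> match) {
        long count = 0;
        while (it.hasNext()) {
            if (match.test(it.next())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Predicate<Integer> even = x -> x % 2 == 0;
        System.out.println(countStreaming(range(10_000_000), even)); // 5000000
        System.out.println(countEager(range(1_000), even));          // 500
    }
}
```

Both methods give the same count; only the eager one has to keep every element alive at once, which is what becomes a problem at millions of nodes.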
I solved my problem by setting the cache_type option to none. It no longer runs out of memory and finishes in about an hour.