
Neo4J Traversal Running out of memory

I have a neo4j database containing roughly 120 million nodes. I am using the traversal framework to walk my graph and count the occurrences of certain nodes, and this works like a charm. Unfortunately, when I run the code over the whole dataset I run out of memory.

I have allocated 4 GB to the Java VM, and I think I am committing my transactions (using tx.success() inside try-with-resources statements), yet I still fill up my heap very quickly.

Below you can find my code. First I fetch roughly 40 versions (these are the root nodes). Then, for each of them, I look up all adjacent child nodes, and for each of those children (files) I check whether a certain node exists anywhere in its subtree.

My understanding was that

    try ( Transaction tx = db.beginTx() ) {
        ...
    }

automatically closes my transaction, but my heap still fills up. This makes my query slow down noticeably from the second or third version onward, and eventually it crashes. Am I misunderstanding something, or is there anything else I can do?

    Collection<Node> versions;
    Collection<Node> files;
    Collection<Node> nodes;
    try ( Transaction ignored = db.beginTx() )
    {
        versions = IteratorUtil.asCollection( db.traversalDescription()
                .breadthFirst()
                .relationships( ProjectRelations.HASVERSION, Direction.OUTGOING )
                .evaluator( Evaluators.toDepth( 1 ) )
                .evaluator( Evaluators.excludeStartPosition() )
                .traverse( db.getNodeById( 0 ) )
                .nodes() );
        ignored.success();
    }

    for ( Node v : versions )
    {
        int fors = 0;
        test = 0;

        try ( Transaction tx = db.beginTx() )
        {
            files = IteratorUtil.asCollection( db.traversalDescription()
                    .breadthFirst()
                    .relationships( ProjectRelations.FILES, Direction.OUTGOING )
                    .evaluator( Evaluators.excludeStartPosition() )
                    .traverse( v )
                    .nodes() );
            tx.success();
        }

        for ( Node f : files )
        {
            try ( Transaction t = db.beginTx() )
            {
                int i = 0;
                for ( Node node : db.traversalDescription()
                        .depthFirst()
                        .relationships( RelTypes.ISPARENTOF, Direction.OUTGOING )
                        .evaluator( Evaluators.excludeStartPosition() )
                        .evaluator( e )
                        .traverse( f )
                        .nodes() )
                {
                    // do some stuff
                }
                t.success();
            }
        }

        files.clear();
    }
    versions.clear();

Update:

I replaced everything with iterators, for example:

    try ( Transaction tx = db.beginTx();
          ResourceIterator<Node> files = db.traversalDescription()
                  .breadthFirst()
                  .relationships( ProjectRelations.FILES, Direction.OUTGOING )
                  .evaluator( Evaluators.excludeStartPosition() )
                  .traverse( v )
                  .nodes()
                  .iterator() )
    {
        int idx = 0;
        forloops = 0;
        long start = System.nanoTime();

        while ( files.hasNext() )
        {
            Node f = files.next();

            try ( Transaction t = db.beginTx();
                  ResourceIterator<Node> blah = db.traversalDescription()
                          .depthFirst()
                          .relationships( RelTypes.ISPARENTOF, Direction.OUTGOING )
                          .evaluator( Evaluators.excludeStartPosition() )
                          .evaluator( e )
                          .traverse( f )
                          .nodes()
                          .iterator() )
            {
                int i = 0;

                while ( blah.hasNext() )
                {
                    Node tempNode = blah.next();
                }
                blah.close();
            }
        }
        files.close();
    }

The problem was that the transaction kept everything in memory until I had either exhausted the iterator or called close() on it.
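Since the inner subtree check is only an existence test, a minimal sketch like the following (reusing db, f, RelTypes.ISPARENTOF and the evaluator e from the code above) could stop at the first match and close the iterator right away, releasing the traversal's resources instead of holding them until the transaction ends:

    boolean found;
    try ( Transaction t = db.beginTx();
          ResourceIterator<Node> matches = db.traversalDescription()
                  .depthFirst()
                  .relationships( RelTypes.ISPARENTOF, Direction.OUTGOING )
                  .evaluator( Evaluators.excludeStartPosition() )
                  .evaluator( e )
                  .traverse( f )
                  .nodes()
                  .iterator() )
    {
        found = matches.hasNext(); // the first hit is enough for an existence check
        t.success();
    } // try-with-resources closes both the iterator and the transaction here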

Edit 2:

I switched everything to iterators and depth-first traversals. I also reduced the available heap from 4 GB to 1024 MB. So far it seems to be running (although I am not sure it will finish completely), albeit very slowly. It peaks at around 980 MB of heap and has not crossed that threshold yet. Since the heap now holds up fine the whole time, the slowdown is considerable. Any ideas how to improve this, or is this as good as it gets?

    try ( Transaction tx = db.beginTx() )
    {
        versions = IteratorUtil.asCollection( db.traversalDescription()
                .depthFirst()
                .relationships( ProjectRelations.HASVERSION, Direction.OUTGOING )
                .evaluator( Evaluators.toDepth( 1 ) )
                .evaluator( Evaluators.excludeStartPosition() )
                .traverse( root )
                .nodes() );
    }

    int mb = 1024 * 1024;
    Runtime runtime = Runtime.getRuntime();

    ResourceIterator<Node> files = null;

    try ( Transaction tx = db.beginTx() )
    {
        int idx = 0;
        for ( Relationship rel : root.getRelationships( ProjectRelations.HASVERSION, Direction.OUTGOING ) )
        {
            idx++;
            System.out.println( idx );
            Node v = rel.getEndNode();
            files = db.traversalDescription()
                    .depthFirst()
                    .relationships( ProjectRelations.FILES, Direction.OUTGOING )
                    .evaluator( Evaluators.excludeStartPosition() )
                    .uniqueness( Uniqueness.NONE )
                    .traverse( v )
                    .nodes()
                    .iterator();
            long start = System.nanoTime();
            while ( files.hasNext() )
            {
                Node f = files.next();

                ResourceIterator<Node> node = db.traversalDescription()
                        .depthFirst()
                        .relationships( RelTypes.ISPARENTOF, Direction.OUTGOING )
                        .evaluator( Evaluators.excludeStartPosition() )
                        .evaluator( e )
                        .traverse( f )
                        .nodes()
                        .iterator();
                while ( node.hasNext() )
                {
                    node.next();
                }
            }
            System.out.println( "Used Memory:"
                    + ( runtime.totalMemory() - runtime.freeMemory() ) / mb );
            System.out.println( "Total Memory:" + runtime.totalMemory() / mb );

            files.close();
        }
    }

    db.shutdown();

Exception thrown:

Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at ch.qos.logback.core.pattern.FormattingConverter.write(FormattingConverter.java:40)
at ch.qos.logback.core.pattern.PatternLayoutBase.writeLoopOnConverters(PatternLayoutBase.java:119)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:168)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:59)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:134)
at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:188)
at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:206)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:212)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:272)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:259)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:441)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:395)
at ch.qos.logback.classic.Logger.warn(Logger.java:708)
at org.neo4j.kernel.logging.LogbackService$Slf4jToStringLoggerAdapter.warn(LogbackService.java:240)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)

java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.kernel.impl.core.RelationshipLoader.getMoreRelationships(RelationshipLoader.java:55)
at org.neo4j.kernel.impl.core.NodeManager.getMoreRelationships(NodeManager.java:779)
at org.neo4j.kernel.impl.core.NodeImpl.loadMoreRelationshipsFromNodeManager(NodeImpl.java:577)
at org.neo4j.kernel.impl.core.NodeImpl.getMoreRelationships(NodeImpl.java:466)
at org.neo4j.kernel.impl.core.NodeImpl.loadInitialRelationships(NodeImpl.java:394)
at org.neo4j.kernel.impl.core.NodeImpl.ensureRelationshipMapNotNull(NodeImpl.java:372)
at org.neo4j.kernel.impl.core.NodeImpl.getAllRelationshipsOfType(NodeImpl.java:219)
at org.neo4j.kernel.impl.core.NodeImpl.getRelationships(NodeImpl.java:325)
at org.neo4j.kernel.impl.core.NodeProxy.getRelationships(NodeProxy.java:154)
at org.neo4j.kernel.StandardExpander$RegularExpander.doExpand(StandardExpander.java:583)
at org.neo4j.kernel.StandardExpander$RelationshipExpansion.iterator(StandardExpander.java:195)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationshipsWithoutChecks(TraversalBranchImpl.java:115)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships(TraversalBranchImpl.java:104)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.initialize(TraversalBranchImpl.java:131)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.next(TraversalBranchImpl.java:151)
at org.neo4j.graphdb.traversal.PreorderDepthFirstSelector.next(PreorderDepthFirstSelector.java:49)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:68)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:35)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at org.neo4j.kernel.impl.traversal.DefaultTraverser$ResourcePathIterableWrapper$1.fetchNextOrNull(DefaultTraverser.java:140)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at main.QueryExecutor.main(QueryExecutor.java:173)

You seem to be eagerly consuming the whole iterator when performing the second traversal with IteratorUtil.asCollection(). I am not sure how many nodes that yields in your case, but if there are a lot of them (i.e. millions), it may well cause out-of-memory problems.
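As a rough sketch (reusing the FILES traversal from your own code), streaming the nodes lazily instead of collecting them keeps only the current node referenced, so already-visited nodes can be garbage collected:

    try ( Transaction tx = db.beginTx() )
    {
        for ( Node f : db.traversalDescription()
                .breadthFirst()
                .relationships( ProjectRelations.FILES, Direction.OUTGOING )
                .evaluator( Evaluators.excludeStartPosition() )
                .traverse( v )
                .nodes() )
        {
            // process f here and drop the reference before moving on
        }
        tx.success();
    }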

I solved my problem by setting the cache_type option to none. It no longer runs out of memory and finishes in about an hour.
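For reference, a minimal sketch of how that setting can be applied in an embedded Neo4j 2.x setup, where GraphDatabaseSettings.cache_type exists (the store path below is a placeholder; in a server installation the equivalent is a cache_type=none line in conf/neo4j.properties):

    GraphDatabaseService db = new GraphDatabaseFactory()
            .newEmbeddedDatabaseBuilder( "data/graph.db" )          // placeholder store path
            .setConfig( GraphDatabaseSettings.cache_type, "none" )  // disable the object cache
            .newGraphDatabase();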
