
Neo4j embedded out of memory at node/relationship creation

I've been investigating Neo4j for a bioinformatics question. I created around 20000 nodes. Each of these nodes should be related to about 100 other nodes.

I wanted to use the Java core API with an embedded Neo4j database, as described in the [Java tutorial](http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded-hello-world.html).

I first have to query the database to get the existing nodes before adding derived nodes and relationships.

I quickly ran into excessive memory consumption. I enclose below a Java method that makes Neo4j crash. Could you give me a tip on how to solve this memory issue? What would be the best practice for this kind of situation?

I attach memory usage graphs (snapshots from VisualVM) to illustrate the memory usage: [three VisualVM memory screenshots].

configuration:

  Platform : Windows-7 win32,  java-1.7.0_51 (Program arguments -Xms512m -Xmx1024m)
  neo4j.properties
    use_memory_mapped_buffers=true
    neostore.nodestore.db.mapped_memory=100M
    neostore.relationshipstore.db.mapped_memory=150M
    neostore.propertystore.db.mapped_memory=150M
    neostore.propertystore.db.strings.mapped_memory=150M
    neostore.propertystore.db.arrays.mapped_memory=150M

  neo4j-wrapper.conf
    wrapper.java.additional=-XX:+UseConcMarkSweepGC
    wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
    wrapper.java.initmemory=512
    wrapper.java.maxmemory=1024

Thanks in advance, best regards

Code (the inner-loop limit varies; on average it should be around 100):

import java.util.Iterator;

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.helpers.collection.IteratorUtil;

static void stackoverflowNativeAPIMemoryIssue() {
    String DB_PATH = "C:/neo4j/Neo4j-2.1.2/data/graph.db";
    GraphDatabaseService db = new GraphDatabaseFactory()
            .newEmbeddedDatabase(DB_PATH);

    // *** query the existing nodes
    String query = "match (n:ExistingNode) return n;";
    ExecutionEngine engine = new ExecutionEngine(db);
    ExecutionResult result;
    Label labelFrom = DynamicLabel.label("From");
    result = engine.execute(query);
    Iterator<Node> n_column = result.columnAs("n");
    Node nodeFrom = null;
    Relationship relationship = null;
    int count = 0;
    int i = 0;
    for (Node nodeTo : IteratorUtil.asIterable(n_column)) {
        // inner loop which makes the code break!
        //for (i = 0; i < 5; i++) {
            // one transaction per created node/relationship
            try (Transaction tx = db.beginTx()) {
                ++count;
                nodeFrom = db.createNode(labelFrom);
                nodeFrom.setProperty("name", "name-" + count + "-" + i);

                // Relation is a RelationshipType enum defined elsewhere in my code
                relationship = nodeFrom.createRelationshipTo(nodeTo,
                        Relation.MY_RELATION);
                relationship.setProperty("name", "relation-" + count
                        + "- " + i);
                tx.success();
            }
        //}
    }
    db.shutdown();
}

No inner loop: the program runs to the end.

Loop of 5: memory expands, but the process terminates OK.

Loop of 10: out of memory; no node and no relationship are created, although a transaction should be committed for each node and relationship creation.

Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: 
Java heap space
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)
java.lang.OutOfMemoryError: Java heap space
at org.neo4j.kernel.impl.util.VersionedHashMap.put(VersionedHashMap.java:185)
at java.util.Collections$SetFromMap.add(Unknown Source)
at org.neo4j.kernel.impl.util.DiffSets.add(DiffSets.java:100)
at org.neo4j.kernel.impl.api.state.TxStateImpl.nodeDoCreate(TxStateImpl.java:363)
at org.neo4j.kernel.impl.api.StateHandlingStatementOperations.nodeCreate(StateHandlingStatementOperations.java:101)
at org.neo4j.kernel.impl.api.ConstraintEnforcingEntityOperations.nodeCreate(ConstraintEnforcingEntityOperations.java:390)
at org.neo4j.kernel.impl.api.LockingStatementOperations.nodeCreate(LockingStatementOperations.java:208)
at org.neo4j.kernel.impl.api.OperationsFacade.nodeCreate(OperationsFacade.java:500)
at org.neo4j.kernel.InternalAbstractGraphDatabase.createNode(InternalAbstractGraphDatabase.java:1125)

I've encountered a similar issue with a program that was running a very long transaction.

My program was basically parsing a big CSV file, line by line, and injecting nodes and relationships for each line it parsed. This big while loop was enclosed in a single Transaction block.

I had a memory leak just like the one you described.

What I discovered with VisualVM, though, is that when that while loop was over, the heap size dropped massively. So I wondered: "Which objects live for the whole duration of that while loop?" The answer was the Transaction object.

So I patched my program to create a Transaction for each iteration of the file-parsing loop. While this decreases parsing performance, it solved the memory leak: the heap size is now stable after a few iterations have run.
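A minimal sketch of that pattern (the CSV path and the createEntitiesForLine helper are illustrative only, not my actual program), assuming db is an already-opened embedded GraphDatabaseService:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;

static void importCsv(GraphDatabaseService db, String csvPath) throws IOException {
    try (BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
        String line;
        while ((line = reader.readLine()) != null) {
            // One short-lived transaction per parsed line: closing the
            // try-with-resources block releases the transaction's state,
            // so the heap stays flat instead of growing for the whole file.
            try (Transaction tx = db.beginTx()) {
                createEntitiesForLine(db, line); // hypothetical helper that creates this line's nodes/relationships
                tx.success();
            }
        }
    }
}

If one transaction per line turns out to be too slow, committing every few hundred lines is a common compromise between speed and memory.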

I hope that helps.

If a Neo4j expert could shed some light on why the Transaction leaks, that would be very much appreciated.

Be aware that on the Windows platform the mapped_memory is part of the JVM heap. You have a total of 700M assigned to mapped memory and a max heap size of 1G, which leaves too little memory for everything else.

Either increase max heap or shrink mapped memory.
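For example (the numbers below are purely illustrative), either trim the mapped buffers so they leave room inside the 1G heap, or raise the heap in the program arguments:

  neo4j.properties (smaller mapped buffers, ~400M total)
    neostore.nodestore.db.mapped_memory=50M
    neostore.relationshipstore.db.mapped_memory=100M
    neostore.propertystore.db.mapped_memory=100M
    neostore.propertystore.db.strings.mapped_memory=100M
    neostore.propertystore.db.arrays.mapped_memory=50M

  or: larger heap (program arguments)
    -Xms512m -Xmx2048m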

There is a wider problem with your code above.

result = engine.execute(query);

should itself be in a transaction, at least conceptually. It returns a lazily evaluated iterator, and every call to this iterator requires a read lock on the node it returns. Thus, you are effectively attempting to pass nodes that are the result of one open transaction to a second transaction which edits them.

Suppose that in the middle of your code, after you generate your iterator, I made a third transaction which deleted all your nodes; what would happen then?

Essentially, you shouldn't ever be trying to pass node objects from one transaction to another. Neo4j is a server, and it cannot assume that no other user will edit those nodes after your transaction is closed but before your second transaction opens.

I suspect that on the server side a whole panoply of deadlock-resolution routines is kicking into action to deal with the fact that you have multiple open transactions on the same node objects, at least one of which is a write transaction. This is likely responsible for your perceived leak.

Try putting your execution engine code inside the transaction as well, and then iterating over the results within that single transaction. Creating a few thousand entities in a single transaction is totally fine and has minimal heap-space overhead.
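A minimal sketch of that restructuring, reusing the names from the question's code (db, Relation.MY_RELATION, and the same Cypher query), with the query and all creations inside one transaction:

// Sketch only: the Cypher query and the node/relationship creation now
// share a single transaction, so the result iterator is consumed in the
// same transaction that writes.
try (Transaction tx = db.beginTx()) {
    ExecutionEngine engine = new ExecutionEngine(db);
    Iterator<Node> nColumn = engine.execute("match (n:ExistingNode) return n;")
            .columnAs("n");
    Label labelFrom = DynamicLabel.label("From");
    int count = 0;
    for (Node nodeTo : IteratorUtil.asIterable(nColumn)) {
        count++;
        Node nodeFrom = db.createNode(labelFrom);
        nodeFrom.setProperty("name", "name-" + count);
        Relationship relationship = nodeFrom.createRelationshipTo(nodeTo, Relation.MY_RELATION);
        relationship.setProperty("name", "relation-" + count);
    }
    tx.success(); // one commit for the whole batch
}
db.shutdown();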
