
Neo4j embedded out of memory at node/relationship creation

I've been investigating Neo4j for a bioinformatics problem. I have created around 20,000 nodes, and each of these nodes should be related to about 100 other nodes.

I wanted to use the Java core API with an embedded Neo4j database, as described in the [Java tutorial](http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded-hello-world.html).

I first have to query the database for existing nodes before adding derived nodes and relationships.

I quickly ran into excessive memory consumption. I enclose below a Java method that makes Neo4j crash. Could you give me a tip on how to solve this memory issue? What would be the best practice for this kind of situation?

I attach memory usage graphs (snapshots from VisualVM) to illustrate the problem. [Three VisualVM heap screenshots attached.]

Configuration:

  Platform: Windows 7 (win32), java-1.7.0_51 (program arguments: -Xms512m -Xmx1024m)
  neo4j.properties
    use_memory_mapped_buffers=true
    neostore.nodestore.db.mapped_memory=100M
    neostore.relationshipstore.db.mapped_memory=150M
    neostore.propertystore.db.mapped_memory=150M
    neostore.propertystore.db.strings.mapped_memory=150M
    neostore.propertystore.db.arrays.mapped_memory=150M

  neo4j-wrapper.conf
    wrapper.java.additional=-XX:+UseConcMarkSweepGC
    wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
    wrapper.java.initmemory=512
    wrapper.java.maxmemory=1024

Thanks in advance, Best regards

Code: the inner loop's limit varies; in the real case the average would be around 100.

import java.util.Iterator;

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.helpers.collection.IteratorUtil;

static void stackoverflowNativeAPIMemoryIssue() {
    String DB_PATH = "C:/neo4j/Neo4j-2.1.2/data/graph.db";
    GraphDatabaseService db = new GraphDatabaseFactory()
        .newEmbeddedDatabase(DB_PATH);

    // *** query all existing nodes
    String query = "match (n:ExistingNode) return n;";
    ExecutionEngine engine = new ExecutionEngine(db);
    ExecutionResult result = engine.execute(query);
    Iterator<Node> n_column = result.columnAs("n");

    Label labelFrom = DynamicLabel.label("From");
    Node nodeFrom = null;
    Relationship relationship = null;
    int count = 0;
    int i = 0;
    for (Node nodeTo : IteratorUtil.asIterable(n_column)) {
      // inner loop which makes the code break (limit 5, 10, ... ~100 in the real case)
      //for (i = 0; i < 5; i++) {
        try (Transaction tx = db.beginTx()) {
          ++count;
          // create a derived node and relate it to the existing node
          nodeFrom = db.createNode(labelFrom);
          nodeFrom.setProperty("name", "name-" + count + "-" + i);

          relationship = nodeFrom.createRelationshipTo(nodeTo,
              Relation.MY_RELATION);
          relationship.setProperty("name", "relation-" + count
              + "-" + i);
          tx.success();
        }
      //}
    }
    db.shutdown();
}
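
The Relation type used above is not shown in the question; presumably it is a user-defined RelationshipType enum along these lines:

    import org.neo4j.graphdb.RelationshipType;

    // Assumed definition of the Relation enum referenced in the method above.
    enum Relation implements RelationshipType {
        MY_RELATION
    }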

No inner loop: the program runs to completion.

Loop 5 times: memory expands, but the process terminates OK.

Loop 10 times: out of memory; no nodes or relationships are created, although a transaction should be committed for each node and relationship creation.

Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: 
Java heap space
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)
java.lang.OutOfMemoryError: Java heap space
at org.neo4j.kernel.impl.util.VersionedHashMap.put(VersionedHashMap.java:185)
at java.util.Collections$SetFromMap.add(Unknown Source)
at org.neo4j.kernel.impl.util.DiffSets.add(DiffSets.java:100)
at org.neo4j.kernel.impl.api.state.TxStateImpl.nodeDoCreate(TxStateImpl.java:363)
at org.neo4j.kernel.impl.api.StateHandlingStatementOperations.nodeCreate(StateHandlingStatementOperations.java:101)
at org.neo4j.kernel.impl.api.ConstraintEnforcingEntityOperations.nodeCreate(ConstraintEnforcingEntityOperations.java:390)
at org.neo4j.kernel.impl.api.LockingStatementOperations.nodeCreate(LockingStatementOperations.java:208)
at org.neo4j.kernel.impl.api.OperationsFacade.nodeCreate(OperationsFacade.java:500)
at org.neo4j.kernel.InternalAbstractGraphDatabase.createNode(InternalAbstractGraphDatabase.java:1125)

I've encountered a similar issue with a program that was running a very long transaction.

My program basically parsed a big CSV file line by line and injected nodes and relationships for each line it parsed. The big while loop doing this was enclosed in a single Transaction block.

I had a memory leak just like the one you described.

What I discovered with VisualVM, though, is that when that while loop was over, the heap size dropped massively. So I wondered: "What objects live for the whole duration of that while loop?" The answer was the Transaction object.

So I patched my program to create a Transaction for each iteration of the file-parsing loop. While this decreases parsing performance, it solved the memory leak: the heap size is now stable after a few iterations have run.
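
A minimal sketch of that pattern (the CSV reading and the createNodesFor helper here are hypothetical; only the transaction placement matters):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;

    static void importCsv(GraphDatabaseService db, String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // One short-lived transaction per line instead of one huge one,
                // so per-transaction state does not accumulate on the heap.
                try (Transaction tx = db.beginTx()) {
                    createNodesFor(db, line);
                    tx.success();
                }
            }
        }
    }

    // Hypothetical helper: parse one CSV line and create its nodes/relationships.
    static void createNodesFor(GraphDatabaseService db, String line) {
        // ... application-specific parsing and db.createNode(...) calls ...
    }

A common compromise is to commit every few thousand lines rather than on every line, which keeps the per-transaction overhead low while still bounding the heap.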

I hope that helps.

If a Neo4j expert could shed some light on why the Transaction leaks, that would be very much appreciated.

Be aware that on the Windows platform the mapped_memory is part of the JVM heap. You have a total of 700M assigned to mapped memory and a max heap size of 1G, leaving too little memory for everything else.

Either increase the max heap or shrink the mapped memory.
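
To put numbers on it (the adjusted values below are examples only): 100M + 150M + 150M + 150M + 150M = 700M of mapped buffers inside a 1024M heap leaves roughly 300M for transaction state and everything else. So either raise the heap:

    Program arguments: -Xms512m -Xmx2048m

or shrink the mapped buffers, e.g. roughly in half:

    neostore.relationshipstore.db.mapped_memory=75M
    neostore.propertystore.db.mapped_memory=75M
    neostore.propertystore.db.strings.mapped_memory=75M
    neostore.propertystore.db.arrays.mapped_memory=75M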

There is a wider problem with your code above.

result = engine.execute(query);

should itself be in a transaction, at least conceptually. It returns a lazily evaluated iterator, and every call to this iterator requires a read lock on the node it returns. Thus, you are effectively attempting to pass nodes that are the result of one open transaction to a second transaction which edits them.

Suppose that in the middle of your code, after you generate your iterator, I made a third transaction which deleted all your nodes; what would happen then?

Essentially, you shouldn't ever try to pass node objects from one transaction to another. Neo4j is a server, and it cannot assume that no other user will edit those nodes after your first transaction is closed but before your second transaction opens.

I suspect that on the server side a whole panoply of deadlock-resolution routines is kicking into action to deal with the fact that you have multiple open transactions on the same node objects, at least one of which is a write transaction. This is likely responsible for your perceived leak.

Try putting your execution-engine code inside the transaction, and then iterating over the results within that single transaction. Creating a few thousand entities in a single transaction is totally fine and has minimal heap-space overhead.
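
For illustration, a sketch of that restructuring, reusing the types from the code in the question (ExecutionEngine, Relation, and so on):

    try (Transaction tx = db.beginTx()) {
        // Both the read (query + iteration) and the writes happen in one transaction.
        ExecutionResult result = engine.execute("match (n:ExistingNode) return n;");
        Iterator<Node> n_column = result.columnAs("n");
        int count = 0;
        for (Node nodeTo : IteratorUtil.asIterable(n_column)) {
            Node nodeFrom = db.createNode(DynamicLabel.label("From"));
            nodeFrom.setProperty("name", "name-" + (++count));
            Relationship rel = nodeFrom.createRelationshipTo(nodeTo, Relation.MY_RELATION);
            rel.setProperty("name", "relation-" + count);
        }
        tx.success(); // one commit for the whole batch
    }
    db.shutdown();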
