
Neo4j Java API: GC overhead limit exceeded after running multiple queries

I'm using Neo4j 2.3.0 with the Java API. I have 16 GB of RAM and run the code on a Mac OS X laptop with "-Xmx12g -Xms12g" as VM arguments.

I've encountered a "GC overhead limit exceeded" problem in the Neo4j Java API.

In order to experiment with lots of queries, I have a program that opens a transaction over a different query.db for each query and gets the answers from my own framework, which is wrapped in an object (it runs a query and prints its running time to a file).

So, for running the queries, I don't use Cypher.

For each query I open two transactions, one over the query.db and one over the data.db, initialize my framework, and run it. Memory usage gradually increases until the "GC overhead limit exceeded" error finally occurs.

try (Transaction txG = knowledgeGraph.beginTx()) {
    try (Transaction txQ = queryGraph.beginTx()) {
        MyObj myFramework = new MyObj();
        printTheResultsIntoTheFile(myFramework.run());
        myFramework = null;
        txQ.success();   // txQ.close() is handled by try-with-resources
    }
}

These are some of the things I've tried to get rid of this error:

  1. After using a monitoring program to dump the heap, I found that there is a problem with "org.neo4j.io.pagecache.impl.muninn.MuninnPageCache", so I tried to set the page cache size and limit it to a small value:

    dataGraph = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(MODELGRAPH_DB_PATH)
            .setConfig(GraphDatabaseSettings.pagecache_memory, "500M")
            .newGraphDatabase();

However, the "memory leak" problem still exists.

  2. After tx.success(), I called tx.close() to make sure the transaction doesn't hold on to memory.

  3. After using my framework (object) to find the answers to a query, I explicitly set it to null: topkFramework = null;

  4. I called System.gc(); and System.runFinalization();

  5. I changed all of my static variables, like MyCacheServer or MyNeighborIndexer, to non-static ones; for each query, I cleared them and explicitly set them to null:

    queryNodeIdSet.clear();
    queryNodeIdSet = null;
    queryNodeIdSet = new HashSet<Long>();

After lots of digging into Neo4j, I've found that the problem is related to creating a lot of query graphs one after another. Although I call db.shutdown() after finishing with each query, it seems the cache is never emptied.

smallGraph = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(graphPath)
            .setConfig(GraphDatabaseSettings.pagecache_memory, "240k").newGraphDatabase();

I added this config and set it to the minimum possible amount. Now the leak is slow enough that it no longer breaks my process: after running around 1,000 queries it is still going, whereas before it consumed all of my memory (12 GB) after about 200 queries.
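For reference, a minimal sketch of the per-query lifecycle described above (the queryDbPaths list and the runQuery helper are hypothetical placeholders, not my actual code):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
    import org.neo4j.graphdb.factory.GraphDatabaseSettings;

    // Open a tiny-pagecache database per query.db, then shut it down.
    // Even with shutdown(), the off-heap page cache seems not to be reclaimed,
    // which is why pagecache_memory is capped at the minimum here.
    for (String graphPath : queryDbPaths) {          // hypothetical list of query.db paths
        GraphDatabaseService smallGraph = new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder(graphPath)
                .setConfig(GraphDatabaseSettings.pagecache_memory, "240k")
                .newGraphDatabase();
        try {
            runQuery(smallGraph);                    // hypothetical per-query work
        } finally {
            smallGraph.shutdown();                   // always shut down before the next one
        }
    }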

This was my stacktrace:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.<init>(MuninnPageCache.java:246)
    at org.neo4j.kernel.impl.pagecache.ConfiguringPageCacheFactory.createPageCache(ConfiguringPageCacheFactory.java:96)
    at org.neo4j.kernel.impl.pagecache.ConfiguringPageCacheFactory.getOrCreatePageCache(ConfiguringPageCacheFactory.java:87)
    at org.neo4j.kernel.impl.factory.PlatformModule.createPageCache(PlatformModule.java:277)
    at org.neo4j.kernel.impl.factory.PlatformModule.<init>(PlatformModule.java:154)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.createPlatform(GraphDatabaseFacadeFactory.java:181)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:124)
    at org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:43)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:129)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:117)
    at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:185)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:79)
    at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:74)

This is a guess (no time to try it now), but I'll give it a go. Neo4j doesn't support nested transactions. Any top-level transaction (txG in your case) is bound to a ThreadLocal. Any "nested" transaction (txQ) becomes a PlaceboTransaction. Hence, calling success() or close() on it has no effect whatsoever.

Consequently, everything you access in the child transactions, whilst the top-level one is open, is held in memory (heap) until the top-level transaction is finished. I know these are two different databases, but still, it's ThreadLocal.

I think you should attempt to close the top-level one each time you close the child one as well. See if that helps.
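If that is the cause, a minimal sketch of the fix (reusing the MyObj and printTheResultsIntoTheFile names from your snippet; the query loop is a hypothetical placeholder) would open and close both transactions within every single query:

    // Both transactions are opened and closed inside one iteration, so the
    // ThreadLocal-bound top-level transaction never outlives a single query
    // and cannot accumulate state across queries.
    for (Query q : queries) {                    // hypothetical query loop
        try (Transaction txG = knowledgeGraph.beginTx();
             Transaction txQ = queryGraph.beginTx()) {
            MyObj myFramework = new MyObj();
            printTheResultsIntoTheFile(myFramework.run());
            txQ.success();
            txG.success();
        } // both transactions closed here, every iteration
    }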

Usually you only use one Neo4j instance per JVM instance.

The off-heap page-cache is unfortunately not released until the JVM shuts down.

And for the heap-related sections, you will have to make sure that shutdown() is called, and also null out references before calling System.gc().

You can just reuse your "smallGraph" and clean out the instance, e.g. with MATCH (n) DETACH DELETE n, and then repopulate it.
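A minimal sketch of that reuse pattern, assuming a single long-lived instance (the query loop and the populateQueryGraph helper are hypothetical placeholders; execute() runs Cypher from the Java API):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
    import org.neo4j.graphdb.factory.GraphDatabaseSettings;

    // Reuse ONE embedded instance instead of creating a new database per query,
    // since the off-heap page cache is only released when the JVM exits.
    GraphDatabaseService smallGraph = new GraphDatabaseFactory()
            .newEmbeddedDatabaseBuilder(graphPath)
            .setConfig(GraphDatabaseSettings.pagecache_memory, "240k")
            .newGraphDatabase();

    for (Query q : queries) {                            // hypothetical query loop
        try (Transaction tx = smallGraph.beginTx()) {
            smallGraph.execute("MATCH (n) DETACH DELETE n"); // wipe the previous query graph
            populateQueryGraph(smallGraph, q);               // hypothetical repopulation step
            tx.success();
        }
    }
    smallGraph.shutdown();                               // one shutdown, at the very end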
