[英]Neo4j import tool - OutOfMemory error: GC overhead limit exceeded
我正在使用neo4j-import工具(Windows)導入具有約2000萬個關系的約100萬個節點,所有這些節點都應該是唯一的。 該過程順利進行,直到到達“ Relationship Count”任務為止,該過程一直加載到20M(看似所有關系),但隨后掛起了一段時間(30分鍾至1小時),最終返回了“ java”。 lang.OutOfMemoryError:超出了GC開銷限制”。
我之前已經成功加載了大型圖形數據庫(39M個節點,21M個關系),所以我不確定問題出在哪里。 是因為與我加載的以前的數據庫相比,圖形數據庫的連接更加緊密嗎?
還是會發生內存泄漏? 在我的任務管理器中,Java平台SE Binary進程在導入負載時(尤其是在即將結束時)需要越來越大的內存量(在16GB RAM中高達12-13GB)。 這似乎非常大,特別是因為39M節點/ 21M關系圖數據庫能夠使用導入工具相對快速地成功導入(沒有掛在關系計數上)。
關於可能出了什么問題的任何想法? 提前致謝!
如果它有助於查看我的節點/關系文件,請使用以下鏈接: https : //drive.google.com/open?id=0Bw7N-SlJA3ZCei0ycEhoa2YwNUU
這是neo4j shell輸出:
C:Users\Username\Documents\Neo4j>neo4jImport -into graphDB1.graphdb --nodes D:\concept.csv --relationships D:\predicate.csv --stacktrace --idtype integer
WARNING! This batch script has been deprecated. Please use the provided PowerShell scripts instead: http://neo4j.com/docs/stable/powershell.html
The system cannot find the path specified.
Importing the contents of these files into graphDB1.graphdb:
Nodes:
D:\concept.csv
Relationships:
D:\predicate.csv
Available memory:
Free machine memory: 13.50 GB
Max heap memory : 12.75 GB
Nodes
[>:|PR|NOD|*LABEL SCAN---------------------------------|v:6.79 MB/s----------------------------] 1M
Done in 40s 562ms
Prepare node index
[*DETECT:20.37 MB------------------------------------------------------------------------------] 1M
Done in 802ms
Calculate dense nodes
[*>:59.38 MB/s----------------------------------|PREPARE(3)====================================] 20M
Done in 12s 566ms
Relationships
[>:2.01 |PREPARE-----------|P|RELATIONSHI|*v:4.05 MB/s-----------------------------------------] 20M
Done in 6m 3s 655ms
Node --> Relationship
[>:3.19 MB/s--------------------------|L|*v:2.39 MB/s------------------------------------------] 1M
Done in 8s 421ms
Relationship --> Relationship
[*>:6.82 MB/s--------------------------------------|LINK-----------|v:6.82 MB/s----------------] 20M
Done in 1m 36s 849ms
Node counts
[*COUNT:91.55 MB-------------------------------------------------------------------------------] 1M
Done in 3m 35s 21ms
Relationship counts
[*>:8.62 MB/s-----------------------------------------------------------|COUNT-----------------] 20MException in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Unknown Source)
at java.util.ArrayList.toArray(Unknown Source)
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.stats.StepStats.<init>(StepStats.java:39)
at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.stats(AbstractStep.java:220)
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution$1.compare(StageExecution.java:123)
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution$1.compare(StageExecution.java:118)
at java.util.TimSort.countRunAndMakeAscending(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.Arrays.sort(Unknown Source)
at java.util.Collections.sort(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stepsOrderedBy(StageExecution.java:117)
at org.neo4j.unsafe.impl.batchimport.staging.DynamicProcessorAssigner.assignProcessorsToPotentialBottleNeck(DynamicProcessorAssigner.java:94)
at org.neo4j.unsafe.impl.batchimport.staging.DynamicProcessorAssigner.check(DynamicProcessorAssigner.java:81)
at org.neo4j.unsafe.impl.batchimport.staging.MultiExecutionMonitor.check(MultiExecutionMonitor.java:106)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:65)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseExecution(ExecutionSupervisors.java:80)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:224)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:185)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:363)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:279)
更新1:
這是導入在關系計數時掛起的那一刻的線程轉儲:
2016-02-17 08:28:12
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
"MuninnPageCache[1]-FlushTask" daemon prio=6 tid=0x0000000026855800 nid=0xfe0 waiting on condition [0x00000000288fe000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000004c0189810> (a org.neo4j.io.pagecache.impl.muninn.MuninnPageCache)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.continuouslyFlushPages(MuninnPageCache.java:909)
at org.neo4j.io.pagecache.impl.muninn.FlushTask.run(FlushTask.java:36)
at org.neo4j.io.pagecache.impl.muninn.BackgroundTask.run(BackgroundTask.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
"MuninnPageCache[1]-EvictionTask" daemon prio=6 tid=0x0000000026904000 nid=0x3bd4 runnable [0x00000000287fe000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000004c0189810> (a org.neo4j.io.pagecache.impl.muninn.MuninnPageCache)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.parkEvictor(MuninnPageCache.java:697)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.parkUntilEvictionRequired(MuninnPageCache.java:751)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.continuouslySweepPages(MuninnPageCache.java:732)
at org.neo4j.io.pagecache.impl.muninn.EvictionTask.run(EvictionTask.java:39)
at org.neo4j.io.pagecache.impl.muninn.BackgroundTask.run(BackgroundTask.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
"Service Thread" daemon prio=6 tid=0x0000000024ee8000 nid=0x301c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x0000000024ee6000 nid=0x3060 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x0000000024ee2800 nid=0x2198 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Attach Listener" daemon prio=10 tid=0x0000000024ee2000 nid=0x1ae4 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x0000000024ee1000 nid=0x135c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=8 tid=0x0000000024ed9000 nid=0x3480 in Object.wait() [0x00000000278ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000004c000d4b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
- locked <0x00000004c000d4b0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
"Reference Handler" daemon prio=10 tid=0x0000000024ed8000 nid=0x1ae8 in Object.wait() [0x00000000277ff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000004c000d300> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked <0x00000004c000d300> (a java.lang.ref.Reference$Lock)
"main" prio=6 tid=0x00000000023c2800 nid=0x2e7c waiting on condition [0x00000000023bf000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.neo4j.io.fs.FileUtils.waitAndThenTriggerGC(FileUtils.java:253)
at org.neo4j.io.fs.FileUtils.deleteFile(FileUtils.java:110)
at org.neo4j.io.fs.DefaultFileSystemAbstraction.deleteFile(DefaultFileSystemAbstraction.java:127)
at org.neo4j.kernel.impl.storemigration.FileOperation$3.perform(FileOperation.java:93)
at org.neo4j.kernel.impl.storemigration.StoreFile.fileOperation(StoreFile.java:267)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:389)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:279)
"VM Thread" prio=10 tid=0x0000000024ed1800 nid=0x3058 runnable
"GC task thread#0 (ParallelGC)" prio=6 tid=0x00000000023d7000 nid=0x313c runnable
"GC task thread#1 (ParallelGC)" prio=6 tid=0x00000000023d9000 nid=0x3144 runnable
"GC task thread#2 (ParallelGC)" prio=6 tid=0x00000000023da800 nid=0x974 runnable
"GC task thread#3 (ParallelGC)" prio=6 tid=0x00000000023dc000 nid=0x3a3c runnable
"GC task thread#4 (ParallelGC)" prio=6 tid=0x00000000023de800 nid=0x3684 runnable
"GC task thread#5 (ParallelGC)" prio=6 tid=0x00000000023e1000 nid=0x35b8 runnable
"GC task thread#6 (ParallelGC)" prio=6 tid=0x00000000023e4000 nid=0x3950 runnable
"GC task thread#7 (ParallelGC)" prio=6 tid=0x00000000023e5800 nid=0x318c runnable
"GC task thread#8 (ParallelGC)" prio=6 tid=0x00000000023e8800 nid=0x30b8 runnable
"GC task thread#9 (ParallelGC)" prio=6 tid=0x00000000023e9800 nid=0x32dc runnable
"VM Periodic Task Thread" prio=10 tid=0x0000000024eed800 nid=0x3710 waiting on condition
JNI global references: 377
Heap
PSYoungGen total 2071552K, used 0K [0x0000000780000000, 0x0000000800000000, 0x0000000800000000)
eden space 2043904K, 0% used [0x0000000780000000,0x0000000780000000,0x00000007fcc00000)
from space 27648K, 0% used [0x00000007fe500000,0x00000007fe500000,0x0000000800000000)
to space 25600K, 0% used [0x00000007fcc00000,0x00000007fcc00000,0x00000007fe500000)
ParOldGen total 11534336K, used 10982258K [0x00000004c0000000, 0x0000000780000000, 0x0000000780000000)
object space 11534336K, 95% used [0x00000004c0000000,0x000000075e4dcb50,0x0000000780000000)
PSPermGen total 21504K, used 13521K [0x00000004bae00000, 0x00000004bc300000, 0x00000004c0000000)
object space 21504K, 62% used [0x00000004bae00000,0x00000004bbb34588,0x00000004bc300000)
2016-02-17 08:28:20
在這么小的數據集上,這很奇怪。 您希望此數據集中有多少個獨特的關系和標簽? 還可以在發生暫停時以某種方式提供線程轉儲嗎?
編輯:問題是包含屬性值的列用作LABEL
。 這會錯誤地產生大量標簽,並且計數不會隨之增加。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.