neo4j-admin import very slow

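I am trying out Neo4j with the Yelp challenge dataset, and one of the aspects I am interested in is bulk import. Unfortunately, the import takes much longer than it should, and in the end I get an out-of-memory error. The node import goes mostly smoothly, the relationship import starts to slow down at around 65% to 70%, and the run then finishes with the error mentioned above. I have set the following in the conf file: dbms.memory.heap.initial_size=5g, dbms.memory.heap.max_size=10g, dbms.memory.pagecache.size=10g.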
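A quick way to confirm which of these settings are active (uncommented) in the conf file is sketched below. The function name `show_memory_settings` is hypothetical, and the default path `/etc/neo4j/neo4j.conf` is an assumption based on a package install; pass the real path if it differs.

```shell
#!/bin/sh
# Print the active (uncommented) memory settings from a neo4j.conf file.
# The default path is an assumption; pass the real path as the first argument.
show_memory_settings() {
    conf="${1:-/etc/neo4j/neo4j.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # The pattern is anchored at the start of the line, so lines that begin
    # with '#' (comments) are not matched.
    grep -E '^[[:space:]]*dbms\.memory\.(heap\.initial_size|heap\.max_size|pagecache\.size)=' "$conf"
}
```

Calling `show_memory_settings /path/to/neo4j.conf` prints only the three settings that are in effect in that file. It says nothing about whether a given tool reads them.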

sudo neo4j-admin import --mode=csv --nodes:Business "node_business_headers.csv,node_business.csv" \
--nodes:Categories "node_category_headers.csv,node_category.csv" \
--nodes:User "node_user_headers.csv,node_user.csv" \
--nodes:Review "node_review_headers.csv,node_review.csv" \
--relationships:IS_FRIEND_WITH "edge_friends_headers.csv,edge_friends.csv" \
--relationships:WROTE "edge_wrote_review_headers.csv,edge_wrote_review.csv" \
--relationships:ABOUT "edge_about_business_headers.csv,edge_about_business.csv" \
--relationships:BELONG_TO "edge_belongto_category_headers.csv,edge_belongto_category.csv" \
--ignore-missing-nodes --database=mygraph.db
Neo4j version: 3.4.5
Importing the contents of these files into /var/lib/neo4j/data/databases/mygraph.db:
Nodes:
:Business
/home/user/graph_data/yelp_challenge/data/node_business_headers.csv
/home/user/graph_data/yelp_challenge/data/node_business.csv

:Categories
/home/user/graph_data/yelp_challenge/data/node_category_headers.csv
/home/user/graph_data/yelp_challenge/data/node_category.csv

:User
/home/user/graph_data/yelp_challenge/data/node_user_headers.csv
/home/user/graph_data/yelp_challenge/data/node_user.csv

:Review
/home/user/graph_data/yelp_challenge/data/node_review_headers.csv
/home/user/graph_data/yelp_challenge/data/node_review.csv
Relationships:
:IS_FRIEND_WITH
/home/user/graph_data/yelp_challenge/data/edge_friends_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_friends.csv

:WROTE
/home/user/graph_data/yelp_challenge/data/edge_wrote_review_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_wrote_review.csv

:ABOUT
/home/user/graph_data/yelp_challenge/data/edge_about_business_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_about_business.csv

:BELONG_TO
/home/user/graph_data/yelp_challenge/data/edge_belongto_category_headers.csv
/home/user/graph_data/yelp_challenge/data/edge_belongto_category.csv

Available resources:
Total machine memory: 31.26 GB
Free machine memory: 24.63 GB
Max heap memory : 6.95 GB
Processors: 16
Configured max memory: 21.88 GB
High-IO: false

Import starting 2018-08-16 23:09:15.820+0100
Estimated number of nodes: 6.76 M
Estimated number of node properties: 36.60 M
Estimated number of relationships: 60.82 M
Estimated number of relationship properties: 0.00 
Estimated disk space usage: 2.75 GB
Estimated required memory usage: 1.08 GB

InteractiveReporterInteractions command list (end with ENTER):
c: Print more detailed information about current stage
i: Print more detailed information

(1/4) Node import 2018-08-16 23:09:15.833+0100
Estimated number of nodes: 6.76 M
Estimated disk space usage: 848.51 MB
Estimated required memory usage: 1.08 GB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%

(2/4) Relationship import 2018-08-16 23:09:22.174+0100
Estimated number of relationships: 60.82 M
Estimated disk space usage: 1.93 GB
Estimated required memory usage: 1.07 GB
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%


IMPORT DONE in 25m 43s 310ms. 
Data statistics is not available.
Peak memory usage: 1.07 GB
There were bad entries which were skipped and logged into /home/user/graph_data/yelp_challenge/data/import.report
WARNING Import failed. The store files in /var/lib/neo4j/data/databases/mygraph.db are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.neo4j.csv.reader.Extractors$StringExtractor.extract0(Extractors.java:427)
at org.neo4j.csv.reader.Extractors$AbstractSingleValueExtractor.extract(Extractors.java:360)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:305)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:311)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:112)
at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

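Try the following:

  1. Check whether the import.report file is being created and whether it is large.
  2. Try setting the HEAP_SIZE variable to 10g before invoking the import.
  3. I saw in the documentation that it is best to keep the initial heap and the maximum heap in neo4j.conf at the same value, to avoid unnecessary garbage collection.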
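The three steps can be sketched in shell as follows. This is a minimal sketch, not a verified fix: the helper names `report_summary` and `heap_sizes_match` are hypothetical, and the file paths are whatever your setup uses. Note that the log in the question reports "Max heap memory : 6.95 GB" even though the conf file sets a 10g maximum, which suggests the import tool did not take its heap size from neo4j.conf; that is the reason for trying the HEAP_SIZE environment variable.

```shell
#!/bin/sh
# Step 1: report the size of import.report (pass the path printed by the import log).
report_summary() {
    report="$1"
    if [ -f "$report" ]; then
        # tr strips the padding that some wc implementations add.
        bytes=$(wc -c < "$report" | tr -d ' ')
        lines=$(wc -l < "$report" | tr -d ' ')
        echo "$bytes bytes, $lines lines"
    else
        echo "not found"
    fi
}

# Step 3: succeed only if initial and max heap are both set and equal.
# Simplification: if a key is repeated, only its last occurrence is compared.
heap_sizes_match() {
    conf="$1"
    initial=$(sed -n 's/^dbms\.memory\.heap\.initial_size=//p' "$conf" | tail -n 1)
    max=$(sed -n 's/^dbms\.memory\.heap\.max_size=//p' "$conf" | tail -n 1)
    [ -n "$initial" ] && [ "$initial" = "$max" ]
}

# Step 2 is shown as a comment because it would start a real import.
# sudo resets the environment by default, so the variable is set via env:
#   sudo env HEAP_SIZE=10g neo4j-admin import --mode=csv <same arguments as in the question>
```

For example, `report_summary /home/user/graph_data/yelp_challenge/data/import.report` shows how much was logged as skipped. A very large report together with `--ignore-missing-nodes` would point at mismatched IDs between the node and relationship files rather than at memory alone.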
