Neo4j / Py2Neo timeout issue when importing large CSV files

Question

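When importing data from large CSV files (> 200MB) into Neo4j, the response eventually hangs. The query does complete, and all records are imported, but there appears to be some kind of response timeout, so nothing indicates that the import query has finished. This is a problem because we cannot automate importing multiple files into Neo4j: the script keeps waiting for the query to complete even though it already has.

Importing one file takes about 10-15 minutes.

No errors are thrown anywhere in the pipeline; everything just hangs. I can only tell when the process has finished because the VM's CPU activity stops.

The process does work for smaller files: a confirmation is sent back when the previous file has finished importing, and the script moves on to the next file.

I have tried running the script from a Jupyter notebook as well as running it as a Python script directly on the console. I have even tried running the query directly on Neo4j through the browser console. Every approach results in a hanging query, so I am not sure whether the problem comes from Neo4j or Py2Neo.

Example query: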

USING PERIODIC COMMIT 1000
LOAD CSV FROM {csvfile}  AS line
MERGE (:Author { authorid: line[0], name: line[1] } )

Modified Python script using Py2Neo:

from azure.storage.blob import BlockBlobService
blob_service = BlockBlobService(account_name="<name>",account_key="<key>")
generator = blob_service.list_blobs("parsed-csv-files")

for blob in generator:
    print(blob.name)
    csv_file_base = "http://<base_uri>/parsed-csv-files/"
    csvfile = csv_file_base + blob.name
    params = { "csvfile":csvfile }
    mygraph.run(query, parameters=params )

The Neo4j debug.log does not appear to record any errors.

Sample debug.log:

2019-05-30 05:44:32.022+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job finished: descriptor=IndexRule[id=16, descriptor=Index( UNIQUE, :label[5](property[5]) ), provider={key=native-btree, version=1.0}, owner=42], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/16/index-16 Number of pages visited: 598507, Number of cleaned crashed pointers: 0, Time spent: 2m 25s 235ms
2019-05-30 05:44:32.071+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job closed: descriptor=IndexRule[id=16, descriptor=Index( UNIQUE, :label[5](property[5]) ), provider={key=native-btree, version=1.0}, owner=42], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/16/index-16
2019-05-30 05:44:32.071+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job started: descriptor=IndexRule[id=19, descriptor=Index( UNIQUE, :label[6](property[6]) ), provider={key=native-btree, version=1.0}, owner=46], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/19/index-19
2019-05-30 05:44:57.126+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job finished: descriptor=IndexRule[id=19, descriptor=Index( UNIQUE, :label[6](property[6]) ), provider={key=native-btree, version=1.0}, owner=46], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/19/index-19 Number of pages visited: 96042, Number of cleaned crashed pointers: 0, Time spent: 25s 55ms
2019-05-30 05:44:57.127+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job closed: descriptor=IndexRule[id=19, descriptor=Index( UNIQUE, :label[6](property[6]) ), provider={key=native-btree, version=1.0}, owner=46], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/19/index-19

Edit: the same problem still occurs with a simpler query.

Answer 1

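Since the query takes a long time to complete on the database side, py2neo may have a problem waiting for it.

Periodic commit should not cause any problems.

Have you tried the Python neo4j driver, reading the CSV from Python and executing the queries that way?

Here is sample code using the neo4j driver.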

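import pandas as pd
from neo4j import GraphDatabase

# serveruri, user, pwd and config are assumed to be defined elsewhere
driver = GraphDatabase.driver(serveruri, auth=(user, pwd))
with driver.session() as session:
    file = config['spins_file']
    # Read the CSV in chunks so that each chunk is sent as its own short query.
    # on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False
    row_chunks = pd.read_csv(file, sep=',', on_bad_lines='skip',
                       index_col=False,
                       low_memory=False,
                       chunksize=config['chunk_size'])
    for i, rows in enumerate(row_chunks):
        print("Chunk {}".format(i))
        rows_dict = {'rows': rows.fillna(value="").to_dict('records')}
        # Assumes the CSV has a header row with columns named authorid and name
        session.run("""
                    UNWIND $rows AS row
                    MERGE (:Author { authorid: row.authorid, name: row.name } )
                    """,
                    parameters=rows_dict)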
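
The same batching idea can be sketched without pandas. This is a minimal sketch, assuming the CSV has no header row (the question's query reads columns by position as line[0] and line[1]). The names batched_rows, import_batches and IMPORT_QUERY are hypothetical helpers for illustration, not part of the neo4j driver, and session is assumed to be any object whose run() returns a result with consume(), such as a neo4j driver session.

```python
import csv

# Hypothetical query text: one short statement per batch of rows.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (:Author { authorid: row.authorid, name: row.name })
"""


def batched_rows(lines, batch_size):
    """Yield lists of row dicts built from header-less CSV lines."""
    batch = []
    for rec in csv.reader(lines):
        if len(rec) < 2:
            continue  # skip blank or malformed lines
        # Map positional columns to the property names used in the query
        batch.append({"authorid": rec[0], "name": rec[1]})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def import_batches(session, lines, batch_size=1000):
    """Send one query per batch and return the number of batches sent."""
    sent = 0
    for batch in batched_rows(lines, batch_size):
        # consume() waits for the server to finish this batch before continuing
        session.run(IMPORT_QUERY, rows=batch).consume()
        sent += 1
    return sent
```

With a real driver, lines could be an open file object and session the object returned by driver.session(). Because each call covers only batch_size rows, the client is never left waiting on a single query that runs for many minutes.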

Answer 1 posted 2019-06-02 00:24:14
