Loading CSV in Neo4j is time consuming
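I want to load a CDR CSV file containing 648,000 records into Neo4j (4.4.10), but the import has been running for about 4 days and still has not finished.
My CSV has 648,000 records with 7 columns. The file size is about 48 MB. My machine has 100 GB of RAM and an Intel Xeon E5 CPU.
The columns of the CSV are: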
OP_Name | TP_Name | Called_Number | OP_ANI | Setup_Time | Duration | OP_Price |
---|---|---|---|---|---|---|
The code I use to load the CSV into Neo4j is:
```Cypher
:auto load csv with headers from 'file:///cdr.csv' as line FIELDTERMINATOR ','
with line
where line['Called_Number'] is not null and line['OP_ANI'] is not null
with line['OP_ANI'] as OP_Phone,
(CASE line['OP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS OP_country,
line['Called_Number'] as Called_Phone,
(CASE line['TP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS TP_country,
line['Setup_Time'] as Setup_Time,
line['Duration'] as Duration,
line['OP_Price'] as OP_Price
call {
with OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
MERGE (c:Customer{phone: toInteger(Called_Phone)})
on create set c.country = TP_country
WITH c, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
CALL apoc.create.addLabels( c, [ c.country ] ) YIELD node
MERGE (c2:Customer{phone: toInteger(OP_Phone)})
on create set c2.country = OP_country
WITH c2, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price, c
CALL apoc.create.addLabels( c2, [ c2.country ] ) YIELD node
MERGE (c2)-[r:CALLED{setupTime: Setup_Time,
duration: Duration,
OP_Price: OP_Price}]->(c)
} IN TRANSACTIONS
```
How can I speed up the load operation?
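`MERGE` in Neo4j acts as an upsert. So the statement:

`MERGE (c:Customer{phone: toInteger(Called_Phone)})`

checks whether a `Customer` node with the given phone number already exists. If it does, that node is matched and reused; otherwise a new node is created. The second `MERGE` on `OP_Phone` performs the same lookup. Without an index, each lookup has to scan the existing `Customer` nodes, so as the number of nodes grows the lookups get slower and the CSV import slows down overall. Creating an index on the `phone` property of `Customer` should do the trick. You can create the index like this:

`CREATE INDEX phone IF NOT EXISTS FOR (n:Customer) ON (n.phone)`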
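As a rough sketch of how this might be applied before re-running the import: create the index, wait for it to come online, and then check that the planner actually uses it for the `MERGE` lookup. The timeout value and the sample phone number below are illustrative assumptions, not values from the question, and each statement is meant to be run separately in Neo4j Browser or cypher-shell.

```cypher
// Create the index before starting the import (statement from the answer).
CREATE INDEX phone IF NOT EXISTS FOR (n:Customer) ON (n.phone);

// Block until indexes are online; 300 seconds is an assumed timeout.
CALL db.awaitIndexes(300);

// Confirm the index exists and check its state.
SHOW INDEXES YIELD name, state, labelsOrTypes, properties
WHERE name = 'phone';

// Inspect the plan for a single lookup without executing it.
// 989120000000 is a hypothetical phone number used only for illustration.
EXPLAIN MERGE (c:Customer {phone: 989120000000}) RETURN c;
```

If the index is being used, the plan from `EXPLAIN` should contain an index seek on `Customer(phone)` rather than a label scan. Because the import stores `phone` via `toInteger(...)`, the lookup value here is written as an integer as well.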