
Import huge CSV file into Neo4j

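I know about the import tool, but in my case I have to read each row and break it up into nodes and relationships. Importing 2 million rows with a LOAD CSV query, using indexes and periodic commit, took more than 12 hours. Is there a way to use the import tool without having to preprocess the CSV into nodes and relationships?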

Here is the sample query I used:

CREATE INDEX ON :Patient(mrno);
CREATE INDEX ON :Location(city);
CREATE INDEX ON :Department(id);

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH line,
(CASE  WHEN line.MRNo='' OR line.MRNo='null'  THEN "BLEH" ELSE line.MRNo END, "NA") AS mrn,
(CASE  WHEN line.ID_Admit='' OR line.ID_Admit='NULL'  THEN -1 ELSE line.ID_Admit END,0) AS ID_Admit,
(CASE  WHEN line.DeptCode_Admit='' OR line.DeptCode_Admit='NULL'  THEN -1 ELSE line.DeptCode_Admit END,0) AS DeptCode_Admit,
(CASE  WHEN line.City='' OR line.City='NULL'  THEN "BLEH" ELSE line.City END,"NA") AS city

MERGE (p:Person { mrn: mrn}) ON MATCH SET p.DOB=line.DateOfBirth,p.gender=line.GenderDescription,p.prefix=line.PrefixDescription ON CREATE SET p.DOB=line.DateOfBirth,p.gender=line.GenderDescription,p.prefix=line.PrefixDescription
CREATE (a:Admission{HospitalName:line.Hospital,id:toInt(ID_Admit),unitId:line.UnitID_Admit,IPDNo:line.IPDNO,DateOfAdmission:line.Date_Admit})
MERGE(d:Department{id:toInt(DeptCode_Admit)}) ON MATCH SET d.name=line.DeptName_Admit
MERGE(l:Location{city:city}) ON MATCH SET l.country=line.Country,l.state=line.State


merge  p-[:Admitted]->a 
MERGE a-[:Located]->l

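This should be really simple if you just run the import in multiple passes (you can even run them in parallel with multiple browser or neo4j-shell sessions).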

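  1. Remove the ON MATCH SET clauses
  2. You mistyped mrno (the MERGE uses mrn, while the index is on mrno)
  3. You are missing indexes on :Person(mrno) and :Admission(id)
  4. Your CASE statements are incorrect
  5. You used ON MATCH where you meant ON CREATE SET
  6. You can optimize the import further by running a DISTINCT in the WITH over only the fields you want to import; see the Department pass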
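
Points 4 and 6 can be illustrated outside Neo4j. The following is a minimal Python sketch, not part of the import itself, of roughly what the corrected CASE expression and WITH DISTINCT do to the Department columns. The helper names `normalize_dept_code` and `distinct_departments` and the sample rows are made up for illustration; only the column names DeptCode_Admit and DeptName_Admit come from the script.

```python
import csv
import io


def normalize_dept_code(raw):
    """Rough mirror of the corrected CASE: '' or 'NULL' becomes -1, otherwise an int."""
    return -1 if raw in ("", "NULL") else int(raw)


def distinct_departments(rows):
    """Rough mirror of WITH DISTINCT: keep each (name, code) pair once, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        pair = (row["DeptName_Admit"], normalize_dept_code(row["DeptCode_Admit"]))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# Hypothetical sample data; the real file has many more columns and rows.
SAMPLE = """DeptCode_Admit,DeptName_Admit
10,Cardiology
10,Cardiology
,Unknown
NULL,Unknown
20,Oncology
"""

departments = distinct_departments(csv.DictReader(io.StringIO(SAMPLE)))
print(departments)
```

In this sample, five input rows collapse to three pairs, so a MERGE over them would run three times instead of five. How much this saves on the real file depends on how many distinct departments it contains.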

Here is your fixed, complete, multi-pass import script:

CREATE INDEX ON :Patient(mrno);


CREATE INDEX ON :Location(city);
CREATE INDEX ON :Department(id);

// additional indexes / constraints

CREATE INDEX ON :Person(mrno);

CREATE CONSTRAINT ON (a:Admission) assert a.id is unique;

USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH line,
CASE  WHEN line.MRNo='' OR line.MRNo='null'  THEN "NA" ELSE line.MRNo END AS mrno

MERGE (p:Person { mrno: mrno}) 
  ON CREATE SET p.DOB=line.DateOfBirth,p.gender=line.GenderDescription,p.prefix=line.PrefixDescription;


USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH line,
CASE  WHEN line.ID_Admit='' OR line.ID_Admit='NULL'  THEN -1 ELSE toInt(line.ID_Admit) END AS ID_Admit

CREATE (a:Admission{HospitalName:line.Hospital,id:ID_Admit,unitId:line.UnitID_Admit,IPDNo:line.IPDNO,DateOfAdmission:line.Date_Admit});

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH distinct line.DeptName_Admit AS DeptName_Admit,
CASE  WHEN line.DeptCode_Admit='' OR line.DeptCode_Admit='NULL'  THEN -1 ELSE toInt(line.DeptCode_Admit) END AS DeptCode_Admit

MERGE (d:Department{id:DeptCode_Admit}) 
  ON CREATE SET d.name=DeptName_Admit;


USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH line,
CASE  WHEN line.City='' OR line.City='NULL'  THEN "NA" ELSE line.City END AS city

MERGE(l:Location{city:city}) 
  ON CREATE SET l.country=line.Country,l.state=line.State;


USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH
CASE  WHEN line.MRNo='' OR line.MRNo='null'  THEN "NA" ELSE line.MRNo END AS mrno,
CASE  WHEN line.ID_Admit='' OR line.ID_Admit='NULL'  THEN -1 ELSE toInt(line.ID_Admit) END AS ID_Admit

MATCH (p:Person { mrno: mrno}) 
MATCH (a:Admission {id:ID_Admit})
MERGE (p)-[:Admitted]->(a);

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///home/geralt/Desktop/Temp_Admission.csv" AS line
WITH
CASE  WHEN line.ID_Admit='' OR line.ID_Admit='NULL'  THEN -1 ELSE toInt(line.ID_Admit) END AS ID_Admit,
CASE  WHEN line.City='' OR line.City='NULL'  THEN "NA" ELSE line.City END AS city

MATCH (a:Admission {id:ID_Admit})
MATCH (l:Location{city:city}) 
MERGE (a)-[:Located]->(l);
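
To show what each pass does, the structure of the script can be modelled in plain Python. This is only a sketch under assumptions: dictionaries stand in for the lookups on :Person(mrno) and :Admission(id), a set stands in for MERGE on the relationship, and the function `import_in_passes` and the sample rows are hypothetical.

```python
def norm_mrno(raw):
    # Same rule as the CASE in the script: '' or 'null' maps to "NA".
    return "NA" if raw in ("", "null") else raw


def norm_admit_id(raw):
    # Same rule as the CASE in the script: '' or 'NULL' maps to -1.
    return -1 if raw in ("", "NULL") else int(raw)


def import_in_passes(rows):
    persons = {}      # stands in for :Person nodes looked up by mrno
    admissions = {}   # stands in for :Admission nodes looked up by id
    admitted = set()  # stands in for (p)-[:Admitted]->(a) relationships

    # Pass 1: MERGE persons; properties are set only when the key is first seen.
    for row in rows:
        persons.setdefault(norm_mrno(row["MRNo"]), {"DOB": row["DateOfBirth"]})

    # Pass 2: CREATE one admission per row. The script relies on a unique
    # constraint on id; this model simply overwrites duplicates (a simplification).
    for row in rows:
        admissions[norm_admit_id(row["ID_Admit"])] = {"IPDNo": row["IPDNO"]}

    # Pass 3: look up both ends by key, then MERGE the relationship.
    for row in rows:
        mrno = norm_mrno(row["MRNo"])
        admit_id = norm_admit_id(row["ID_Admit"])
        if mrno in persons and admit_id in admissions:
            admitted.add((mrno, admit_id))

    return persons, admissions, admitted


# Hypothetical rows using column names from the script.
rows = [
    {"MRNo": "M1", "DateOfBirth": "1980-01-01", "ID_Admit": "1", "IPDNO": "A"},
    {"MRNo": "M1", "DateOfBirth": "1999-09-09", "ID_Admit": "2", "IPDNO": "B"},
    {"MRNo": "", "DateOfBirth": "1970-05-05", "ID_Admit": "3", "IPDNO": "C"},
]
persons, admissions, admitted = import_in_passes(rows)
print(len(persons), len(admissions), sorted(admitted))
```

Each pass touches one kind of node or one kind of relationship, and the relationship pass only performs keyed lookups on data that earlier passes created. That is the same division of work the Cypher script makes, with the indexes and the constraint playing the role of the dictionaries.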

