Importing CSV relations to Neo4j
I'm trying to import data from a MySQL database to Neo4j, using CSV files as an intermediary. I'm following the basic example, but can't quite get it to work. I'm importing two tables with these queries:
//Import projects.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/projects.csv" AS row
CREATE (:project
{
project_id: row.fan,
project_name: row.project_name
});
//Import people.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/persons.csv" AS row
CREATE (:person
{
person_id: row.person_id,
person_name: row.person_name
});
//Create indices.
CREATE INDEX ON :project(project_id);
CREATE INDEX ON :project(project_name);
CREATE INDEX ON :person(person_id);
CREATE INDEX ON :person(person_name);
This part works. What doesn't work is when I try to import the relations:
//Create project-person relationships.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);
The console accepts the query without an error, but never finishes. It's been running for days at 100% CPU, 25% RAM, but negligible disk usage. No relations appear in the database information.
Did I make a mistake somewhere, or is it really this slow? The project_persons.csv file is 13 million lines long, but shouldn't the periodic commit make something show up by now?
shouldn't the periodic commit make something show up by now?
Only for the LOAD. Put EXPLAIN in front of the query and it will tell you how it's structuring the update and the number of records it expects to process. I ran into the same issue: Neo4j was doing the entire update as a single transaction and never completed. The transaction needed to be broken up into chunks of 50K to 100K records to get everything done.
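As a sketch of that check, here is the relationship query from the question with EXPLAIN in front of it. EXPLAIN only produces the plan and does not execute the statement, so it returns immediately. What to look for is my own suggestion rather than something taken from the question: whether the two MATCH steps appear as index seeks (NodeIndexSeek) or as label scans (NodeByLabelScan), and whether the plan contains an Eager operator, which would make Neo4j pull in the whole file before writing anything.

```cypher
// EXPLAIN shows the execution plan without running the import.
EXPLAIN
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);
```

The exact plan output depends on the Neo4j version, so treat the operator names as things to search for, not as a guaranteed result.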
One way to do this is to import the relation information as a set of labelled nodes, and then use those nodes to MATCH() the person and project nodes and create the relation as required.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
CREATE (:Relations {project_id: row.project_id, person_id: row.person_id})
Then process the records in 50K batches:
MATCH (r:Relations)
MATCH (prj:project {project_id: r.project_id})
MATCH (per:person {person_id: r.person_id})
WITH r, prj, per LIMIT 50000
MERGE (per)-[:CONTRIBUTED]->(prj)
DELETE r
Run this multiple times until all the relations are created and you're good to go.
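To know when to stop, one option is to count the staging nodes that are still left between runs. This is a minimal sketch that assumes the Relations label used above and nothing else:

```cypher
// Count the staging nodes that have not been turned into relationships yet.
MATCH (r:Relations)
RETURN count(r) AS remaining;
```

When remaining reaches 0, every row has been processed. If the count stops going down at some value above 0, the leftover nodes probably refer to a project_id or person_id that has no matching node, because the batch query only deletes rows for which both MATCH clauses succeed.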