Importing CSV relations to Neo4j

I'm trying to import data from a MySQL database into Neo4j, using CSV files as an intermediary. I'm following the basic example, but can't quite get it to work. I'm importing two tables with these queries:

//Import projects.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/projects.csv" AS row
CREATE (:project
{
     project_id: row.fan,
     project_name: row.project_name
});

//Import people.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/persons.csv" AS row
CREATE (:person
{
     person_id: row.person_id,
     person_name: row.person_name
});

//Create indexes.
CREATE INDEX ON :project(project_id);
CREATE INDEX ON :project(project_name);
CREATE INDEX ON :person(person_id);
CREATE INDEX ON :person(person_name);

This part works. What doesn't work is when I try to import the relations:

//Create project-person relationships.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);

The console accepts the query without an error, but never finishes. It's been running for days at 100% CPU, 25% RAM, but negligible disk usage. No relations appear in the database information.

Did I make a mistake somewhere, or is it really this slow? The project_persons.csv file is 13 million lines long, but shouldn't the periodic commit make something show up by now?

shouldn't the periodic commit make something show up by now?

Only for the LOAD. Put EXPLAIN at the front of the query and it will show how Neo4j is structuring the update and the number of records it expects to process. I ran into the same issue: Neo4j was doing the entire update as a single transaction, and it never completed. The work had to be broken up into chunks of 50K to 100K records per transaction to get everything done.
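As a rough sketch, this is what the relationship import from the question looks like with EXPLAIN in front. EXPLAIN only asks Neo4j for the query plan and does not execute anything. USING PERIODIC COMMIT is left out here just to keep the sketch simple, so the plan you see for the full query may differ.

```cypher
// Show the planned execution only; nothing is read from the CSV or written.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);
```

What to look for depends on the Neo4j version, so treat this as a hint rather than a rule: a NodeIndexSeek for each MATCH suggests the indexes are being used, a NodeByLabelScan suggests every CSV row scans the whole label, and an Eager operator suggests the periodic commit is not splitting the work the way you expect.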

One way to break the work into chunks is to import the relation information as a set of labelled nodes, and then use those nodes to MATCH the person and project nodes and create the relations as required.

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
CREATE (:Relations {project_id: row.project_id, person_id: row.person_id})

Then process the records in batches of 50K:

MATCH (r:Relations) 
MATCH (prj:project {project_id: r.project_id})
MATCH (per:person {person_id: r.person_id})
WITH r, prj, per LIMIT 50000
MERGE (per)-[:CONTRIBUTED]->(prj)
DELETE r

Run this multiple times until all the relations are created and you're good to go.
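To know when to stop, you can count the leftover staging nodes between runs. A minimal sketch, assuming the :Relations label used above:

```cypher
// Staging nodes that have not yet been turned into relationships.
MATCH (r:Relations)
RETURN count(r) AS remaining;
```

If remaining stops dropping before it reaches zero, the nodes still left are most likely rows whose project_id or person_id matched nothing, since the batch query only deletes a staging node after both MATCH clauses succeed.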
