
Eager operator warning when importing a CSV file into Neo4j

I want to import about 40,000 nodes of a Twitter dataset from a CSV file into Neo4j using the LOAD CSV command.

The CSV file is organized like this:

id,screenName,tags,avatar,followersCount,friendsCount,lang,lastSeen,tweetId,friends
"1969527638","LlngoMakeEmCum_",[ "#nationaldogday" ],"http://pbs.twimg.com/profile_images/534286217882652672/FNmiQYVO_normal.jpeg",319,112,"en",1472271687519,"769310701580083200",[ "1969574754", "1969295556", "1969284056", "1969612214"]

I'm running this query in Neo4j:

LOAD CSV WITH HEADERS FROM "file:/data.csv" AS row 
WITH row, split(row.friends, ",") AS friends 
UNWIND friends AS friend 
MERGE (p1:Person {id:row.id}) 
MERGE (p2:Person {id:friend}) 
MERGE (p1)-[:FRIEND_WITH]->(p2)

And I got this warning:

The execution plan for this query contains the Eager operator, which forces all dependent data to be materialized in main memory before proceeding. Using LOAD CSV with a large data set in a query where the execution plan contains the Eager operator could potentially consume a lot of memory and is likely to not perform well. See the Neo4j Manual entry on the Eager operator for more information and hints on how problems could be avoided.

What's the meaning of this warning? And how can I import this dataset?

I've found USING PERIODIC COMMIT quite useful for keeping memory use in check, since it commits in batches instead of holding the whole import in one transaction. At a seminar I also heard that a heavy query can even take down your Neo4j database, so what you pasted is only a warning, telling you to reconsider your command.

Here is an example from the Neo4j documentation that may be useful in your case:

USING PERIODIC COMMIT 500
LOAD CSV FROM 'https://neo4j.com/docs/cypher-manual/3.5/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInteger(line[2])})
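
Note that this example reads a file without a header row, so columns are addressed by position ( line[1] , line[2] ); with WITH HEADERS , as in your query, you address them by name instead.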

The Eager operator ensures that operations within your query do not conflict with each other. When you import data via LOAD CSV, it draws a boundary between the reads and the writes, making sure an operation is applied to all rows before the next operation starts, so that no row can observe a conflicting half-applied state. The price is that all intermediate rows must be held in memory at that boundary, which usually makes the overall import less efficient.
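
You can check a statement for this yourself by prefixing it with EXPLAIN, which compiles the plan without running the query; if Eager appears as an operator in the plan output, the warning applies. For example, with your original query:

EXPLAIN
LOAD CSV WITH HEADERS FROM "file:/data.csv" AS row
WITH row, split(row.friends, ",") AS friends
UNWIND friends AS friend
MERGE (p1:Person {id:row.id})
MERGE (p2:Person {id:friend})
MERGE (p1)-[:FRIEND_WITH]->(p2)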

For a small file such as yours, depending on your machine's configuration, the query may be fine as-is. Otherwise, break it into separate passes so that each statement performs only one kind of write:

// Pass 1: create a node for every source person.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/data.csv" AS row
MERGE (:Person {id:row.id});

// Pass 2: create a node for every friend.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/data.csv" AS row
WITH row, split(row.friends, ",") AS friends
UNWIND friends AS friend
MERGE (:Person {id:friend});

// Pass 3: with all nodes in place, create only the relationships.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/data.csv" AS row
WITH row, split(row.friends, ",") AS friends
UNWIND friends AS friend
MATCH (p1:Person {id:row.id})
MATCH (p2:Person {id:friend})
MERGE (p1)-[:FRIEND_WITH]->(p2);
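
Since every pass MERGEs Person nodes by id, it's also worth creating a unique constraint first, so MERGE can use an index lookup instead of scanning all Person nodes (the syntax below is for Neo4j 3.x, matching the 3.5 manual linked above):

CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE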

Also, if memory is still tight, lower the batch size in USING PERIODIC COMMIT so that commits happen more frequently.
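
One caveat, assuming the friends column reaches Cypher as the single bracketed string shown in your sample row: split(row.friends, ",") keeps the brackets, quotes, and spaces in each value, so the friend ids would never match the bare row.id values. Here is a sketch that strips them with Cypher's replace() and trim() before splitting, with a RETURN so you can preview the values first:

LOAD CSV WITH HEADERS FROM "file:/data.csv" AS row
// Drop the surrounding [ ] and the embedded quotes, then trim each piece,
// so every friend id matches the bare row.id values.
WITH row, replace(replace(replace(row.friends, '[', ''), ']', ''), '"', '') AS cleaned
UNWIND [f IN split(cleaned, ',') | trim(f)] AS friend
RETURN row.id, friend
LIMIT 5

Once the preview looks right, substitute this WITH/UNWIND into the import passes above.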
