This query loads the 1 million ratings from the GroupLens MovieLens dataset. I have already created nodes for users and movies, and am now connecting them with rating relationships:
LOAD CSV FROM "file:///ratings.csv" AS row FIELDTERMINATOR ';'
MERGE (u:User {userID: toInt(row[0])})
MERGE (m:Movie {movieID: toInt(row[1])})
MERGE (u)-[r:RATING {value: toInt(row[3])}]->(m)
This query takes a very long time with 2 GB of RAM allocated to the JVM (laptop, 4 GB RAM), although it runs reasonably fast with 4-6 GB (desktop). I also have indexes on :User(userID) and :Movie(movieID).
The profile of this query looks like this:

[query profile screenshot]

The number of db hits looks excessive, and I think this query can be optimized.
(Follow-up question): How can I run the optimized Cypher query in neo4j-shell? Is this the correct syntax?
start [CYPHER_QUERY] ;
Try USING PERIODIC COMMIT: http://neo4j.com/docs/stable/query-periodic-commit.html

Also, consider using CREATE instead of MERGE on the last line, the one that creates the relationship, since I'm assuming ratings aren't repeated in your .csv file. MERGE must first check whether a matching relationship already exists, which is expensive at this scale; CREATE skips that check entirely.
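Putting both suggestions together, the query from the question might be rewritten as follows. This is a sketch: the file path and column indices are copied from the question, the batch size of 10000 is an arbitrary starting point, and toInt is the pre-3.1 name of what later versions call toInteger.

```cypher
// Commit every 10 000 rows instead of building one huge transaction,
// which keeps memory use bounded on a small heap.
USING PERIODIC COMMIT 10000
LOAD CSV FROM "file:///ratings.csv" AS row FIELDTERMINATOR ';'
// MERGE on the indexed ID properties is an index lookup, so these stay cheap.
MERGE (u:User {userID: toInt(row[0])})
MERGE (m:Movie {movieID: toInt(row[1])})
// CREATE avoids the relationship-existence check that MERGE performs per row.
CREATE (u)-[r:RATING {value: toInt(row[3])}]->(m)
```

As for the follow-up: neo4j-shell takes plain Cypher with no `start` prefix, terminated by a semicolon; if I recall correctly it can also read a script from a file via `neo4j-shell -file <script.cql>`.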