
How can I optimize this neo4j query?

The query loads the 1 million ratings from the GroupLens dataset. I have already created nodes for users and movies, and am now connecting each user to the movies they rated through rating relationships.

load csv from "file:///ratings.csv" as row fieldterminator ';' 
MERGE (u:User {userID:toInt(row[0])} ) 
MERGE (m:Movie {movieID:toInt(row[1])} ) 
MERGE (u)-[r:RATING {value:toInt(row[3])} ]->(m)

This query takes a very long time when the JVM is allocated 2 GB of RAM (laptop, 4 GB RAM), although it runs reasonably fast with 4-6 GB of RAM (desktop). I also have indexes on User and Movie nodes on their respective IDs.

The profile of this query looks like this:

[image: PROFILE output of the query]

The number of db hits looks excessive, and I think this query can be optimized.

(Follow-up question): How could I run the optimized Cypher query in neo4j-shell? Is this the correct syntax?

start [CYPHER_QUERY] ;

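Try USING PERIODIC COMMIT, which commits the import in batches instead of holding all rows in one large transaction, so less heap is needed: http://neo4j.com/docs/stable/query-periodic-commit.html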

Also, consider using CREATE instead of MERGE for the last line to create the relationship, as I'm assuming ratings aren't repeated in your .csv file.
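Putting both suggestions together, the query might look roughly like the sketch below. The batch size of 1000 is an assumption (tune it to the available heap), the column indexes are copied unchanged from the question, and CREATE is only safe if each user/movie pair appears at most once in the file.

```cypher
// Sketch only: commit every 1000 rows (assumed batch size) rather than in one transaction
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///ratings.csv" AS row FIELDTERMINATOR ';'
// Node lookups kept as in the question; the existing indexes make these cheap
MERGE (u:User {userID: toInt(row[0])})
MERGE (m:Movie {movieID: toInt(row[1])})
// CREATE skips the existence check that MERGE performs on every relationship
CREATE (u)-[:RATING {value: toInt(row[3])}]->(m);
```

In neo4j-shell a Cypher statement is executed once it is terminated with a semicolon, so the query should be able to be pasted as written, without a leading start keyword.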

