Neo4j performance for MERGE queries on 100 thousand nodes

Question

I recently started working with Neo4j, and I have a performance problem with the MERGE query that creates my graph.

I have a CSV file with 100,000 records and want to load the data from it. My import query is as follows:

//Script to import global Actors data
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///D:/MOT/test_data.csv" AS row
MERGE (c:Country {Name:row.Country})

MERGE (a:Actor {Name: row.ActorName, Aliases: row.Aliases, Type:row.ActorType})

My system configuration: 8.00 GB RAM and Core i5-3330 CPU.

My Neo4j configuration is as follows:

neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M
mapped_memory_page_size=1048576
label_block_size=60
array_block_size=120
node_auto_indexing=False
string_block_size=120

When I run this query in the Neo4j browser, it takes more than a day. Could you please help me solve this problem? For example, should I change my JVM configuration, change my query, or something else, and how?

Answer 1

To speed up MERGE queries, create indexes on the properties you MERGE on:

CREATE INDEX ON :Country(Name)
CREATE INDEX ON :Actor(Name)

If you have unique node properties, you can increase performance even more by using uniqueness constraints instead of normal indexes:

CREATE CONSTRAINT ON (node:Country) ASSERT node.Name IS UNIQUE
CREATE CONSTRAINT ON (node:Actor) ASSERT node.Name IS UNIQUE
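
Creating a uniqueness constraint fails if nodes that violate it already exist. Because the original query merged Actor nodes on three properties at once, an earlier import attempt may have left several Actor nodes with the same Name. A minimal sketch of a check to run before creating the constraint, assuming the labels and property names from the question (the aliases name and occurrences are illustrative only):

```cypher
// List Actor names that occur on more than one node.
// Any row returned here would make the uniqueness constraint fail.
MATCH (a:Actor)
WITH a.Name AS name, count(*) AS occurrences
WHERE occurrences > 1
RETURN name, occurrences
ORDER BY occurrences DESC
LIMIT 20
```

If this returns rows, the duplicates need to be removed or merged (or the database cleared and re-imported) before the constraint can be created.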

In general, your query will be faster if you MERGE on a single, indexed property only:

//Script to import global Actors data
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///D:/MOT/test_data.csv" AS row
MERGE (c:Country {Name:row.Country})
MERGE (a:Actor {Name: row.ActorName})
// if necessary, you can set properties here
ON CREATE SET a.Aliases = row.Aliases, a.Type = row.ActorType
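
Note that ON CREATE SET only writes Aliases and Type when the Actor node is newly created. If the import might be re-run and existing nodes should pick up the values from the file, a variant with ON MATCH SET could look roughly like this. It is only a sketch: whether overwriting existing values is desirable is an assumption about your data.

```cypher
// Variant of the import that also refreshes properties on existing actors
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///D:/MOT/test_data.csv" AS row
MERGE (a:Actor {Name: row.ActorName})
// runs only when the node did not exist yet
ON CREATE SET a.Aliases = row.Aliases, a.Type = row.ActorType
// runs only when the node was found by Name
ON MATCH SET a.Aliases = row.Aliases, a.Type = row.ActorType
```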

Answer 2

As already answered on the Google Group.

It should just take a few seconds.

I presume:

you use Neo4j 2.3.2?

you created indexes / constraints for the things you merge on?

you configured your Neo4j instance to run with at least 4G of heap?

you are using PERIODIC COMMIT?

I suggest that you PROFILE your statement to see where the biggest issues show up.
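
A minimal sketch of what profiling could look like, reusing the file path and property names from the question. The LIMIT of 100 rows is an arbitrary choice to keep the run short. Keep in mind that PROFILE actually executes the statement, so the merged nodes are written to the database.

```cypher
// Profile the Actor MERGE on a small sample of the CSV file
PROFILE
LOAD CSV WITH HEADERS FROM "file:///D:/MOT/test_data.csv" AS row
WITH row LIMIT 100
MERGE (a:Actor {Name: row.ActorName})
ON CREATE SET a.Aliases = row.Aliases, a.Type = row.ActorType
```

In the resulting plan, check whether the Actor lookup is served by an index or by a scan over all nodes with that label, and compare the database hits per step. The exact operator names depend on the Neo4j version and planner.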

Otherwise, I strongly recommend splitting the import into several statements.

For example, like this:

CREATE CONSTRAINT ON (c:Country) ASSERT c.Name IS UNIQUE;
CREATE CONSTRAINT ON (o:Organization) ASSERT o.Name IS UNIQUE;
CREATE CONSTRAINT ON (a:Actor) ASSERT a.Name IS UNIQUE;


LOAD CSV WITH HEADERS FROM "file:///E:/datasets/Actors_data_all.csv" AS row
WITH distinct row.Country as Country
MERGE (c:Country {Name:Country});

LOAD CSV WITH HEADERS FROM "file:///E:/datasets/Actors_data_all.csv" AS row
WITH distinct row.AffiliationTo as AffiliationTo
MERGE (o:Organization {Name: AffiliationTo});

LOAD CSV WITH HEADERS FROM "file:///E:/datasets/Actors_data_all.csv" AS row
MERGE (a:Actor {Name: row.ActorName}) ON CREATE SET a.Aliases=row.Aliases, a.Type=row.ActorType;

LOAD CSV WITH HEADERS FROM "file:///E:/datasets/Actors_data_all.csv" AS row
WITH distinct row.Country as Country, row.ActorName as ActorName
MATCH (c:Country {Name:Country})
MATCH (a:Actor {Name:ActorName})
MERGE (c)<-[:IS_FROM]-(a);

LOAD CSV WITH HEADERS FROM "file:///E:/datasets/Actors_data_all.csv" AS row
MATCH (o:Organization {Name: row.AffiliationTo})
MATCH (a:Actor {Name: row.ActorName})
MERGE (a)-[r:AFFILIATED_TO]->(o) 
  ON CREATE SET r.Start=row.AffiliationStartDate, r.End=row.AffiliationEndDate;
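
After the split import has finished, a few count queries can serve as a quick sanity check. This is a sketch that assumes the labels and relationship types used in the statements above; the result aliases are illustrative, and the expected numbers depend on your data. Depending on the client, you may need to run the statements one at a time.

```cypher
// Number of nodes per label created by the import
MATCH (c:Country) RETURN count(c) AS countries;
MATCH (o:Organization) RETURN count(o) AS organizations;
MATCH (a:Actor) RETURN count(a) AS actors;

// Number of relationships per type
MATCH (:Actor)-[r:IS_FROM]->(:Country) RETURN count(r) AS isFrom;
MATCH (:Actor)-[r:AFFILIATED_TO]->(:Organization) RETURN count(r) AS affiliatedTo;
```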

2 answers

solution1
2 2016-02-29 15:08:25

solution2
0 ACCEPTED 2016-02-29 16:30:53
