Neo4j performance tuning

Question

I'm new to Neo4j and am currently building a dating site as a POC. I have a 4GB input file in the format shown below.

It contains a viewerId (male/female) and a viewedId, which is the list of IDs that viewer has viewed. Based on this history file, I need to give recommendations when a user comes online.

Input file:

viewerId   viewedId 
12345   123456,23456,987653 
23456   23456,123456,234567 
34567   234567,765678,987653 
:

For this task, I tried the following:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
UNWIND viewedIds AS viewedId
MERGE (p2:Persons2 {viewerId: row.viewerId})
MERGE (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2)
MERGE (c2)-[:Sees]->(p2);

And my Cypher query to get the result is:

MATCH (p2:Persons2)-[r*1..3]->(c2: Companies2)
RETURN p2,r, COLLECT(DISTINCT c2) as friends

This task takes about 3 days to complete.

My system config:

Ubuntu 14.04
RAM: 24GB

Neo4j config:
neo4j.properties:

neostore.nodestore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=2300M
neostore.propertystore.db.arrays.mapped_memory=5M
neostore.propertystore.db.strings.mapped_memory=3200M
neostore.relationshipstore.db.mapped_memory=800M

neo4j-wrapper.conf

wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000

To reduce the time, I searched online and found the batch importer at the following link: https://github.com/jexp/batch-import

That project imports node.csv and rels.csv files into Neo4j. I don't understand how the node.csv and rels.csv files are created, or which scripts are used to produce them.

Can anyone give me a sample script to make node.csv and rels.csv files for my data?

Or can you give any suggestions to make importing and retrieving the data faster?

Thanks in advance.

Answer 1

You don't need the inverse relationship; one direction is enough.

For the import, set your heap (neo4j-wrapper.conf) to 12G and the page cache (neo4j.properties) to 10G.

Try this; it should finish in a few minutes.

create constraint on (p:Persons2) assert p.viewerId is unique;
create constraint on (p:Companies2) assert p.viewedId is unique;

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
MERGE (p2:Persons2 {viewerId: row.viewerId});

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
FOREACH (viewedId IN split(row.viewedId, ",") |
  MERGE (c2:Companies2 {viewedId: viewedId}));

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2);

For the relationship merge, if some companies have hundreds of thousands up to millions of views, you might want to use this instead:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
WHERE shortestPath((p2)-[:Friends]->(c2)) IS NULL
CREATE (p2)-[:Friends]->(c2);
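
The question also asks how node.csv and rels.csv could be produced for the batch importer. The following is only a minimal Python sketch of that step, using the sample rows from the question. The function names `build_rows` and `write_tsv` and the column headers (`id`, `label`, `start`, `end`, `type`) are illustrative assumptions, not the format the importer requires; check the batch-import readme for the headers it actually expects.

```python
import csv
import io

# Sample rows copied from the question; the real file would be read with open(...).
SAMPLE = """viewerId   viewedId
12345   123456,23456,987653
23456   23456,123456,234567
34567   234567,765678,987653
"""


def build_rows(lines):
    """Split 'viewerId <whitespace> id1,id2,...' lines into node and relationship rows."""
    persons, companies, rels = {}, {}, {}  # dicts de-duplicate and keep first-seen order
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of tabs or spaces
        if len(parts) != 2 or parts[0] == "viewerId":
            continue  # skip blank lines, malformed lines and the header row
        viewer_id, viewed_field = parts
        persons[viewer_id] = None
        for viewed_id in viewed_field.split(","):
            viewed_id = viewed_id.strip()
            if viewed_id:
                companies[viewed_id] = None
                rels[(viewer_id, viewed_id)] = None
    node_rows = [(i, "Persons2") for i in persons]
    node_rows += [(i, "Companies2") for i in companies]
    rel_rows = [(start, end, "Friends") for start, end in rels]
    return node_rows, rel_rows


def write_tsv(handle, header, rows):
    """Write a header plus rows as tab-separated text to an open file-like object."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Illustrative headers (assumed); adapt them to what the importer expects.
node_rows, rel_rows = build_rows(SAMPLE.splitlines())
nodes_out, rels_out = io.StringIO(), io.StringIO()
write_tsv(nodes_out, ("id", "label"), node_rows)
write_tsv(rels_out, ("start", "end", "type"), rel_rows)
print(nodes_out.getvalue())
print(rels_out.getvalue())
```

For the real data, an open file object for the input can be passed as `lines`, and the two outputs can be files opened with `newline=""` instead of `io.StringIO`. The sketch keeps every distinct id and pair in memory to de-duplicate them, so whether it fits for the full 4GB file depends on how many distinct values there are.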

Regarding your query

What do you want to achieve by retrieving the cross product of all people and all companies, up to 3 levels deep? That could be trillions of paths.

Usually you want to know this for a single person or company.

Update Your Query

E.g. for company 123456, the persons who viewed it are 12345 and 23456. The companies these persons viewed are:

12345   123456,23456,987653
23456   23456,123456,234567

So the recommendation for company 123456 should be 23456,987653,23456,234567, and after removing duplicates the final result is 23456,987653,234567.

// LOAD CSV reads every field as a string, so compare against a string value
match (c:Companies2)<-[:Friends]-(p1:Persons2)-[:Friends]->(c2:Companies2)
where c.viewedId = "123456"
return distinct c2.viewedId;
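
The expected result for the sample data can be checked by hand with a small in-memory sketch. This is not how Neo4j evaluates the query; it only mirrors the same two-hop idea (viewers of the company, then everything else those viewers looked at) in plain Python. The names `VIEWS` and `recommend` are made up for this illustration.

```python
from collections import defaultdict

# The three sample rows from the question: viewerId -> list of viewedIds.
VIEWS = {
    "12345": ["123456", "23456", "987653"],
    "23456": ["23456", "123456", "234567"],
    "34567": ["234567", "765678", "987653"],
}


def recommend(views, company_id):
    """Return the distinct ids viewed by everyone who also viewed company_id."""
    # Invert the mapping: viewedId -> set of viewerIds.
    viewers_of = defaultdict(set)
    for viewer, viewed_ids in views.items():
        for viewed_id in viewed_ids:
            viewers_of[viewed_id].add(viewer)

    result = []
    # sorted() only makes the output order deterministic.
    for viewer in sorted(viewers_of.get(company_id, ())):
        for viewed_id in views[viewer]:
            # Skip the starting company itself and anything already collected.
            if viewed_id != company_id and viewed_id not in result:
                result.append(viewed_id)
    return result


print(recommend(VIEWS, "123456"))  # ['23456', '987653', '234567']
```

This matches the final result given in the update: 23456, 987653, 234567.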

For all companies, this might help:

match (c:Companies2)<-[:Friends]-(p1:Persons2)
with p1, collect(c) as companies
match (p1)-[:Friends]->(c2:Companies2)
return c2.viewedId, extract(c in companies | c.viewedId);

solution1
1 ACCEPTED 2015-06-23 11:59:34