Speed up relationship and node creation using Cypher in Neo4j
I have 2 CSV files, A and B. File A contains 7,000 rows with 6 properties and File B contains 10M rows with 11 properties. Moreover, File A has the property PKA, which is used as a primary key, whereas File B has the property FKA, which is used as a foreign key referencing PKA.
I want to load these files into Neo4j in this way: 1 - insert a new node for each row of File A and File B; 2 - add a relationship between the nodes created that represents the primary/foreign key relationship described above.
Currently, I have inserted these files with the BatchInserter using the Java API, adding a node for each row of these files and setting the labels "A" and "B" for File A and File B respectively. I have also created two indexes, for PKA and FKA. To add the relationships, my intention is to call the following Cypher statement (from Neo4jShell):
match (a:A), (b:B) where a.PKA=b.FKA create (a)<-[:KEYREL]-(b);
My problems are:

- adding the nodes with BatchInserter takes 14 minutes for File B (the biggest one) with only one commit at the end (~12k nodes/sec, ~130k properties/sec); I want to speed up the import process by a factor of 2.
- the Cypher query can't be handled at this dataset size, but I would like to make it possible.
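As a sketch of one way to make that query tractable (assuming Neo4j 2.x Cypher and an index or unique constraint on :A(PKA)), you can avoid the 7,000 × 10M cartesian product by driving each pass from the indexed key and committing in bounded batches, skipping B nodes that are already linked; the statement is re-run from the shell until it creates no more relationships. The batch size of 500,000 is a placeholder to tune against the available heap:

```cypher
// Link only B nodes that have no KEYREL yet, 500k at a time;
// repeat this statement until it reports 0 relationships created.
MATCH (b:B)
WHERE NOT (b)-[:KEYREL]->(:A)
WITH b LIMIT 500000
MATCH (a:A { PKA: b.FKA })
CREATE (a)<-[:KEYREL]-(b);
```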
I'm running on a VM with a dual-core Intel Xeon @ 2.6 GHz and 8 GB RAM, with 64-bit Windows and 64-bit Java 8 installed. I have run my import Java program and Neo4jShell with the following Java options:
-server -XX:+UseConcMarkSweepGC -Xms2000m -Xmx5000m
Running MATCH is typically quite slow when employed on a large volume of data.
You could try to speed it up by creating a constraint on the nodes, in which you declare the key property of each node unique. This can speed up the MATCH operation, though it also takes time to create the constraint:
CREATE CONSTRAINT ON (a:A) ASSERT a.PKA IS UNIQUE;
CREATE INDEX ON :B(PKB);
You can then run the MATCH, which you can drive from a third CSV file per the Neo4j docs, which describe a scenario similar to yours.
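Following that docs page, such a pass over File B would look roughly like the sketch below. The file URL and the column names `FKA` and `PKB` are assumptions about your CSV headers; adjust them to match your actual files:

```cypher
// Stream File B in 10k-row transactions and link each row's B node
// to its A node through the unique PKA constraint.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///fileB.csv" AS row
MATCH (a:A { PKA: row.FKA })
MATCH (b:B { PKB: row.PKB })
CREATE (a)<-[:KEYREL]-(b);
```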