Neo4j Bulk Data - Create Relationship [OutOfMemory Exception]

I am using a Neo4j procedure to create relationships on bulk data.

Initially I inserted all the data using LOAD CSV:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row 
....

The data set is large (10M), but the load executed successfully.

My problem is that I want to create many-to-many relationships between all of these nodes, but I get an OutOfMemory exception when executing this query:

MATCH(n1:x{REMARKS :"LATEST"}) MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID CREATE (n1)-[:ENROLLER]->(n2) ;

I have already created indexes and constraints.

Any ideas? Please help.

The problem is that your query runs in a single transaction, and the whole transaction state has to be kept in memory until it commits, which leads to the OutOfMemory exception. At the moment, periodic commits are available only for LOAD CSV. So you can, for example, re-read the CSV after the first load:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row 
MATCH (n1:x{REMARKS :"LATEST", DIST_ID: row.DIST_ID})
WITH n1
MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID 
CREATE (n1)-[:ENROLLER]->(n2) ;

Or try periodic committing with the APOC library:

// Runs the inner statement repeatedly, one transaction per run, until it returns 0.
// REMARKS is matched as 'LATEST' to agree with the data in the question.
CALL apoc.periodic.commit("
    MATCH (n2:x {REMARKS:'LATEST'}) WHERE n2.ENROLLER_ID IS NOT NULL
    WITH n2 LIMIT $perCommit
    OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    WITH n2, collect(n1) AS n1s
    FOREACH (n1 IN n1s |
       CREATE (n1)-[:ENROLLER]->(n2)
    )
    REMOVE n2.ENROLLER_ID
    RETURN count(n2)",
    {perCommit: 1000}
)

P.S. The ENROLLER_ID property is used here as a flag for selecting the nodes that still need processing, so it is removed once a node has been handled. Of course, you can use another flag that is set during processing instead.
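As a rough sketch of the "another flag" variant, the statement can set a marker property instead of removing ENROLLER_ID. The property name ENROLLER_LINKED is hypothetical (it does not appear in the question), and the batch size of 1000 is only illustrative:

```cypher
// Sketch only: ENROLLER_LINKED is a hypothetical marker property, not part of the original data.
CALL apoc.periodic.commit("
    MATCH (n2:x {REMARKS:'LATEST'})
    WHERE n2.ENROLLER_ID IS NOT NULL AND n2.ENROLLER_LINKED IS NULL
    WITH n2 LIMIT $perCommit
    OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    WITH n2, collect(n1) AS n1s
    FOREACH (n1 IN n1s |
       CREATE (n1)-[:ENROLLER]->(n2)
    )
    // Mark the node as processed so the next run skips it; ENROLLER_ID stays intact.
    SET n2.ENROLLER_LINKED = true
    RETURN count(n2)",
    {perCommit: 1000}
)
```

This keeps ENROLLER_ID available for later queries. The marker can be removed afterwards in the same batched fashion if it is no longer needed.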

Or, more precisely, with apoc.periodic.iterate:

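// The first statement streams the (n1, n2) pairs; the second runs once per pair, committed in batches.
// parallel is set to false: creating relationships locks both end nodes, so parallel batches can deadlock.
CALL apoc.periodic.iterate("
    MATCH (n1:x {REMARKS:'LATEST'})
    MATCH (n2:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    RETURN n1, n2
  ","
    WITH $n1 AS n1, $n2 AS n2
    MERGE (n1)-[:ENROLLER]->(n2)
  ", {batchSize:10000, parallel:false}
)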
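Whichever approach is used, a quick count can help confirm that the relationships were actually created. This is a minimal sketch that only reuses the label, property, and relationship type from the question:

```cypher
// Count the ENROLLER relationships between nodes marked as LATEST.
MATCH (n1:x {REMARKS:'LATEST'})-[r:ENROLLER]->(n2:x {REMARKS:'LATEST'})
RETURN count(r) AS enrollerRelationships;
```

Note that the CREATE-based variants add a new relationship every time they run over the same pair, so an unexpectedly high count may point to a batch having been processed twice.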
