
Batch loading neo4j

I am batch loading a neo4j graph using py2neo with this script:

batch = neo4j.WriteBatch(graph)
counter = 0
for each in ans:
    n1 = graph.merge_one("Page", "url", each[0])
#     batch.create(n1)
    counter += 1
    for linkvalue in each[6]:
        try:
            text, link = linkvalue.split('!__!')
            n2 = graph.merge_one("Page", "url", link)
#             batch.create(n2)
            counter += 1
            rel = Relationship(n1, 'LINKS', n2, anchor_text=text)
            batch.create(rel)

        except (KeyboardInterrupt, SystemExit):
            print 'fail'
            raise

    if counter > 900:
        counter = 0
        batch.submit()
        print 'submit'
        batch = neo4j.WriteBatch(graph)

if counter > 0:  # flush whatever is left over after the loop ends
    batch.submit()

Both merge_one calls make a round trip to the graph, which I believe is slowing down my algorithm. I commented out the batch.create() calls because they were recreating the nodes. Is there a way to perform this work but defer it until batch.submit() to speed up the process?

I am handling about 50,000 nodes and 1,000,000 relationships.

You need to append statements to the WriteBatch and then run the batch once it reaches some number of statements.

Here's an example:

import json
from py2neo.neo4j import CypherQuery, GraphDatabaseService, WriteBatch
from py2neo import neo4j

db = neo4j.GraphDatabaseService()

business_index_query = CypherQuery(db, "CREATE INDEX ON :Business(id)")
business_index_query.execute()

category_index_query = CypherQuery(db, "CREATE INDEX ON :Category(name)")
category_index_query.execute()

create_business_query = '''
    CREATE (b:Business {id: {business_id}, name: {name}, lat:{latitude}, 
    lon:{longitude}, stars: {stars}, review_count: {review_count}})
'''

merge_category_query = '''
    MATCH (b:Business {id: {business_id}})
    MERGE (c:Category {name: {category}})
    CREATE UNIQUE (c)<-[:IS_IN]-(b)
'''

print "Beginning business batch"
with open('data/yelp_academic_dataset_business.json', 'r') as f:
    business_batch = WriteBatch(db)
    count = 0
    for b in (json.loads(l) for l in f):
        business_batch.append_cypher(create_business_query, b)
        count += 1
        if count >= 10000:
            business_batch.run()
            business_batch.clear()
            count = 0
    if count > 0:
        business_batch.run()

print "Beginning category batch"
with open('data/yelp_academic_dataset_business.json', 'r') as f:
    category_batch = WriteBatch(db)
    count = 0
    for b in (json.loads(l) for l in f):
        for c in b['categories']:
            category_batch.append_cypher(merge_category_query, {'business_id': b['business_id'], 'category': c})
            count += 1
            if count >= 10000:
                category_batch.run()
                category_batch.clear()
                count = 0
    if count > 0:
        category_batch.run()

Note that this example uses only Cypher statements and appends each statement to the WriteBatch. Also, this example uses two different WriteBatch instances.
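The batching pattern above (append one statement per item, run the batch when a threshold is reached, clear it, and flush any remainder at the end) is independent of py2neo itself. A minimal sketch of just that control flow, using a hypothetical stand-in batch object instead of a live database:

```python
class FakeBatch:
    """Stand-in for py2neo's WriteBatch: collects statements and records flushes."""
    def __init__(self):
        self.statements = []
        self.runs = []  # number of statements sent on each flush, for illustration

    def append_cypher(self, query, params):
        self.statements.append((query, params))

    def run(self):
        self.runs.append(len(self.statements))

    def clear(self):
        self.statements = []


def load_in_batches(items, batch, batch_size=3):
    """Append one statement per item; flush every batch_size statements."""
    count = 0
    for item in items:
        batch.append_cypher("CREATE (n:Item {id: {id}})", {"id": item})
        count += 1
        if count >= batch_size:
            batch.run()
            batch.clear()
            count = 0
    if count > 0:  # don't forget the final partial batch
        batch.run()


batch = FakeBatch()
load_in_batches(range(7), batch, batch_size=3)
print(batch.runs)  # -> [3, 3, 1]: two full flushes, then the remaining one
```

The final `if count > 0` flush matters: without it, any statements appended after the last full batch are silently dropped.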
