
Batch loading neo4j

I am batch loading a neo4j graph using py2neo with this script:

batch = neo4j.WriteBatch(graph)
counter = 0
for each in ans:
    n1 = graph.merge_one("Page", "url", each[0])
#     batch.create(n1)
    counter += 1
    for linkvalue in each[6]:
        try:
            text, link = linkvalue.split('!__!')
            n2 = graph.merge_one("Page", "url", link)
#             batch.create(n2)
            counter += 1
            rel = Relationship(n1, 'LINKS', n2, anchor_text=text)
            batch.create(rel)

        except (KeyboardInterrupt, SystemExit):
            print 'fail'
            raise

    if counter > 900:
        counter = 0
        batch.submit()
        print 'submit'
        batch = neo4j.WriteBatch(graph)

if counter > 0:  # flush whatever is left over after the loop ends
    batch.submit()

Both merge_one calls make a round trip to the graph, which I believe is slowing down my algorithm. I commented out the batch.create() calls because they were recreating the nodes. Is there a way to perform this work but defer it until batch.submit() to speed up the process?

I am handling about 50,000 nodes and 1,000,000 relationships.

You need to append statements to the WriteBatch and then run the batch once it reaches some number of statements.

Here's an example:

import json
from py2neo.neo4j import CypherQuery, GraphDatabaseService, WriteBatch
from py2neo import neo4j

db = neo4j.GraphDatabaseService()

business_index_query = CypherQuery(db, "CREATE INDEX ON :Business(id)")
business_index_query.execute()

category_index_query = CypherQuery(db, "CREATE INDEX ON :Category(name)")
category_index_query.execute()

create_business_query = '''
    CREATE (b:Business {id: {business_id}, name: {name}, lat:{latitude}, 
    lon:{longitude}, stars: {stars}, review_count: {review_count}})
'''

merge_category_query = '''
    MATCH (b:Business {id: {business_id}})
    MERGE (c:Category {name: {category}})
    CREATE UNIQUE (c)<-[:IS_IN]-(b)
'''

print "Beginning business batch"
with open('data/yelp_academic_dataset_business.json', 'r') as f:
    business_batch = WriteBatch(db)
    count = 0
    for b in (json.loads(l) for l in f):
        business_batch.append_cypher(create_business_query, b)
        count += 1
        if count >= 10000:
            business_batch.run()
            business_batch.clear()
            count = 0
    if count > 0:
        business_batch.run()

print "Beginning category batch"
with open('data/yelp_academic_dataset_business.json', 'r') as f:
    category_batch = WriteBatch(db)
    count = 0
    for b in (json.loads(l) for l in f):
        for c in b['categories']:
            category_batch.append_cypher(merge_category_query, {'business_id': b['business_id'], 'category': c})
            count += 1
            if count >= 10000:
                category_batch.run()
                category_batch.clear()
                count = 0
    if count > 0:
        category_batch.run()

Note that this example uses only Cypher statements and appends each statement to the WriteBatch. Also, this example uses two different WriteBatch instances.
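The batching pattern above (append one statement per item, run the batch when a threshold is reached, clear it, and flush any remainder at the end) is independent of py2neo itself. A minimal sketch of just that control flow, using a hypothetical stand-in batch object instead of a live database:

```python
class FakeBatch:
    """Stand-in for py2neo's WriteBatch: collects statements and records flushes."""
    def __init__(self):
        self.statements = []
        self.runs = []  # number of statements sent on each flush, for illustration

    def append_cypher(self, query, params):
        self.statements.append((query, params))

    def run(self):
        self.runs.append(len(self.statements))

    def clear(self):
        self.statements = []


def load_in_batches(items, batch, batch_size=3):
    """Append one statement per item; flush every batch_size statements."""
    count = 0
    for item in items:
        batch.append_cypher("CREATE (n:Item {id: {id}})", {"id": item})
        count += 1
        if count >= batch_size:
            batch.run()
            batch.clear()
            count = 0
    if count > 0:  # don't forget the final partial batch
        batch.run()


batch = FakeBatch()
load_in_batches(range(7), batch, batch_size=3)
print(batch.runs)  # -> [3, 3, 1]: two full flushes, then the remaining one
```

The final `if count > 0` flush matters: without it, any statements appended after the last full batch are silently dropped.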
