Py2neo Neo4j Batch submit error

I have a JSON file with data for around 1.4 million nodes, and I want to construct a Neo4j graph database from it. I tried to use py2neo's batch submit function. My code is as follows:

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later
for i in words:
    nodedict[i] = batch.create({"name":i})
results = batch.submit()

The error shown is as follows:

Traceback (most recent call last):
  File "test.py", line 36, in <module>
    results = batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 364, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Loc$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 278, in __init__
    raise SystemError(body)
SystemError: None

Can anybody please tell me what exactly is happening here? Does it have anything to do with the fact that the batch query is pretty large? If so, what can be done? Thanks in advance! :)

So here's what I figured out (thanks to this question: py2neo - Neo4j - System Error - Create Batch Nodes/Relationships):

The py2neo batch submit function has its own limitations on the number of queries that can be sent in one batch. While I wasn't able to find an exact upper limit, I tried limiting the number of queries per batch to 5000. So I decided to run the following piece of code:

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later

for index, i in enumerate(words):
    nodedict[i] = batch.create({"name":i})
    if (index + 1) % 5000 == 0:  # submit after every 5000th node, not at index 0
        batch.submit()
        batch = neo4j.WriteBatch(graph_db) # As stated by Nigel below, I'm creating a new batch
batch.submit() #for the final batch

This way, I sent batch requests of 5000 queries each and was able to create my entire graph!

There's no real way to describe a limit on the number of jobs that a batch can contain: it can vary wildly based on a number of factors. The best bet in general is to experiment to find an optimum size for your use case and go with that. It looks like this is what you are already doing :-)
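One way to run that experiment is to time the same sample of items at a few candidate chunk sizes. The following is only a rough sketch: `time_chunk_sizes` and `submit_chunk` are hypothetical names, not part of py2neo, and `submit_chunk` stands for whatever function sends one list of items to the server in your setup.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for pos in range(0, len(items), size):
        yield items[pos:pos + size]


def time_chunk_sizes(items, submit_chunk, sizes):
    """Return {size: seconds} for sending `items` in chunks of each candidate size.

    `submit_chunk` is a hypothetical callable (an assumption, not a py2neo API)
    that takes one list of items and sends it to the server.
    """
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        for chunk in chunked(items, size):
            submit_chunk(chunk)
        timings[size] = time.perf_counter() - start
    return timings
```

Note that every candidate size sends the whole sample again, so this is best run with a small sample against a scratch database.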

In terms of your solution, I'd recommend one tweak. Batch objects weren't designed to be reused, so instead of clearing the batch after every submission, simply create a new one. The ability to submit a batch multiple times will be removed in the next version of py2neo anyway.
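The "new batch per submission" pattern can be sketched as a small helper. This is an illustration under assumptions, not py2neo code: `submit_in_batches` and `make_batch` are hypothetical names, and the helper only assumes that each batch object offers `create()` and a `submit()` that returns a list of results, as in the question's code.

```python
def submit_in_batches(items, make_batch, batch_size=5000):
    """Submit `items` in groups of `batch_size`, using a fresh batch per group.

    `make_batch` is a hypothetical zero-argument factory, for example
    `lambda: neo4j.WriteBatch(graph_db)` with the py2neo version used in
    the question. Each batch only needs `create()` and `submit()` methods.
    """
    results = []
    for start in range(0, len(items), batch_size):
        batch = make_batch()  # a submitted batch is never reused
        for item in items[start:start + batch_size]:
            batch.create({"name": item})
        results.extend(batch.submit())
    return results
```

Because the loop steps through the list in slices, no empty batch is ever submitted, and the last group simply holds whatever is left over.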

I had the same issue after I started using batch create via graph.create(*alist). The above answers pointed me in the right direction, and I ended up using this snippet, inspired by https://gist.github.com/anonymous/6293739 from the question "py2neo - Neo4j - System Error - Create Batch Nodes/Relationships":

chunk_size=500
chunks=(alist[pos:pos + chunk_size] for pos in xrange(0, len(alist), chunk_size))
for c in chunks:
    graph.create(*c)

PS: py2neo==2.0.7
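The snippet above relies on `xrange`, which only exists in Python 2, and on slicing, which needs a list. A possible Python 3 variant that also accepts generators is sketched below; `iter_chunks` is a hypothetical helper name, not something py2neo provides.

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Yield lists of up to `chunk_size` items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls the next `chunk_size` items without copying the rest
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

It would be used the same way as before, for example `for c in iter_chunks(alist, 500): graph.create(*c)`.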
