Py2neo Neo4j Batch submit error
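I have a JSON file with data for about 1.4 million nodes, and I want to build a Neo4j graph database from it. I tried to use py2neo's batch submit function. My code is as follows: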
# the variable words is a list containing node names
# graph_db is assumed to be an already-connected graph database object
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later
for i in words:
    nodedict[i] = batch.create({"name": i})
results = batch.submit()
The error shown is as follows:
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    results = batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 364, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Loc$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 278, in __init__
    raise SystemError(body)
SystemError: None
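Can anyone tell me what exactly is happening here? Does it have to do with the batch query being so large? If so, what can be done about it? Thanks in advance! :)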
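So here is what I figured out (thanks to this question: py2neo - Neo4j - System Error - Create Batch Nodes/Relationships):
py2neo's batch submit function has its own limits on how many queries it can handle at once. I could not find an exact upper bound, so I tried capping each batch at 5000 queries and ran the following code: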
# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later
for index, i in enumerate(words):
    nodedict[i] = batch.create({"name": i})
    # (index + 1) so the first batch holds 5000 jobs rather than a single one
    if (index + 1) % 5000 == 0:
        batch.submit()
        batch = neo4j.WriteBatch(graph_db)  # As stated by Nigel below, I'm creating a new batch
batch.submit()  # for the final batch
This way I sent the requests in batches of 5k queries each and successfully created the whole graph!
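There is no real way to state a limit on how many jobs a batch can contain; it varies widely depending on a number of factors. Generally, your best bet is to experiment to find the optimal size for your use case and then go with that. It looks like that is what you're already doing :-)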
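Based on your solution, I would suggest one adjustment. Batch objects are not designed to be reused, so rather than clearing the batch after each submission, create a new batch object. In any case, the ability to submit a batch more than once will be removed in the next version of py2neo.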
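A minimal sketch of that adjustment, written so it runs without a database. `submit_in_fresh_batches` and `make_batch` are hypothetical names: `make_batch` is assumed to be a zero-argument factory (in the question's setup, something like `lambda: neo4j.WriteBatch(graph_db)`) returning an object with `create()` and `submit()` methods, and `submit()` is assumed to return an iterable of results. `FakeBatch` is only a stand-in for demonstration.

```python
def submit_in_fresh_batches(names, make_batch, batch_size=5000):
    """Create one node per name, using a brand-new batch object per chunk.

    make_batch is a hypothetical factory; it is assumed to return an object
    exposing create(properties) and submit(), like the batch in the question.
    """
    results = []
    for start in range(0, len(names), batch_size):
        batch = make_batch()  # a new batch each time; submitted ones are never reused
        for name in names[start:start + batch_size]:
            batch.create({"name": name})
        results.extend(batch.submit())
    return results


class FakeBatch(object):
    """Stand-in for a real batch so the sketch runs without Neo4j."""
    submitted_sizes = []

    def __init__(self):
        self.jobs = []

    def create(self, properties):
        self.jobs.append(properties)

    def submit(self):
        FakeBatch.submitted_sizes.append(len(self.jobs))
        return list(self.jobs)


created = submit_in_fresh_batches(
    ["w%d" % i for i in range(12)], FakeBatch, batch_size=5
)
```

Slicing by `start` also avoids submitting an empty final batch when the number of names is an exact multiple of `batch_size`.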
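I ran into the same problem after I started doing batch creation through graph.create(*alist). The answers above pointed me in the right direction, and I ended up using the following snippet from https://gist.github.com/anonymous/6293739, which was inspired by the question py2neo - Neo4j - System Error - Create Batch Nodes/Relationships: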
chunk_size = 500
# xrange is Python 2 only; use range on Python 3
chunks = (alist[pos:pos + chunk_size] for pos in xrange(0, len(alist), chunk_size))
for c in chunks:
    graph.create(*c)
P.S. py2neo == 2.0.7
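The same chunking idea can also be written as a small generator that runs on both Python 2 and 3 and accepts any iterable, not only lists. This is just a sketch: `chunked` and `create_in_chunks` are hypothetical helper names, and `create_fn` stands in for a call such as `graph.create`.

```python
from itertools import islice


def chunked(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def create_in_chunks(items, create_fn, chunk_size=500):
    """Call create_fn(*chunk) once per chunk and return the number of calls."""
    calls = 0
    for chunk in chunked(items, chunk_size):
        create_fn(*chunk)
        calls += 1
    return calls


# Stand-in for graph.create that only records how many arguments it received.
sizes = []
calls = create_in_chunks(range(1200), lambda *args: sizes.append(len(args)))
```

With a real graph object, `create_fn` would be replaced by the creation call, and `chunk_size` tuned by experiment as suggested in the answer above.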