
Correct way to bulk insert / merge nodes and edges

I've been using Neo4j with py2neo for a couple of weeks now, and so far it has been fine to do single-node transactions, so I would have different node types:

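from py2neo.ogm import GraphObject

class NodeA(GraphObject):
  ...  # properties, plus a RelatedTo(NodeB) attribute named related_to_b

class NodeB(GraphObject):
  ...

# g is a py2neo Graph, dataset is the input data
# create some nodes from data and simply save them one by one
for data in dataset:
  node_a = NodeA(data)
  node_b = NodeB(data)

  if x:  # some per-record condition
    node_a.related_to_b.add(node_b)

  # each merge call runs as its own transaction
  g.merge(node_b)
  g.merge(node_a)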

Nothing fancy. However, I'm starting to get more nodes and connections and, as expected, single transactions don't really work anymore. I've been looking for ways to do bulk inserts, but can't find any good resources. The best I've managed is using unwind_merge_nodes_query, which has three issues:

  1. it isn't that fast (~5 seconds for 700 very basic nodes on my laptop)
  2. edges need to be handled separately
  3. it requires keeping track of all the node IDs to be able to create the edge connections

I've been writing functions to handle the points above, but I feel like I'm missing something and that there's a simpler way to handle batches of data.

The unwind_merge_nodes_query function isn't generally intended to be used directly, although you can do so. Usually, you'd want to use the functions in the py2neo.bulk module instead, which wrap these functions.
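A minimal sketch of the py2neo.bulk approach, assuming py2neo 2021.x (where Graph.begin() and Graph.commit(tx) exist) and a hypothetical dataset of dicts with "a_id", "b_id" and "related" fields. The labels NodeA/NodeB, the "id" key and the RELATED_TO type are illustrative assumptions. Because merge_nodes and merge_relationships identify nodes by a merge key such as ("NodeA", "id") rather than by internal node ID, the edges can refer to the same key values, which may remove the need to track node IDs between calls.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def prepare_rows(dataset):
    """Split records into node rows and relationship rows.

    Each record is assumed (hypothetically) to be a dict with "a_id",
    "b_id" and a boolean "related" field.
    """
    a_rows = [{"id": rec["a_id"]} for rec in dataset]
    b_rows = [{"id": rec["b_id"]} for rec in dataset]
    # Relationship rows are (start key, properties, end key) triples; the
    # endpoints are identified by their "id" values, not by Neo4j node IDs.
    rel_rows = [(rec["a_id"], {}, rec["b_id"]) for rec in dataset if rec["related"]]
    return a_rows, b_rows, rel_rows


def load(graph, dataset, batch_size=1000):
    """Merge all nodes, then all relationships, one transaction per batch."""
    # Imported lazily so prepare_rows() and chunked() work without py2neo.
    from py2neo.bulk import merge_nodes, merge_relationships

    a_rows, b_rows, rel_rows = prepare_rows(dataset)
    for label, rows in (("NodeA", a_rows), ("NodeB", b_rows)):
        for batch in chunked(rows, batch_size):
            tx = graph.begin()
            # MERGE on (label, "id"); the dict's keys become node properties.
            merge_nodes(tx, batch, (label, "id"))
            graph.commit(tx)
    # Merging the nodes first ensures the relationship endpoints exist.
    for batch in chunked(rel_rows, batch_size):
        tx = graph.begin()
        merge_relationships(tx, batch, "RELATED_TO",
                            start_node_key=("NodeA", "id"),
                            end_node_key=("NodeB", "id"))
        graph.commit(tx)
```

Usage would look something like load(Graph("bolt://localhost:7687", auth=("neo4j", "password")), dataset), where the connection details are placeholders. The batch size of 1000 is an arbitrary starting point to tune, not a recommended value.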

Either way, though, that nuance is unlikely to help much with your specific problems. As a client-side library, py2neo can only carry out operations exposed by the Neo4j server and, unfortunately, there is no good low-level way to import non-trivial amounts of bulk data from the client. Py2neo can't fix that.

If performance is your goal, your best bet might be to use a LOAD CSV Cypher statement instead. Note, though, that to do this your input data file will need to be on, or visible to, the server directly.
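One possible shape for the LOAD CSV route, as a sketch: Python writes the edge list to a CSV file placed where the server can read it (by default, file:/// URLs resolve against the server's import directory, but this depends on its configuration), and a single Cypher statement then merges both endpoints and the relationship for each row. The file name, column names, labels and relationship type are hypothetical.

```python
import csv

# Hypothetical Cypher: MERGE both endpoints and the edge for every CSV row.
# LOAD CSV reads every value as a string; wrap with toInteger() if needed.
LOAD_EDGES_QUERY = """
LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS row
MERGE (a:NodeA {id: row.a_id})
MERGE (b:NodeB {id: row.b_id})
MERGE (a)-[:RELATED_TO]->(b)
"""


def write_edges_csv(edges, path):
    """Write (a_id, b_id) pairs to `path` under an a_id,b_id header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["a_id", "b_id"])
        writer.writerows(edges)
```

After copying the file to the server, the statement could be run with graph.run(LOAD_EDGES_QUERY). For large files, splitting the work into several transactions (USING PERIODIC COMMIT on Neo4j 4.x, or CALL { ... } IN TRANSACTIONS on 4.4 and later) is usually worth considering.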
