Failed WriteBatch Operation with py2neo

I am trying to find a workaround to the following problem. I have seen it partially described in this SO question, but not really answered.

The following code fails, starting with a fresh graph:

from py2neo import neo4j

def add_test_nodes():
    # Add a test node manually
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345, {"user_id":12345})

def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)

    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}

    # batch requests
    a = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
    batch.set_properties(a, new_node_data)  #<-- I'm the problem

    # execute batch requests and clear
    batch.run()
    batch.clear()

if __name__ == '__main__':
    # Initialize Graph DB service and create a Users node index
    g = neo4j.GraphDatabaseService()
    users_idx = g.get_or_create_index(neo4j.Node, "Users")

    # run the test functions
    add_test_nodes()
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    print alice

    do_batch(g)

    # get alice back and assert additional properties were added
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    assert "name" in alice

In short, I want to update the properties of existing indexed nodes in a single batch transaction. The failure occurs at the batch.set_properties line, because the BatchRequest object returned by the previous line is not interpreted as a valid node. Though not entirely identical, it feels like I am attempting something like the answer posted here.

Some specifics

>>> import py2neo
>>> py2neo.__version__
'1.6.0'
>>> g = py2neo.neo4j.GraphDatabaseService()
>>> g.neo4j_version
(2, 0, 0, u'M06') 

Update

If I split the problem into separate batches, it runs without error:

def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)

    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}

    # batch request 1
    batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})

    # execute batch request and clear
    alice = batch.submit()[0]
    batch.clear()

    # batch request 2
    batch.set_properties(alice, new_node_data)

    # execute batch request and clear
    batch.run()
    batch.clear()

This works for many nodes as well. Though I do not love the idea of splitting the batch up, this might be the only way at the moment. Does anyone have comments on this?
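For many nodes, the two-batch approach above might be wrapped roughly as follows. This is only a sketch: the helper name `update_indexed_nodes` and its `batch_factory` and `node_type` parameters are hypothetical, and it assumes that `submit()` returns the nodes in the same order the requests were added.

```python
def update_indexed_nodes(batch_factory, node_type, index_name, key, updates):
    """Two-pass update of indexed nodes.

    batch_factory: zero-argument callable returning a fresh write batch
                   (assumed to behave like py2neo 1.6's neo4j.WriteBatch).
    node_type:     the entity type to index, e.g. neo4j.Node.
    updates:       dict mapping each index value to the properties to set.
    """
    values = list(updates)

    # Pass 1: get or create every indexed node, then submit so that real
    # node objects come back instead of unresolved batch requests.
    batch = batch_factory()
    for value in values:
        batch.get_or_create_in_index(node_type, index_name, key, value, {})
    nodes = batch.submit()

    # Pass 2: set the properties on the concrete nodes.
    batch = batch_factory()
    for node, value in zip(nodes, values):
        batch.set_properties(node, updates[value])
    batch.run()
    return nodes
```

With the setup from the question, this would presumably be called as `update_indexed_nodes(lambda: neo4j.WriteBatch(g), neo4j.Node, "Users", "user_id", {12345: {"user_id": 12345, "name": "Alice"}})`. It still costs two round trips, but only two regardless of how many nodes are updated.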

After reading up on the new features of Neo4j 2.0.0-M06, it seems that the older workflow of node and relationship indexes is being superseded. Neo4j's approach to indexing is currently diverging toward two newer concepts, namely labels and schema indexes.

Labels

Labels can be arbitrarily attached to nodes and can serve as a reference for an index.

Indexes

Indexes can be created in Cypher by referencing a label (here, User) and a node property key (screen_name):

CREATE INDEX ON :User(screen_name)

Cypher MERGE

Furthermore, the indexed get_or_create methods can now be expressed with the new Cypher MERGE clause, which incorporates labels and their indexes quite succinctly:

MERGE (me:User{screen_name:"SunPowered"}) RETURN me

Batch

Queries of this sort can be batched in py2neo by appending a CypherQuery instance to the batch object:

from py2neo import neo4j

graph_db = neo4j.GraphDatabaseService()
cypher_merge_user = neo4j.CypherQuery(graph_db, 
    "MERGE (user:User {screen_name:{name}}) RETURN user")

def get_or_create_user(screen_name):
    """Return the user if exists, create one if not"""
    return cypher_merge_user.execute_one(name=screen_name)

def get_or_create_users(screen_names):
    """Apply the get or create user cypher query to many usernames in a 
    batch transaction"""

    batch = neo4j.WriteBatch(graph_db)

    for screen_name in screen_names:
        batch.append_cypher(cypher_merge_user, params=dict(name=screen_name))

    return batch.submit()

root = get_or_create_user("Root")
users = get_or_create_users(["alice", "bob", "charlie"])

Limitation

There is a limitation, however: the results of a Cypher query in a batch transaction cannot be referenced later in the same transaction. The original question was about updating a collection of indexed user properties in one batch transaction. As far as I can tell, this is still not possible. For example, the following snippet throws an error:

batch = neo4j.WriteBatch(graph_db)
b1 = batch.append_cypher(cypher_merge_user, params=dict(name="Alice"))
batch.set_properties(b1, dict(last_name="Smith"))
resp = batch.submit()

So, it seems that although implementing get_or_create over a labelled node with py2neo carries a bit less overhead, because the legacy indexes are no longer necessary, the original question still needs two separate batch transactions to complete.

Your problem seems to lie not in batch.set_properties() but in the output of batch.get_or_create_in_index(). If you add the node with batch.create(), it works:

db = neo4j.GraphDatabaseService()

batch = neo4j.WriteBatch(db)
# create a node instead of getting it from index
test_node = batch.create({'key': 'value'})
# set new properties on the node
batch.set_properties(test_node, {'key': 'foo'})

batch.submit()

If you look at the properties of the BatchRequest objects returned by batch.create() and batch.get_or_create_in_index(), the URIs differ because the two methods use different parts of the Neo4j REST API:

test_node = batch.create({'key': 'value'})
print test_node.uri # node
print test_node.body # {'key': 'value'}
print test_node.method # POST

index_node = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
print index_node.uri # index/node/Users?uniqueness=get_or_create
print index_node.body # {u'value': 12345, u'key': 'user_id', u'properties': {}}
print index_node.method # POST

batch.submit()

So I guess batch.set_properties() somehow can't handle the URI of the indexed node, i.e. it doesn't really get the correct URI for the node?
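To make the difference concrete, here is a rough sketch of what the two batches might look like as raw payloads for the Neo4j REST batch endpoint, where a later job can refer to an earlier one as `{id}`. The `to`, `body`, and `method` values of the first job are the ones printed above; the `PUT {0}/properties` job standing in for `set_properties` and the helper name `batch_payload` are assumptions for illustration, not taken from py2neo's source.

```python
import json

def batch_payload(first_job, new_properties):
    """Build a two-job batch: `first_job`, then a property update on its result."""
    jobs = [
        dict(first_job, id=0),
        # "{0}" asks the server to substitute the location of job 0's result.
        # Assumption: set_properties is sent as PUT <node>/properties.
        {"id": 1, "method": "PUT", "to": "{0}/properties", "body": new_properties},
    ]
    return json.dumps(jobs, sort_keys=True)

# First job as produced by batch.create()
create_job = {"method": "POST", "to": "node", "body": {"key": "value"}}

# First job as produced by batch.get_or_create_in_index()
index_job = {
    "method": "POST",
    "to": "index/node/Users?uniqueness=get_or_create",
    "body": {"key": "user_id", "value": 12345, "properties": {}},
}

works = batch_payload(create_job, {"key": "foo"})
fails = batch_payload(index_job, {"user_id": 12345, "name": "Alice"})
```

Under these assumptions the second job has the same shape in both payloads, so whatever goes wrong presumably lies in how `{0}` is resolved for the index request rather than in the property update itself.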

This doesn't solve the problem, but it could be a pointer for somebody else ;)
