
Efficient way to query in a for loop in Google App Engine?

The GAE documentation states:

Because each get() or put() operation invokes a separate remote procedure call (RPC), issuing many such calls inside a loop is an inefficient way to process a collection of entities or keys at once.
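To make that cost concrete, here is a plain-Python sketch (no App Engine involved) that simply counts simulated round trips. `FakeDatastore` and its `get`/`get_multi` methods are hypothetical stand-ins; in NDB the batched form of a key-by-key loop is `ndb.get_multi(keys)`.

```python
class FakeDatastore:
    """Hypothetical in-memory store that counts simulated RPC round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.rpc_count = 0

    def get(self, key):
        # One round trip per key, like calling key.get() inside a loop.
        self.rpc_count += 1
        return self._data.get(key)

    def get_multi(self, keys):
        # One round trip for the whole batch, analogous to ndb.get_multi(keys).
        self.rpc_count += 1
        return [self._data.get(k) for k in keys]


store = FakeDatastore({"user%d" % i: {"id": i} for i in range(100)})
keys = ["user%d" % i for i in range(100)]

looped = [store.get(k) for k in keys]
loop_rpcs = store.rpc_count      # one round trip per key

store.rpc_count = 0
batched = store.get_multi(keys)
batch_rpcs = store.rpc_count     # a single round trip

print(loop_rpcs, batch_rpcs)     # 100 1
```

Both approaches return the same entities; only the number of round trips differs, and with real RPCs each round trip adds network latency.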

Who knows how many other inefficiencies my code has, so I'd like to eliminate the ones I can. Currently I have a for loop in which each iteration runs a separate query. Say I have a User, and a user has friends. I want to get the latest updates for every friend of the user, so I loop over an array of that user's friends:

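from google.appengine.ext import ndb

@ndb.tasklet
def get_friend_updates(friends):  # enclosing tasklet; the function name is hypothetical
    status_list = []
    for friend_dic in friends:
        email = friend_dic['email']
        lastUpdated = friend_dic['lastUpdated']
        userKey = ndb.Key('User', email)
        # One ancestor query per friend, fully drained before the next one starts
        query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2',
                        userKey, lastUpdated)
        qit = query.iter()
        while (yield qit.has_next_async()):
            status = qit.next()
            status_list.append(status.to_dict())
    raise ndb.Return(status_list)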

Is there a more efficient way to do this, maybe by somehow batching all of these into a single query?

Try looking at NDB's map function, Query.map_async(): https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map_async

Example (this assumes you keep friend relationships in a separate model; here I assumed a Relationships model):

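from google.appengine.ext import ndb

@ndb.tasklet
def callback(relationship):
  # Assumption: each Relationships entity stores the friend's email and the
  # last-seen timestamp; the property names email/lastUpdated are hypothetical.
  userKey = ndb.Key('User', relationship.email)
  query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2',
                  userKey, relationship.lastUpdated)
  status_updates = yield query.fetch_async()
  raise ndb.Return(status_updates)

@ndb.tasklet
def get_updates(user):  # hypothetical wrapper: yield only works inside a tasklet
  qry = ndb.gql("SELECT * FROM Relationships WHERE friend_to = :1", user.key)
  updates = yield qry.map_async(callback)
  # updates holds one list of StatusUpdates entities per Relationships entity
  raise ndb.Return(updates)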
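Because the callback returns a list, map_async() produces a list of lists (one per relationship), which usually needs flattening. The sketch below shows that shape using Python 3's asyncio as an analogy, not NDB: `fetch_updates`, `map_concurrently` and the `UPDATES` data are all hypothetical.

```python
import asyncio
import itertools

# Hypothetical data: each friend's status updates, keyed by email.
UPDATES = {
    "a@example.com": ["a1", "a2"],
    "b@example.com": [],
    "c@example.com": ["c1"],
}

async def fetch_updates(email):
    # Stand-in for query.fetch_async(): one simulated round trip per friend.
    await asyncio.sleep(0.01)
    return UPDATES[email]

async def map_concurrently(items, callback):
    # Rough analogue of Query.map_async(): run the callback on every item
    # concurrently and return one result per item, in input order.
    return await asyncio.gather(*(callback(item) for item in items))

async def main():
    per_friend = await map_concurrently(sorted(UPDATES), fetch_updates)
    # per_friend is a list of lists; flatten it into a single list.
    return per_friend, list(itertools.chain.from_iterable(per_friend))

per_friend, flat = asyncio.run(main())
print(per_friend)  # [['a1', 'a2'], [], ['c1']]
print(flat)        # ['a1', 'a2', 'c1']
```

In the NDB version, the same flattening step (for example with itertools.chain.from_iterable) would turn `updates` into one list of status updates.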

Update:

With a better understanding of your data model:

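from google.appengine.ext import ndb

@ndb.tasklet
def get_friend_updates(friends):  # function name is hypothetical
  futures = []
  status_list = []
  # Start every query first: fetch_async() returns a Future immediately,
  # so the RPCs run concurrently instead of one after another.
  for friend_dic in friends:
    email = friend_dic['email']
    lastUpdated = friend_dic['lastUpdated']
    userKey = ndb.Key('User', email)
    futures.append(ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2',
                           userKey, lastUpdated).fetch_async())

  # Then wait for the results, in the order the queries were issued.
  for future in futures:
    statuses = yield future
    status_list.extend([x.to_dict() for x in statuses])

  raise ndb.Return(status_list)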
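The important change from the question's code is the order of operations: every query is started before the first result is awaited. This Python 3 sketch uses threads from concurrent.futures instead of NDB (the `fake_query` function and its 0.1 s delay are assumptions) to compare "wait inside the loop" with "start all, then collect".

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(friend):
    # Hypothetical stand-in for one datastore query RPC taking ~0.1 s.
    time.sleep(0.1)
    return ["%s-update" % friend]

friends = ["ann", "bob", "cat", "dan", "eve"]

with ThreadPoolExecutor(max_workers=len(friends)) as pool:
    # Question's pattern: start one query and wait for it before the next.
    start = time.time()
    slow = []
    for f in friends:
        slow.extend(pool.submit(fake_query, f).result())
    sequential_s = time.time() - start

    # Answer's pattern: start every query first, then collect the results.
    start = time.time()
    futures = [pool.submit(fake_query, f) for f in friends]
    fast = []
    for fut in futures:
        fast.extend(fut.result())
    overlapped_s = time.time() - start

print("sequential %.2fs, overlapped %.2fs" % (sequential_s, overlapped_s))
```

Both versions return the same results in the same order; only the waiting pattern differs.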

You could perform those queries concurrently using NDB's async methods:

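from google.appengine.ext import ndb

class Bar(ndb.Model):
   pass

class Foo(ndb.Model):
   pass

# Create 10 Bar entities, each with one Foo child (put_multi returns keys).
bars = ndb.put_multi([Bar() for i in range(10)])
ndb.put_multi([Foo(parent=bar) for bar in bars])

# Start all 10 ancestor queries without waiting on any of them...
futures = [Foo.query(ancestor=bar).fetch_async(10) for bar in bars]
# ...then block on each result in turn.
for f in futures:
  print(f.get_result())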

This launches 10 concurrent Datastore query RPCs, so the overall latency depends only on the slowest one rather than on the sum of all the latencies.
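As an idealized back-of-the-envelope formula (ignoring scheduling overhead and any limits on concurrent RPCs), with t_i the latency of query i out of n queries:

```latex
T_{\text{sequential}} = \sum_{i=1}^{n} t_i,
\qquad
T_{\text{concurrent}} \approx \max_{1 \le i \le n} t_i
```

For example, if each of n = 10 queries took about 50 ms (an assumed figure, not a measurement), the loop would take about 10 × 50 = 500 ms, while the concurrent version would take roughly 50 ms plus overhead.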

Also see the official NDB documentation for more detail on how to use async APIs with NDB.

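Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.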

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM