Efficient way to query in a for loop in Google App Engine?

In the GAE documentation, it states:

Because each get() or put() operation invokes a separate remote procedure call (RPC), issuing many such calls inside a loop is an inefficient way to process a collection of entities or keys at once.

I probably have other inefficiencies in my code too, so I'd like to minimize them as much as I can. Currently, I have a for loop where each iteration runs a separate query. Let's say I have a User, and a user has friends. I want to get the latest updates for every friend of the user. So I loop over an array of that user's friends:

status_list = []
for friend_dic in friends:
    email = friend_dic['email']
    lastUpdated = friend_dic['lastUpdated']
    userKey = Key('User', email)
    query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
    qit = query.iter()
    while (yield qit.has_next_async()):
        status = qit.next()
        status_list.append(status.to_dict())
raise ndb.Return(status_list)

Is there a more efficient way to do this, maybe somehow batch all these into one single query?

Try looking at NDB's map function: https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map_async

Example (assuming you keep friend relationships in a separate model; here I assume a Relationships model):

@ndb.tasklet
def callback(entity):
  # Assumption: each Relationships entity stores the friend's email and
  # lastUpdated; adjust these attribute names to match your model.
  email = entity.email
  lastUpdated = entity.lastUpdated
  userKey = Key('User', email)
  query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
  status_updates = yield query.fetch_async()
  raise ndb.Return(status_updates)

qry = ndb.gql("SELECT * FROM Relationships WHERE friend_to = :1", user.key)
updates = yield qry.map_async(callback)
# updates is now a list with one entry per Relationships entity; each entry is that callback's list of status updates

Update:

With a better understanding of your data model:

queries = []
status_list = []
for friend_dic in friends:
  email = friend_dic['email']
  lastUpdated = friend_dic['lastUpdated']
  userKey = Key('User', email)
  queries.append(ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated).fetch_async())

for query in queries:
  statuses = yield query
  status_list.extend([x.to_dict() for x in statuses])

raise ndb.Return(status_list)

You could perform those queries concurrently using ndb's async methods:

from google.appengine.ext import ndb

class Bar(ndb.Model):
   pass

class Foo(ndb.Model):
   pass

bars = ndb.put_multi([Bar() for i in range(10)])
ndb.put_multi([Foo(parent=bar) for bar in bars])

futures = [Foo.query(ancestor=bar).fetch_async(10) for bar in bars]
for f in futures:
  print(f.get_result())

This launches 10 concurrent Datastore query RPCs, so the overall latency depends only on the slowest one instead of the sum of all latencies.
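The latency claim can be illustrated without App Engine. The following is only a rough sketch using the standard library: fake_query is a hypothetical stand-in for a Datastore query RPC, the latencies are made-up numbers, and a thread pool plays the role of ndb's async futures. It is not how ndb works internally; it only shows the "start everything first, then collect" pattern.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(latency):
    """Hypothetical stand-in for one query RPC; sleeps to simulate latency."""
    time.sleep(latency)
    return latency


# Made-up per-query latencies, in seconds.
latencies = [0.05, 0.10, 0.15, 0.20]

# Sequential: each call blocks until the previous one has finished,
# so the total time is roughly the sum of all latencies.
start = time.perf_counter()
sequential_results = [fake_query(l) for l in latencies]
sequential_elapsed = time.perf_counter() - start

# Concurrent: start every call first, then collect the results,
# so the total time is roughly the slowest single latency.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(latencies)) as pool:
    futures = [pool.submit(fake_query, l) for l in latencies]
    concurrent_results = [f.result() for f in futures]
concurrent_elapsed = time.perf_counter() - start

print("sequential: %.2fs, concurrent: %.2fs"
      % (sequential_elapsed, concurrent_elapsed))
```

The results come back in the same order in both cases, because they are collected in the order the futures were created, just as in the fetch_async loop above.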

Also see the official ndb documentation for more detail on how to use the async APIs with ndb.
