In the GAE documentation, it states:
Because each get() or put() operation invokes a separate remote procedure call (RPC), issuing many such calls inside a loop is an inefficient way to process a collection of entities or keys at once.
Who knows how many other inefficiencies I have in my code, so I'd like to minimize as much as I can. Currently, I do have a for loop where each iteration has a separate query. Let's say I have a User, and a user has friends. I want to get the latest updates for every friend of the user. So what I have is an array of that user's friends:
for friend_dic in friends:
email = friend_dic['email']
lastUpdated = friend_dic['lastUpdated']
userKey = Key('User', email)
query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
qit = query.iter()
while (yield qit.has_next_async()):
status = qit.next()
status_list.append(status.to_dict())
raise ndb.Return(status_list)
Is there a more efficient way to do this, maybe somehow batch all these into one single query?
Try looking at NDB's map function: https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map_async
Example (assuming you keep your friend relationships in a separate model, for this example I assumed a Relationships
model):
@ndb.tasklet
def callback(entity):
email = friend_dic['email']
lastUpdated = friend_dic['lastUpdated']
userKey = Key('User', email)
query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
status_updates = yield query.fetch_async()
raise ndb.Return(status_updates)
qry = ndb.gql("SELECT * FROM Relationships WHERE friend_to = :1", user.key)
updates = yield qry.map_async(callback)
#updates will now be a list of status updates
With a better understanding of your data model:
queries = []
status_list = []
for friend_dic in friends:
email = friend_dic['email']
lastUpdated = friend_dic['lastUpdated']
userKey = Key('User', email)
queries.append(ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated).fetch_async())
for query in queries:
statuses = yield query
status_list.extend([x.to_dict() for x in statuses])
raise ndb.Return(status_list)
You could perform those query concurrently using ndb async methods:
from google.appengine.ext import ndb
class Bar(ndb.Model):
pass
class Foo(ndb.Model):
pass
bars = ndb.put_multi([Bar() for i in range(10)])
ndb.put_multi([Foo(parent=bar) for bar in bars])
futures = [Foo.query(ancestor=bar).fetch_async(10) for bar in bars]
for f in futures:
print(f.get_result())
This launches 10 concurrent Datastore Query RPCs, and the overall latency only depends of the slowest one instead of the sum of all latencies
Also see the official ndb documentation for more detail on how to async APIs with ndb.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.