
Efficient way to do large IN query in Google App Engine?

A user accesses his contacts on his mobile device. I want to send back to the server all the phone numbers (say 250), and then query for any User entities that have matching phone numbers.

A user has a phone field which is indexed. So I do User.query(User.phone.IN(phone_list)), but I just looked at AppStats, and this is damn expensive. It cost me 250 reads for this one operation, and this is something I expect a user to do often.

What are some alternatives? I suppose I can set the User entity's id value to be his phone number (i.e. when creating a user I'd do user = User(id=phone_number)), and then get directly by keys via ndb.get_multi(keys), but I also want to perform this same query with emails.

Any ideas?

You could create a PhoneUser model like so:

from google.appengine.ext import ndb

class PhoneUser(ndb.Model):
  # The entity id is the phone number, so it can be fetched by key.
  number = ndb.StringProperty()
  user = ndb.KeyProperty()

class User(ndb.Model):
  pass

u = User()
u.put()

p = PhoneUser(id='123-456-7890', number='123-456-7890', user=u.key)
p.put()

u2 = User()
u2.put()

p2 = PhoneUser(id='555-555-5555', number='555-555-5555', user=u2.key)
p2.put()

# Batch get by key; an entry is None when no PhoneUser exists for that number.
result = ndb.get_multi([ndb.Key(PhoneUser, '123-456-7890'), ndb.Key(PhoneUser, '555-555-5555')])

I think that would work in this situation. You would just have to add or delete the matching PhoneUser entity whenever you create, update, or delete a User. You can do this using post hooks: https://developers.google.com/appengine/docs/python/ndb/modelclass#Model__post_delete_hook
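Since the question also asks about emails, the same lookup-entity idea could be extended by prefixing the key id with the kind of contact detail. This is only a minimal sketch: the ContactLookup kind and the 'phone:' / 'email:' prefixes are hypothetical names invented for illustration, not part of the answer above, and the ndb calls appear only as comments so the helper runs on its own.

```python
def lookup_ids(phones, emails):
    """Build key ids for a hypothetical ContactLookup kind.

    The 'phone:' and 'email:' prefixes are an assumption made for this
    sketch; they keep the two id spaces from colliding.
    """
    ids = ['phone:' + p for p in phones]
    ids += ['email:' + e for e in emails]
    return ids


ids = lookup_ids(['123-456-7890'], ['bob@example.com'])

# On App Engine the ids could then be fetched by key in one batch, e.g.:
#   keys = [ndb.Key('ContactLookup', i) for i in ids]
#   matches = [m for m in ndb.get_multi(keys) if m is not None]
```

With this layout a single get_multi call would cover both phone numbers and emails, at the price of maintaining one lookup entity per contact detail.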

I misunderstood part of your problem; I thought you were issuing a query that was giving you 250 entities.

I see what the problem is now: you're issuing an IN query with a list of 250 phone numbers. Behind the scenes, the datastore actually runs 250 individual queries, which is why you're getting 250 read ops.
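To make the fan-out concrete, here is a toy model in plain Python. It is not the datastore's implementation: the dict stands in for the index on User.phone, the function name run_in_query is made up, and only the per-sub-query charge is counted.

```python
def run_in_query(index, values):
    """Toy model of an IN filter: one equality sub-query per value.

    `index` maps phone number -> user id and stands in for the datastore
    index on User.phone. Only the per-sub-query charge is counted.
    """
    matches = []
    sub_queries = 0
    for value in values:
        sub_queries += 1  # each value in the list becomes its own query
        if value in index:
            matches.append(index[value])
    return matches, sub_queries


# Two registered users, and a contact list of 250 made-up numbers.
index = {'555-0001': 'user-a', '555-0042': 'user-b'}
phone_list = ['555-%04d' % n for n in range(250)]
matches, sub_queries = run_in_query(index, phone_list)
```

Even though only two users match, the loop runs 250 times, which mirrors the 250 read ops seen in AppStats.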

I can't think of a way to avoid this. I'd recommend avoiding searching on long lists of phone numbers. This seems like something you'd need to do only once, the first time the user logs in using that phone. Try to find some way to store the results and avoid the query again.
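One way to store the results, sketched in plain Python: remember which numbers have already been looked up for this user and only query the new ones on later syncs. The names numbers_to_check and already_checked are hypothetical, and where the set is persisted (on the User entity, in memcache, on the device) is left open.

```python
def numbers_to_check(uploaded, already_checked):
    """Return only the numbers that have not been looked up before."""
    return sorted(set(uploaded) - set(already_checked))


# Numbers looked up during an earlier sync (hypothetical stored state).
already_checked = {'555-0001', '555-0002'}

# The device uploads its full contact list again.
uploaded = ['555-0001', '555-0002', '555-0003']

new_numbers = numbers_to_check(uploaded, already_checked)
already_checked.update(new_numbers)
```

A number that matched nobody earlier would stay unmatched under this scheme even if its owner signs up later, so something would still need to handle new sign-ups.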

There is no efficient way to do an IN query, so avoid it altogether.

How? Invert the query: instead of finding all the people who appear in this user's phone list, find all the people who have this user's phone number in their own list.

This is not without extra cost, however: the phone list for each user must be stored and indexed.

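class User(ndb.Model):
  # Repeated property: one indexed value per number in this user's contacts.
  phoneList = ndb.StringProperty(repeated=True)
  phone_id = ndb.StringProperty()

# Users whose contact list contains this phone number.
matches = User.query(User.phoneList == this_phone_number).fetch()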
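The trade-off can be illustrated with a toy index in plain Python. This is a stand-in for the index on a repeated property, not ndb itself, and build_index and the sample data are invented for the sketch.

```python
from collections import defaultdict


def build_index(phone_lists):
    """Toy stand-in for the index on a repeated property.

    `phone_lists` maps user id -> list of phone numbers in that user's
    contacts. One index entry is written per (number, user) pair.
    """
    index = defaultdict(set)
    for user_id, numbers in phone_lists.items():
        for number in numbers:
            index[number].add(user_id)
    return index


phone_lists = {
    'alice': ['555-0001', '555-0002'],
    'bob': ['555-0002'],
}
index = build_index(phone_lists)

# Write-side cost: total number of index entries that had to be stored.
entries = sum(len(users) for users in index.values())

# Read side: a single equality lookup finds everyone who lists this number.
who_has_me = sorted(index.get('555-0002', ()))
```

The read becomes a single lookup, while the cost moves to write time, where every contact of every user adds an index entry.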
