
Efficient way to do large IN query in Google App Engine?

A user accesses his contacts on his mobile device. I want to send all of the phone numbers (say 250) back to the server and then query for any User entities that have matching phone numbers.

The User entity has a phone field which is indexed, so I do User.query(User.phone.IN(phone_list)). But I just looked at AppStats, and this is damn expensive: it cost me 250 reads for this one operation, and this is something I expect a user to do often.

What are some alternatives? I suppose I could set the User entity's id to his phone number (i.e. when creating a user I'd do user = User(id=phone_number)) and then get the entities directly by key via ndb.get_multi(phones), but I also want to perform this same query with emails.

Any ideas?

You could create a PhoneUser model like so:

from google.appengine.ext import ndb

class PhoneUser(ndb.Model):
  number = ndb.StringProperty()
  user = ndb.KeyProperty()

class User(ndb.Model):
  pass

u = User()
u.put()

p = PhoneUser(id='123-456-7890', number='123-456-7890', user=u.key)
p.put()

u2 = User()
u2.put()

p2 = PhoneUser(id='555-555-5555', number='555-555-5555', user=u2.key)
p2.put()

# Fetch both PhoneUser entities directly by key instead of running a query.
result = ndb.get_multi([ndb.Key(PhoneUser, '123-456-7890'),
                        ndb.Key(PhoneUser, '555-555-5555')])

I think that would work in this situation. You would just have to add or delete the matching PhoneUser entity whenever you update your User. You can do this using post hooks: https://developers.google.com/appengine/docs/python/ndb/modelclass#Model__post_delete_hook
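The add/delete bookkeeping can be sketched without the datastore. What follows is a minimal in-memory stand-in, not ndb code: a plain dict plays the role of the PhoneUser kind, and `set_phone`, `delete_user` and `find_users` are hypothetical helper names. In a real app the same steps would presumably live in the model's put and delete hooks.

```python
# In-memory stand-in for the PhoneUser kind: phone number -> user id.
# Assumption: plain dicts replace the datastore for illustration only.
phone_index = {}
user_phone = {}  # user id -> current phone number


def set_phone(user_id, new_number):
    """Point the index at the user's new number and drop the old entry."""
    old_number = user_phone.get(user_id)
    if old_number is not None and old_number != new_number:
        phone_index.pop(old_number, None)
    user_phone[user_id] = new_number
    phone_index[new_number] = user_id


def delete_user(user_id):
    """Remove the user and its index entry, as a delete hook would."""
    number = user_phone.pop(user_id, None)
    if number is not None:
        phone_index.pop(number, None)


def find_users(phone_list):
    """Look up each number by key; numbers with no entry are skipped."""
    hits = (phone_index.get(number) for number in phone_list)
    return [user_id for user_id in hits if user_id is not None]


set_phone("alice", "123-456-7890")
set_phone("bob", "555-555-5555")
set_phone("alice", "111-111-1111")  # alice changes her number
delete_user("bob")

matches = find_users(["123-456-7890", "111-111-1111", "555-555-5555"])
```

Only the number that is still current for an existing user produces a match; stale and deleted entries are gone from the index, which is the property the hooks would need to preserve.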

I misunderstood part of your problem: I thought you were issuing a query that was returning 250 entities.

I see what the problem is now. You're issuing an IN query with a list of 250 phone numbers, and behind the scenes the datastore is actually doing 250 individual queries, which is why you're getting 250 read ops.
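To make the cost concrete, here is a rough in-memory model of that fan-out. It is not how the datastore is implemented; `FakeIndex` and its counter are hypothetical, and the only point is that an IN over N values turns into N equality lookups.

```python
class FakeIndex(object):
    """Toy property index: value -> list of entity ids (illustration only)."""

    def __init__(self, rows):
        self.rows = rows
        self.queries_run = 0

    def query_eq(self, value):
        # One equality query against the index.
        self.queries_run += 1
        return self.rows.get(value, [])

    def query_in(self, values):
        # An IN filter is modelled as one equality query per value,
        # with the results merged afterwards.
        results = []
        for value in values:
            results.extend(self.query_eq(value))
        return results


index = FakeIndex({"555-0001": ["user-a"], "555-0100": ["user-b"]})
phone_list = ["555-%04d" % n for n in range(250)]  # 250 made-up numbers

found = index.query_in(phone_list)
```

After the call, `index.queries_run` equals the length of `phone_list` even though only two users matched, which mirrors the 250 read ops reported in AppStats.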

I can't think of a way to avoid this. I'd recommend avoiding searches on long lists of phone numbers. This seems like something you'd need to do only once, the first time the user logs in using that phone. Try to find some way to store the results so that you can avoid running the query again.
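One way to act on that advice, sketched under assumptions: remember which numbers have already been checked for a given device and only look up the new ones on later syncs. `numbers_to_check`, `sync_contacts` and `lookup` are hypothetical names, and where the checked set is actually persisted (datastore, memcache, the device itself) is left open.

```python
def numbers_to_check(phone_list, already_checked):
    """Return only the numbers that have not been looked up before."""
    seen = set(already_checked)
    fresh = []
    for number in phone_list:
        if number not in seen:
            seen.add(number)  # also drops duplicates within phone_list
            fresh.append(number)
    return fresh


def sync_contacts(phone_list, already_checked, lookup):
    """Look up new numbers only; return (matches, updated checked set)."""
    fresh = numbers_to_check(phone_list, already_checked)
    matches = lookup(fresh) if fresh else []
    return matches, set(already_checked) | set(fresh)


# Hypothetical lookup that records how many numbers it was asked about.
calls = []


def lookup(numbers):
    calls.append(len(numbers))
    return [n for n in numbers if n == "555-0002"]


first, checked = sync_contacts(["555-0001", "555-0002"], set(), lookup)
second, checked = sync_contacts(
    ["555-0001", "555-0002", "555-0003"], checked, lookup)
```

The second sync only asks about the one number that was added. A contact who signs up after being checked would be missed by this scheme, so the stored set would need some kind of expiry or refresh.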

There is no efficient way to do an IN query, so avoid it altogether.

How?

Invert the query. Instead of finding all the people that belong to this guy's phone list, try finding all the people that have this user's phone id in their list.

This is not without some extra cost, however: the phone list for each user must be stored and indexed.

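from google.appengine.ext import ndb

class User(ndb.Model):
  # Phone numbers from this user's contacts; a repeated property is indexed per value.
  phoneList = ndb.StringProperty(repeated=True)
  phone_id = ndb.StringProperty()

# Equality on a repeated property matches any User whose list contains the value.
matches = User.query(User.phoneList == this_phone_number)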
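The inversion can be pictured with a small in-memory model. This is a sketch under assumptions, not datastore code: `contacts_by_user` stands in for each user's stored phoneList, and `build_inverted_index` plays the role of the index the datastore would keep on that property.

```python
from collections import defaultdict


def build_inverted_index(contacts_by_user):
    """Map each phone number to the set of users who list it as a contact."""
    inverted = defaultdict(set)
    for user_id, phone_list in contacts_by_user.items():
        for number in phone_list:
            inverted[number].add(user_id)
    return inverted


def users_listing(inverted, this_phone_number):
    """Single equality lookup: who has this number in their phone list?"""
    return sorted(inverted.get(this_phone_number, set()))


# Made-up sample data for illustration.
contacts_by_user = {
    "alice": ["555-0002", "555-0003"],
    "bob": ["555-0001", "555-0003"],
    "carol": ["555-0001"],
}
inverted = build_inverted_index(contacts_by_user)

who_lists_number = users_listing(inverted, "555-0003")
```

One lookup replaces the per-number fan-out, at the price of storing and indexing every user's contact list. Note that it answers a slightly different question (who has me in their contacts, rather than which of my contacts are users), so whether it fits depends on the feature.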
