Google App Engine NDB read operation optimization

Question

I'm looking to optimize the read operations in my GAE Python app so that I don't go over my free quota. I store data every so often, and a lot of the data I receive may be duplicated, so I have to check it before I store it. This results in a lot of read ops and some write ops. Here is how I'm doing it now:

# data is a JSON data list with hundreds of items
for item in data:
    record = InfoDB.get_by_id(item['id'])
    if record:
        continue
    else:
        entity = InfoDB(id=item['id'], data=item['data']).put()

Here is one way I thought of to lower the read ops, though I'm not 100% sure it actually does. I suspect it may still perform a read op every time the loop iterates.

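# data is a JSON data list with hundreds of items
db = InfoDB.query().fetch()
for item in data:
    flag = False  # reset for every item, otherwise one match skips all later items
    for record in db:
        # NDB entities expose their id through the key; item is a dict
        if record.key.id() == item['id']:
            flag = True
            break

    if flag:
        continue
    else:
        entity = InfoDB(id=item['id'], data=item['data']).put()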

Does the above method actually save me read operations, given that it essentially grabs the entire datastore and then uses a for loop to process the whole set on every iteration? I realize this is slower, but I don't see how else I could accomplish this efficiently.

Any other ideas?

EDIT:

Just to clarify, this is using NDB, not the older DB.

Answer 1

If you know all the keys, which from your sample you do since you have all the ids, fetch them in one call: entities = db.get([list of keys]) with the older DB API, or entities = ndb.get_multi([list of keys]) with NDB.

This is far more efficient than fetching the entities one at a time.

Then store the new entities in one call with db.put(entities) or ndb.put_multi(entities).
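A minimal sketch of that batch approach, not the answerer's exact code. To keep it runnable outside App Engine, the datastore calls are passed in as functions: get_many and put_many are hypothetical parameter names standing in for ndb.get_multi and ndb.put_multi, and the dictionary-backed stand-ins at the bottom exist only for the demonstration.

```python
def store_new_items(data, get_many, put_many):
    """Store only the items whose id is not in the datastore yet.

    get_many(ids) must return one result per id, in order, with None for
    ids that are not stored (the contract ndb.get_multi has for keys).
    put_many(items) stores the given items in one batch.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    # Drop duplicates inside the incoming batch, keeping the first occurrence.
    unique = {}
    for item in data:
        unique.setdefault(item['id'], item)

    ids = list(unique)
    found = get_many(ids)      # one batch read instead of one get per item
    new_items = [unique[i] for i, hit in zip(ids, found) if hit is None]
    if new_items:
        put_many(new_items)    # one batch write
    return new_items


# In-memory stand-ins so the sketch runs anywhere.
store = {'a': 'old'}

def fake_get_many(ids):
    return [store.get(i) for i in ids]

def fake_put_many(items):
    for item in items:
        store[item['id']] = item['data']

data = [
    {'id': 'a', 'data': 'x'},
    {'id': 'b', 'data': 'y'},
    {'id': 'b', 'data': 'z'},
]
written = store_new_items(data, fake_get_many, fake_put_many)
print([item['id'] for item in written])
```

With NDB, the two functions could be lambda ids: ndb.get_multi([ndb.Key(InfoDB, i) for i in ids]) and lambda items: ndb.put_multi([InfoDB(id=i['id'], data=i['data']) for i in items]). Note that batching cuts the number of round trips; it does not necessarily cut the number of reads that are billed.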

Answer 2

Your proposed method will result in many more read operations, not fewer, because now you read all entities, whether you need them or not.

If you can overwrite the existing entities, you can skip the check entirely and simply write every item:

for item in data:
    InfoDB(id=item['id'], data=item['data']).put()

If you cannot overwrite the existing entities, you should use a keys-only query to find out which ids are already stored:

existing_ids = set(key.id() for key in InfoDB.query().iter(keys_only=True))

Keys-only queries are now free, whereas fetching complete entities counts as read operations.
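One way to apply the keys-only idea is to collect the stored ids into a set once and test each incoming item against it, which also replaces the nested loop from the question with a constant-time lookup. The sketch below is an illustration under assumptions: filter_new and stored_ids are hypothetical names, and a plain list stands in for the ids a keys-only query would yield, so that the code runs outside App Engine.

```python
def filter_new(data, stored_ids):
    """Return the items whose id is not among stored_ids.

    stored_ids is any iterable of ids already in the datastore. With NDB
    it could be built from a keys-only query, for example:
        (key.id() for key in InfoDB.query().iter(keys_only=True))
    """
    seen = set(stored_ids)         # constant-time membership test per item
    new_items = []
    for item in data:
        if item['id'] not in seen:
            new_items.append(item)
            seen.add(item['id'])   # also skips duplicates within data
    return new_items


# Stand-in for the ids returned by the keys-only query.
stored_ids = [1, 2, 3]
data = [
    {'id': 2, 'data': 'x'},
    {'id': 4, 'data': 'y'},
    {'id': 4, 'data': 'z'},
]
new_items = filter_new(data, stored_ids)
print(new_items)
```

Each returned item can then be written with InfoDB(id=item['id'], data=item['data']).put(), or all of them together with ndb.put_multi.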

2 answers

Answer 1: score 2, posted 2014-04-15 04:17:37
Answer 2: score 1, accepted, posted 2014-04-15 00:10:42
