
Google AppEngine - Big datastore reads

I need to read all the entities in a Google AppEngine datastore to do some initialization work. There are a lot of entities (80k currently), and the number keeps growing. I'm starting to hit the 30-second datastore query timeout limit.

Are there any best practices for sharding these kinds of huge reads in the datastore? Any examples?

You can tackle this in several ways:

  1. Run your code on the Task Queue, which has a 10-minute timeout instead of 30 s (more like 60 s in practice). The easiest way to do this is via DeferredTask.

    Warning: a DeferredTask must be serializable, so it is hard to pass it complex data. Also, don't make it an inner class.

  2. See backends. Requests served by a backend instance have no time limit.

  3. Finally, if you need to break up a big task and execute it in parallel, then look at mapreduce.
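The common thread in options 1 and 3 is splitting the read into bounded chunks so that no single request runs into the deadline. Below is a minimal Python sketch of that idea, not App Engine code: `fetch_page` is a hypothetical paged read that returns a page plus a cursor for the next page, the datastore is simulated by an in-memory list, and a plain `deque` stands in for the task queue. All names here are assumptions made for illustration.

```python
from collections import deque

# Stand-in for the datastore: 10 fake entities (assumption for illustration).
ENTITIES = ["entity-%d" % i for i in range(10)]


def fetch_page(cursor, page_size):
    """Hypothetical paged read: return (page, next_cursor).

    next_cursor is None once the last entity has been returned.
    """
    page = ENTITIES[cursor:cursor + page_size]
    next_cursor = cursor + page_size
    if next_cursor >= len(ENTITIES):
        next_cursor = None
    return page, next_cursor


def process_chunk(cursor, page_size, queue, results):
    """One simulated task: handle a single page, then enqueue the follow-up."""
    page, next_cursor = fetch_page(cursor, page_size)
    for entity in page:
        results.append(entity.upper())  # placeholder for the real init work
    if next_cursor is not None:
        queue.append(next_cursor)


def run_all(page_size):
    """Drain the simulated queue; each iteration models one short task."""
    queue = deque([0])  # the first task starts at the beginning
    results = []
    tasks_run = 0
    while queue:
        process_chunk(queue.popleft(), page_size, queue, results)
        tasks_run += 1
    return results, tasks_run


results, tasks_run = run_all(page_size=4)
print(len(results), tasks_run)
```

Because each task only carries a cursor, the data passed between tasks stays small and easy to serialize, which sidesteps the warning in option 1 about passing complex data.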

This answer on StackExchange served me well:

Expired queries and appengine

I had to modify it slightly to work for me:

import logging


def loop_over_objects_in_batches(batch_size, object_class, callback):
    # object_class must support count() and slicing (for example, a query object).
    num_els = object_class.count()
    # Floor division gives the number of full batches on Python 2 and 3 alike.
    num_loops = num_els // batch_size
    remainder = num_els - num_loops * batch_size
    logging.info("Calling batched loop with batch_size: %d, num_els: %s, num_loops: %s, remainder: %s, object_class: %s, callback: %s" % (batch_size, num_els, num_loops, remainder, object_class, callback))

    # Process the full batches.
    offset = 0
    while offset < num_loops * batch_size:
        logging.info("Processing batch (%d:%d)" % (offset, offset + batch_size))
        query = object_class[offset:offset + batch_size]
        for q in query:
            callback(q)

        offset = offset + batch_size

    # Process whatever is left over after the last full batch.
    if remainder:
        logging.info("Processing remainder batch (%d:%d)" % (offset, num_els))
        query = object_class[offset:num_els]
        for q in query:
            callback(q)
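
To see how this kind of batching behaves, the sketch below runs the same logic against a hypothetical `FakeQuery` class that only mimics the two operations the function relies on, `count()` and slicing. `FakeQuery` and `batched` are illustrative names, not datastore APIs, and `batched` is a condensed variant rather than the code above verbatim.

```python
class FakeQuery(object):
    """Hypothetical stand-in for a query object: supports count() and slicing."""

    def __init__(self, items):
        self._items = list(items)

    def count(self):
        return len(self._items)

    def __getitem__(self, key):
        return self._items[key]


def batched(batch_size, query, callback):
    """Call callback on every element, reading batch_size elements at a time.

    Returns the (start, end) offsets of each batch that was read.
    """
    num_els = query.count()
    batches = []
    for offset in range(0, num_els, batch_size):
        end = min(offset + batch_size, num_els)
        batches.append((offset, end))
        for item in query[offset:end]:
            callback(item)
    return batches


seen = []
batches = batched(3, FakeQuery(range(8)), seen.append)
print(batches)
print(seen)
```

Letting `range` step through the offsets covers the final partial batch without a separate remainder branch.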


 