
Simple example of retrieving 500 items from DynamoDB using Python

I'm looking for a simple example of retrieving 500 items from DynamoDB while minimizing the number of queries. I know there's a "multiget" function that would let me break this up into chunks of 50 queries, but I'm not sure how to do this.

I'm starting with a list of 500 keys. I'm then thinking of writing a function that takes this list of keys, breaks it up into "chunks," retrieves the values, stitches them back together, and returns a dict of 500 key-value pairs.
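Something like this sketch is what I have in mind (the chunk size of 50 and the `fetch_batch` callable are placeholders for whatever the real multiget call turns out to be):

```python
# Split a list of keys into fixed-size chunks.
def chunks(keys, size=50):
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

# Hypothetical driver: fetch_batch stands in for whatever call
# actually retrieves one chunk of items as a dict.
def get_all(keys, fetch_batch, size=50):
    result = {}
    for chunk in chunks(keys, size):
        result.update(fetch_batch(chunk))
    return result
```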

Or is there a better way to do this?

As a corollary, how would I "sort" the items afterwards?

Depending on your schema, there are two ways of efficiently retrieving your 500 items.

1. Items are under the same hash_key, using a range_key

  • Use the query method with the hash_key
  • You may ask to sort the range_keys ascending (A-Z) or descending (Z-A)

2. Items are under "random" keys

  • You said it: use the BatchGetItem method
  • Good news: the limit is actually 100 items per request, or 1 MB, whichever comes first
  • You will have to sort the results on the Python side
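Sorting on the Python side is a one-liner with `sorted()`. A minimal sketch, assuming the retrieved items are plain dicts and `'score'` is a hypothetical attribute to sort on:

```python
items = [
    {'id': 'b', 'score': 2},
    {'id': 'a', 'score': 3},
    {'id': 'c', 'score': 1},
]

# Ascending sort on a chosen attribute; pass reverse=True for descending.
by_score = sorted(items, key=lambda item: item['score'])
```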

On the practical side, since you use Python, I highly recommend the Boto library for low-level access, or the dynamodb-mapper library for higher-level access (disclaimer: I am one of the core developers of dynamodb-mapper).

Sadly, neither of these libraries provides an easy way to wrap the batch_get operation. By contrast, there are generators for scan and for query which "pretend" you get everything in a single query.

In order to get optimal results with the batch query, I recommend this workflow:

  • Submit a batch with all 500 of your items
  • Store the results in your dict
  • Re-submit with the UnprocessedKeys as many times as needed
  • Sort the results on the Python side

Quick example

I assume you have created a table "MyTable" with a single hash_key.

import boto

# Helper function. This is more or less the code
# I added to the develop branch
def resubmit(batch, prev):
    # Empty (re-use) the batch
    del batch[:]

    # The batch answer contains the list of
    # unprocessed keys grouped by tables
    if 'UnprocessedKeys' in prev:
        unprocessed = prev['UnprocessedKeys']
    else:
        return None

    # Load the unprocessed keys
    for table_name, table_req in unprocessed.iteritems():
        table_keys = table_req['Keys']
        table = batch.layer2.get_table(table_name)

        keys = []
        for key in table_keys:
            h = key['HashKeyElement']
            r = None
            if 'RangeKeyElement' in key:
                r = key['RangeKeyElement']
            keys.append((h, r))

        attributes_to_get = None
        if 'AttributesToGet' in table_req:
            attributes_to_get = table_req['AttributesToGet']

        batch.add_batch(table, keys, attributes_to_get=attributes_to_get)

    return batch.submit()

# Main
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
batch = db.new_batch_list()

keys = range(100)  # Get items from 0 to 99

batch.add_batch(table, keys)

res = batch.submit()

while res:
    print res  # Do some useful work here
    res = resubmit(batch, res)

# The END

EDIT:

I've added a resubmit() function to BatchList in the Boto develop branch. It greatly simplifies the workflow:

  1. add all of your requested keys to BatchList
  2. submit()
  3. resubmit() as long as it does not return None.

This should be available in the next release.
