
Simple example of retrieving 500 items from DynamoDB using Python

I'm looking for a simple example of retrieving 500 items from DynamoDB while minimizing the number of queries. I know there's a "multiget" function that would let me break this up into chunks of 50 queries, but I'm not sure how to do this.

I'm starting with a list of 500 keys. I'm then thinking of writing a function that takes this list of keys, breaks it up into "chunks," retrieves the values, stitches them back together, and returns a dict of 500 key-value pairs.
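Something like this sketch is what I have in mind (the chunk size of 50 and the `fetch_batch` callable are placeholders for whatever the real multiget call turns out to be):

```python
# Split a list of keys into fixed-size chunks.
def chunks(keys, size=50):
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

# Hypothetical driver: fetch_batch stands in for whatever call
# actually retrieves one chunk of items as a dict.
def get_all(keys, fetch_batch, size=50):
    result = {}
    for chunk in chunks(keys, size):
        result.update(fetch_batch(chunk))
    return result
```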

Or is there a better way to do this?

As a corollary, how would I "sort" the items afterwards?

Depending on your schema, there are two ways of efficiently retrieving your 500 items.

1. Items are under the same hash_key, using a range_key

  • Use the query method with the hash_key
  • You may ask to sort the range_keys ascending (A-Z) or descending (Z-A)

2. Items are under "random" keys

  • You said it: use the BatchGetItem method
  • Good news: the limit is actually 100 items per request, or 1 MB, whichever comes first
  • You will have to sort the results on the Python side
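Sorting on the Python side is a one-liner with `sorted()`. A minimal sketch, assuming the retrieved items are plain dicts and `'score'` is a hypothetical attribute to sort on:

```python
items = [
    {'id': 'b', 'score': 2},
    {'id': 'a', 'score': 3},
    {'id': 'c', 'score': 1},
]

# Ascending sort on a chosen attribute; pass reverse=True for descending.
by_score = sorted(items, key=lambda item: item['score'])
```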

On the practical side, since you use Python, I highly recommend the Boto library for low-level access, or the dynamodb-mapper library for higher-level access (disclaimer: I am one of the core developers of dynamodb-mapper).

Sadly, neither of these libraries provides an easy way to wrap the batch_get operation. By contrast, there are generators for scan and for query which "pretend" you get everything in a single query.

In order to get optimal results with the batch query, I recommend this workflow:

  • Submit a batch with all 500 of your items
  • Store the results in your dict
  • Re-submit with the UnprocessedKeys as many times as needed
  • Sort the results on the Python side

Quick example

I assume you have created a table "MyTable" with a single hash_key.

import boto

# Helper function. This is more or less the code
# I added to the develop branch
def resubmit(batch, prev):
    # Empty (re-use) the batch
    del batch[:]

    # The batch answer contains the list of
    # unprocessed keys grouped by tables
    if 'UnprocessedKeys' in prev:
        unprocessed = prev['UnprocessedKeys']
    else:
        return None

    # Load the unprocessed keys
    for table_name, table_req in unprocessed.iteritems():
        table_keys = table_req['Keys']
        table = batch.layer2.get_table(table_name)

        keys = []
        for key in table_keys:
            h = key['HashKeyElement']
            r = None
            if 'RangeKeyElement' in key:
                r = key['RangeKeyElement']
            keys.append((h, r))

        attributes_to_get = None
        if 'AttributesToGet' in table_req:
            attributes_to_get = table_req['AttributesToGet']

        batch.add_batch(table, keys, attributes_to_get=attributes_to_get)

    return batch.submit()

# Main
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
batch = db.new_batch_list()

keys = range(100)  # Get items from 0 to 99

batch.add_batch(table, keys)

res = batch.submit()

while res:
    print res  # Do some useful work here
    res = resubmit(batch, res)

# The END

EDIT:

I've added a resubmit() function to BatchList in the Boto develop branch. It greatly simplifies the workflow:

  1. add all of your requested keys to BatchList
  2. submit()
  3. resubmit() as long as it does not return None.

This should be available in the next release.
