
Full table scan using boto3 (Python)

I'm trying to do a full scan of my DynamoDB table, which contains more than 2,000,000 records.

Initially, what I did was:

import boto3
import pandas as pd
import json
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('acloudapi_media_url_testing')
response = table.scan(
    FilterExpression=Attr('api_key').eq('xxxxxxxxxx'),
)
data = response  # the response is a dictionary
print(data)

It printed out more than 1,000 records. But when I added the while loop below, it took so long that it never completed.

import boto3
import pandas as pd
import json
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('acloudapi_media_url_testing')
response = table.scan(
    FilterExpression=Attr('api_key').eq('xxxxxxxxxx'),
)
data = response  # the response is a dictionary
print(data)

print(response['LastEvaluatedKey'])  # present while more pages remain
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'],
                          FilterExpression=Attr('api_key').eq('xxxxxxxxxx'))
    data.update(response)  # note: this replaces the previous page's 'Items' rather than accumulating them
print(data)

Can anyone advise me on what to do? I'm quite new to programming, so my code may look quite amateurish.

The key question is: how many items are in your DynamoDB table?

I will just explain the above code, especially the while loop that you added recently.

The while loop scans the DynamoDB table until there are no more items left to scan. A single Scan call returns at most 1 MB of data, so the call has to be repeated until every item has been read.

while 'LastEvaluatedKey' in response:

If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation.

Imagine: if you have millions of items in the table, the program has to page through all of them before it finishes. Also, the filter criteria are applied to the scan result set after the items are read, so you pay for reading every item in the table, not just the items that match the filter.
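
For reference, here is a minimal sketch of the usual pagination pattern, reusing the table name and the placeholder api_key value from your question. Note that each page's Items must be accumulated explicitly; dict.update(response) in your loop overwrites the previous page's Items instead of appending to them:

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('acloudapi_media_url_testing')

items = []
scan_kwargs = {'FilterExpression': Attr('api_key').eq('xxxxxxxxxx')}

while True:
    response = table.scan(**scan_kwargs)
    items.extend(response.get('Items', []))      # accumulate this page's items
    last_key = response.get('LastEvaluatedKey')
    if last_key is None:                         # no more pages to read
        break
    scan_kwargs['ExclusiveStartKey'] = last_key  # resume where the last page ended

print(len(items))

This still reads the whole table; it just does so correctly and without shadowing the built-in dict.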

For these reasons, a full table scan should generally be avoided on big tables. Scanning is inefficient, and it costs you read capacity as well.

Alternate solution - use a Global Secondary Index (GSI) and the Query API.
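
For example, assuming you create a GSI with api_key as its partition key (the index name 'api_key-index' below is hypothetical), a paginated Query reads only the matching items instead of the whole table:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('acloudapi_media_url_testing')

items = []
query_kwargs = {
    'IndexName': 'api_key-index',  # hypothetical GSI name; use your own
    'KeyConditionExpression': Key('api_key').eq('xxxxxxxxxx'),
}

while True:
    response = table.query(**query_kwargs)
    items.extend(response.get('Items', []))      # accumulate this page's items
    last_key = response.get('LastEvaluatedKey')
    if last_key is None:                         # no more pages to read
        break
    query_kwargs['ExclusiveStartKey'] = last_key

print(len(items))

With a Query against the GSI, DynamoDB only reads items whose api_key equals the given value, so you are billed for the matching items rather than for the entire table.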
