
How to use a date filter correctly on AWS DynamoDB with boto3

I want to retrieve items from a DynamoDB table and then append this data after the last rows of a table in BigQuery.

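import boto3
import pandas as pd
from boto3.dynamodb.conditions import Attr

# Table() belongs to the DynamoDB resource, not to the low-level client
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table')
# max_date_of_the_table_in_big_query is read from BigQuery beforehand
response = table.scan(FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))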

#first part
data = response['Items']

#second part
while response.get('LastEvaluatedKey'):
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])

df = pd.DataFrame(data)
df = df[['query', 'created_at', 'result_count', 'id', 'isfuzy']]

# load df to big query
.....

The date filter works, but in the while loop (second part) the code retrieves all items. After the first part I have 100 rows, but after this code:

while response.get('LastEvaluatedKey'):
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])

I have 500,000 rows. I could use only the first part, but I know a single scan returns at most 1 MB of data, which is why I am using the second part. How can I get only the data in the given date range?
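A related detail for the date range itself: the filter compares created_at as a DynamoDB value, and boto3 cannot serialize a Python datetime directly, so the bound must be in the same representation as the stored attribute. A minimal sketch, assuming (not stated in the question) that created_at is stored as a UTC ISO 8601 string such as 2024-01-31T12:00:00; CREATED_AT_FORMAT and to_created_at_string are hypothetical names:

```python
from datetime import datetime, timezone

# Assumed storage format for created_at (hypothetical): UTC, zero-padded ISO 8601.
CREATED_AT_FORMAT = "%Y-%m-%dT%H:%M:%S"


def to_created_at_string(dt: datetime) -> str:
    """Format a datetime the same way created_at is assumed to be stored."""
    if dt.tzinfo is not None:
        # Normalize timezone-aware values to UTC, then drop the offset.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(CREATED_AT_FORMAT)


# With a fixed, zero-padded format, string order matches chronological order,
# which is what a DynamoDB string comparison (gt, between) relies on.
lower = to_created_at_string(datetime(2024, 1, 31, 12, 0, 0))
upper = to_created_at_string(datetime(2024, 2, 1, 0, 0, 0))
```

The strings can then be passed as FilterExpression=Attr('created_at').gt(lower), or Attr('created_at').between(lower, upper) for a closed range (DynamoDB's BETWEEN includes both ends). If created_at is actually stored as a number, such as epoch seconds, the bound should be a number instead.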

Your first scan API call sets a FilterExpression, which applies your date filter:

response = table.scan(FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))

However, the second scan API call doesn't set one, so it does not filter your data:

response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])

Apply the FilterExpression to both calls:

while response.get('LastEvaluatedKey'):
    response = table.scan(
        ExclusiveStartKey=response['LastEvaluatedKey'],
        FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query),
    )
    data.extend(response['Items'])
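To avoid forgetting the filter on one of the calls, the scan arguments can be built once and reused for every page. A minimal sketch; scan_all is a hypothetical helper, and table is assumed to be a boto3 DynamoDB Table resource (or any object with a compatible scan method returning "Items" and, optionally, "LastEvaluatedKey"):

```python
from typing import Any, Dict, List


def scan_all(table: Any, **scan_kwargs: Any) -> List[Dict[str, Any]]:
    """Read every page of a scan, passing the same arguments on each call."""
    items: List[Dict[str, Any]] = []
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means the whole table has been scanned.
            return items
        # Keep FilterExpression (and any other arguments); only advance the start key.
        kwargs["ExclusiveStartKey"] = last_key
```

Usage: data = scan_all(table, FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query)). The loop stops only when LastEvaluatedKey is missing, not when a page is empty: DynamoDB applies FilterExpression after reading each page (up to 1 MB), so a page can contain zero matching items while more of the table remains. For the same reason the filter reduces what is returned, not how much of the table is read.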
