Limit and sort DynamoDB results with FilterExpression PYTHON

Hi, can you please guide me on how to limit the number of results when using a filter expression? I can't use a key condition and need to filter by attribute. Please review my code below: it returns about 500 records in the response, but I need to get only the 10 latest.

import json
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

def lambda_handler(event, context):
    response = table.scan(
        FilterExpression=Attr('age').eq('30') & Attr('created_at').lt('2020-01-01T00:00:00') & Attr('status').eq('enabled')
    )
    items = response['Items']
    #loadjson = json.stringify(event)
    #for data in event['data']:
    #    print(data['key'])
    return items   

According to the docs on Working With Scans, you can use the Limit parameter. However, this may not achieve what you expect.

Consider the following:

  • A scan operation fetches up to 1MB of data at a time (anything more will need to be paginated)
  • The Limit parameter sets the maximum number of items that the scan operation evaluates, before the filter expression is applied.
  • A FilterExpression determines which items within the Scan results should be returned to you. All of the other results are discarded.

The important part: a scan operation applies the Limit parameter first, followed by the FilterExpression, then returns the results.

Why is this important?

From the docs:

... suppose that you Scan a table with a Limit value of 6 and without a filter expression. The Scan result contains the first six items from the table.

Now suppose that you add a filter expression to the Scan. In this case, DynamoDB applies the filter expression to the six items that were returned, discarding those that do not match. The final Scan result contains six items or fewer, depending on the number of items that were filtered.
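To make the ordering concrete, here is a small pure-Python sketch that imitates it. It does not call DynamoDB at all: `simulated_scan` and the sample items are made up for illustration, and only the order of operations (Limit first, filter second) is taken from the docs quoted above.

```python
def simulated_scan(items, limit, predicate):
    """Imitate one Scan page: apply Limit first, then the filter."""
    evaluated = items[:limit]  # Limit caps how many items are examined
    return [item for item in evaluated if predicate(item)]


# Twelve hypothetical items; every third one has age '30'.
items = [{"id": i, "age": "30" if i % 3 == 0 else "25"} for i in range(12)]

matches = simulated_scan(items, limit=6, predicate=lambda item: item["age"] == "30")
print([item["id"] for item in matches])  # [0, 3]
```

The table holds four matching items (ids 0, 3, 6 and 9), but with a Limit of 6 only the two that fall within the first six items come back.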

Let's take another look at your filter:

table.scan(
    FilterExpression=Attr('age').eq('30') & Attr('created_at').lt('2020-01-01T00:00:00') & Attr('status').eq('enabled')
)

This operation will read the first 1MB of data from your table, remove any items that do not match the filter, then return the results to you. If your table is larger than 1MB, you'll need to keep paging through results until you've processed the entire table. Keep in mind that you may receive several empty result sets, since the FilterExpression is applied after the 1MB page is read and before results are returned to you.

Probably not what you wanted, right?
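If you do stay with a scan, the paging described above could look roughly like the sketch below. `scan_latest` is a hypothetical helper, not part of boto3. It assumes `table` is a boto3 Table resource (or anything with a compatible `scan` method) and that `created_at` is stored as an ISO-8601 string, so that sorting the strings also sorts by time. A scan returns items in no useful order, so every page has to be read before the 10 latest can be picked.

```python
def scan_latest(table, filter_expression, count=10, sort_key="created_at"):
    """Page through a full Scan, then keep the `count` newest matches."""
    kwargs = {"FilterExpression": filter_expression}
    matches = []
    while True:
        page = table.scan(**kwargs)
        # A page can be empty if nothing in that 1MB chunk matched the filter.
        matches.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages: the whole table has been read
        kwargs["ExclusiveStartKey"] = last_key
    # Newest first; assumes ISO-8601 strings, which sort chronologically.
    matches.sort(key=lambda item: item[sort_key], reverse=True)
    return matches[:count]
```

In the Lambda handler this would be called as scan_latest(table, Attr('age').eq('30') & Attr('created_at').lt('2020-01-01T00:00:00') & Attr('status').eq('enabled')). It still reads, and bills for, the entire table on every call.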

Using the scan operation can often be a sign of an improperly designed data model. While there are some good uses of scan, this access pattern does not sound like one of them. You may want to take a closer look at your access pattern and create a secondary index that supports this operation. Otherwise, you'll be stuck with a more expensive (computationally and financially) scan operation.
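As a rough sketch of what the index-based approach might look like: suppose you added a global secondary index with `status` as the partition key and `created_at` as the sort key. The index name `status-created_at-index` and the helper `query_latest` are assumptions made up for this example; nothing in the question says such an index exists. A Query on that index can return items newest first, so paging can stop as soon as 10 matches have been collected.

```python
def query_latest(table, status, before, age, count=10,
                 index_name="status-created_at-index"):
    """Query a (hypothetical) status/created_at index, newest items first."""
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": "#s = :s AND #c < :c",
        # age is not part of the index key, so it is still a filter.
        "FilterExpression": "#a = :a",
        "ExpressionAttributeNames": {"#s": "status", "#c": "created_at", "#a": "age"},
        "ExpressionAttributeValues": {":s": status, ":c": before, ":a": age},
        "ScanIndexForward": False,  # descending sort-key order
    }
    matches = []
    while len(matches) < count:
        page = table.query(**kwargs)
        matches.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # index partition exhausted
        kwargs["ExclusiveStartKey"] = last_key
    return matches[:count]
```

For the filter in the question this would be query_latest(table, 'enabled', '2020-01-01T00:00:00', '30'). Because the results arrive in descending `created_at` order, the first 10 matches are the 10 latest, and the remaining pages never need to be read.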
