
Limit and sort DynamoDB results with FilterExpression (Python)

Hi, can you please guide me on how to limit the number of results using a filter expression? I can't query by key and need to filter by attribute instead. Please review my code below: it returns about 500 records in the response, but I only need the 10 latest.

import json
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

def lambda_handler(event, context):
    # A single scan() call returns only the first page of results (at most 1 MB read).
    # '30' is compared as a string; use 30 if age is stored as a Number.
    response = table.scan(
        FilterExpression=Attr('age').eq('30')
        & Attr('created_at').lt('2020-01-01T00:00:00')
        & Attr('status').eq('enabled')
    )
    items = response['Items']
    # loadjson = json.dumps(event)
    # for data in event['data']:
    #     print(data['key'])
    return items

According to the docs on Working with Scans, you can use the Limit parameter. However, this may not achieve what you expect.
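A minimal sketch of passing Limit to a scan, assuming `table` is a boto3 `Table` resource like the one in the question (any object with a compatible `scan(**kwargs)` method works, which also lets you exercise it without AWS). The function name `scan_first_page` is hypothetical. Note that the returned list can hold anywhere from 0 to `limit` items, for the reasons listed next.

```python
from typing import Any, Dict, List


def scan_first_page(table: Any, filter_expression: Any, limit: int = 10) -> List[Dict[str, Any]]:
    """Run one Scan request that evaluates at most `limit` items.

    Limit caps how many items DynamoDB reads *before* FilterExpression runs,
    so the result is not guaranteed to contain `limit` matching items.
    """
    response = table.scan(FilterExpression=filter_expression, Limit=limit)
    return response.get("Items", [])
```

With the question's table this would be called as `scan_first_page(table, Attr('age').eq('30') & Attr('status').eq('enabled'))`.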

Consider the following:

  • A scan operation fetches up to 1 MB of data at a time (anything more must be paginated).
  • The Limit parameter sets the maximum number of items that the scan operation reads before the filter expression is evaluated.
  • A FilterExpression determines which items within the Scan results should be returned to you. All of the other results are discarded.

The Important Part: A scan operation applies the Limit parameter first, followed by the FilterExpression, and then returns the results.

Why is this important?

From the docs:

... suppose that you Scan a table with a Limit value of 6 and without a filter expression. The Scan result contains the first six items from the table.

Now suppose that you add a filter expression to the Scan. In this case, DynamoDB applies the filter expression to the six items that were returned, discarding those that do not match. The final Scan result contains six items or fewer, depending on the number of items that were filtered.
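To make the quoted behaviour concrete, here is a small in-memory model of a single Scan request. It is an illustration only, not DynamoDB itself: `simulate_scan` and the sample items are hypothetical, and the returned dictionary only mirrors the `Items`, `Count` and `ScannedCount` fields of a real Scan response.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def simulate_scan(
    table_items: List[Item],
    limit: int,
    predicate: Callable[[Item], bool],
) -> Dict[str, Any]:
    """Model one Scan request: read at most `limit` items, THEN filter them."""
    evaluated = table_items[:limit]                            # Limit applies first
    matches = [item for item in evaluated if predicate(item)]  # then the filter
    return {"Items": matches, "Count": len(matches), "ScannedCount": len(evaluated)}


# Ten items; only every other one has status "enabled".
items = [{"id": i, "status": "enabled" if i % 2 == 0 else "disabled"} for i in range(10)]
page = simulate_scan(items, limit=6, predicate=lambda item: item["status"] == "enabled")
print(page["Count"], page["ScannedCount"])  # 3 6
```

Six items were read, but only three survive the filter, exactly the "six items or fewer" case described above.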

Let's take another look at your filter:

table.scan(
    FilterExpression=Attr('age').eq('30') & Attr('created_at').lt('2020-01-01T00:00:00') & Attr('status').eq('enabled')
)

This operation will read the first 1 MB of data from your table, remove any items that do not match the filter, and then return the results to you. If your table is larger than 1 MB, the response includes a LastEvaluatedKey, and you'll need to keep paging through results until you've processed the entire table. Keep in mind that you may receive several empty result pages, since the FilterExpression is applied after the 1 MB read and before results are returned to you.
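If you do stay with Scan, getting the 10 newest matches means following `LastEvaluatedKey` through every page (passing it back as `ExclusiveStartKey`) and sorting on the client, because Scan returns items in no particular order. A minimal sketch, assuming `table` is a boto3 `Table` resource and that `created_at` holds ISO-8601 strings in one consistent format (so string order equals time order); `scan_all_pages` and `latest_n` are hypothetical helper names.

```python
import heapq
from typing import Any, Dict, Iterator, List


def scan_all_pages(table: Any, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item a (filtered) Scan returns, following LastEvaluatedKey.

    Individual pages may be empty, because the filter runs after each read.
    """
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:  # no more data to read
            break
        kwargs["ExclusiveStartKey"] = last_key


def latest_n(table: Any, n: int = 10, **scan_kwargs: Any) -> List[Dict[str, Any]]:
    """Return the n items with the largest created_at value, newest first.

    Assumes created_at is an ISO-8601 string; every page is read before
    the newest n items are known.
    """
    return heapq.nlargest(n, scan_all_pages(table, **scan_kwargs),
                          key=lambda item: item["created_at"])
```

Usage with the question's filter: `latest_n(table, n=10, FilterExpression=Attr('age').eq('30') & Attr('status').eq('enabled'))`. It still reads (and pays for) the whole table.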

Probably not what you wanted, right?

Using the scan operation can often be a sign of an improperly designed data model. While there are some good uses of scan, this access pattern does not sound like one of them. You may want to take a closer look at your access pattern and create a secondary index that supports this operation. Otherwise, you'll be stuck with a more expensive (both computationally and financially) scan operation.
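As one possible design (an assumption; the question's table is not known to have it), a global secondary index with partition key `status` and sort key `created_at` lets a Query walk items newest first via `ScanIndexForward=False`. The `age` condition is not part of the key, so it stays a FilterExpression and is still applied after each read; the loop therefore keeps paging until it has `n` matches instead of relying on Limit. The index name `status-created_at-index` and the function name are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical GSI: partition key "status", sort key "created_at" (ISO-8601 strings).
INDEX_NAME = "status-created_at-index"


def latest_enabled_users(table: Any, before: str, age: str, n: int = 10) -> List[Dict[str, Any]]:
    """Return up to n matching items, newest first, using Query on the assumed GSI."""
    kwargs: Dict[str, Any] = {
        "IndexName": INDEX_NAME,
        # "status" is a DynamoDB reserved word, so it needs a placeholder name;
        # "#age" is a harmless placeholder used for consistency.
        "KeyConditionExpression": "#st = :st AND created_at < :before",
        "FilterExpression": "#age = :age",
        "ExpressionAttributeNames": {"#st": "status", "#age": "age"},
        # age is passed as given; use a number if it is stored as a Number.
        "ExpressionAttributeValues": {":st": "enabled", ":before": before, ":age": age},
        "ScanIndexForward": False,  # descending by sort key: newest first
    }
    results: List[Dict[str, Any]] = []
    while len(results) < n:
        response = table.query(**kwargs)
        results.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:  # index partition exhausted
            break
        kwargs["ExclusiveStartKey"] = last_key
    return results[:n]
```

Because items arrive already sorted, the loop can stop as soon as it has `n` matches rather than reading everything.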
