Most efficient way to query and update DynamoDB

I have a DynamoDB table that will be used to store failed requests; at a later time, another Lambda will read the requests and reprocess them.

At the moment I am creating the table like this, using the TypeScript CDK:

const myTable = new dynamodb.Table(this, "my-table", {
  tableName: "my-table-name",
  partitionKey: { name: "file_id", type: dynamodb.AttributeType.STRING },
});

I am writing data into the table like this from a Python Lambda:

import boto3

# region, file_id and payload are defined elsewhere in the Lambda
dynamodb = boto3.resource("dynamodb", region_name=region)
my_table = dynamodb.Table("my-table-name")

failedRecord = {
    "file_id": str(file_id),
    "processed": "false",
    "payload": str(payload),
}

my_table.put_item(Item=failedRecord)

Now, from another Lambda, I want to read all the entries in the table that have processed = false, do something with them, and then update them to processed = true.

Do I need to add a secondary index here for this to be efficient? An example of how to do this would be great.

Thanks
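For comparison, without any index the straightforward way to find the processed = "false" items is a Scan with a FilterExpression. The Python sketch below assumes a boto3-style `Table.scan` is passed in. It shows why this approach is costly: DynamoDB applies the filter after reading each page, so the scan consumes read capacity for every item in the table, and some pages can come back with no items at all.

```python
def scan_kwargs(start_key=None):
    """Arguments for Table.scan that keep only items with processed = "false"."""
    kwargs = {
        "FilterExpression": "#processed = :false",
        "ExpressionAttributeNames": {"#processed": "processed"},
        # The question's writer stores processed as the string "false"
        "ExpressionAttributeValues": {":false": "false"},
    }
    if start_key is not None:
        kwargs["ExclusiveStartKey"] = start_key
    return kwargs


def scan_unprocessed(scan):
    """Yield every unprocessed item, following LastEvaluatedKey across pages.

    `scan` should behave like boto3's Table.scan (keyword arguments in,
    response dict out). A page may hold zero items and still carry a
    LastEvaluatedKey, because the filter runs after the page is read.
    """
    start_key = None
    while True:
        page = scan(**scan_kwargs(start_key))
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return
```

With boto3 this would be called as `scan_unprocessed(my_table.scan)`. The cost grows with the size of the whole table, not with the number of unprocessed items.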

Consider creating a global secondary index (GSI) that contains only unprocessed items. You add an item to the GSI, or remove it, by adding or removing the attribute that serves as the GSI's partition key (here called GSIPK). For example, consider the following table structure:

[Image: example base table; only file_id 3 and 4 have a GSIPK attribute]

Notice that only file_id 3 and 4 have a GSIPK defined. The GSI would logically look like this:

[Image: the resulting GSI, which contains only the items with file_id 3 and 4]

DynamoDB only projects an item into the index if that item has the GSIPK attribute (a so-called sparse index). Your Lambda could read from the GSI, do some work, set the processed attribute to true, and remove the GSIPK value. This effectively removes the item from the secondary index.
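The sparse-index idea can be sketched in Python to match the question's Lambda. The sketch assumes the GSI partition key attribute is named GSIPK, as in this answer, and that every unprocessed item gets the same hypothetical placeholder value "UNPROCESSED". The functions only build and transform item dicts locally; they do not talk to DynamoDB.

```python
# Assumed names (hypothetical, not from the original post): the GSI partition
# key attribute is "GSIPK" and every unprocessed item gets the value "UNPROCESSED".
GSI_PK_ATTR = "GSIPK"
UNPROCESSED = "UNPROCESSED"


def build_failed_record(file_id, payload):
    """Item for put_item; the GSIPK attribute makes DynamoDB project it into the GSI."""
    return {
        "file_id": str(file_id),
        "processed": "false",
        "payload": str(payload),
        GSI_PK_ATTR: UNPROCESSED,
    }


def sparse_index_view(items):
    """Local model of the sparse GSI: only items carrying GSIPK appear in it."""
    return [item for item in items if GSI_PK_ATTR in item]


def mark_processed(item):
    """Local model of 'SET processed = true REMOVE GSIPK' on one item (returns a copy)."""
    updated = dict(item)
    updated["processed"] = True
    updated.pop(GSI_PK_ATTR, None)
    return updated


table = [build_failed_record(3, "a"), {"file_id": "1", "processed": True, "payload": "b"}]
print([item["file_id"] for item in sparse_index_view(table)])  # ['3']
```

The writer Lambda could then call `my_table.put_item(Item=build_failed_record(file_id, payload))`. A single constant GSIPK value puts every unprocessed item under one index partition key. That is usually acceptable for a small retry queue like this one.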

The update call to DynamoDB to do this would look something like this (JavaScript, using the DocumentClient):

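const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const ddbClient = new AWS.DynamoDB.DocumentClient();

const params = {
  TableName: YOUR_TABLE_NAME_HERE,
  Key: {
    file_id: FILE_ID_HERE // the table's partition key
  },
  // Mark the item as processed and drop GSIPK so it leaves the sparse index
  UpdateExpression: "SET #processed = :true REMOVE #gsipk",
  ExpressionAttributeNames: {
    "#processed": "processed",
    "#gsipk": "GSIPK", // must match the placeholder used in UpdateExpression
  },
  ExpressionAttributeValues: {
    ":true": true
  }
};

// Inside an async function (e.g. the Lambda handler)
await ddbClient.update(params).promise();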
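The same read-then-update loop can be written in Python, the question's Lambda language, as a sketch. It rests on several assumptions:

- The GSI is named "unprocessed-index" (hypothetical).
- Its partition key GSIPK holds the placeholder "UNPROCESSED" (also hypothetical).
- `query` and `update_item` are boto3 `Table.query` / `Table.update_item` bound methods, passed in so the loop can be tried without AWS.

```python
# Hypothetical index name and GSIPK value; neither comes from the original post.
INDEX_NAME = "unprocessed-index"
GSI_PK_VALUE = "UNPROCESSED"


def query_kwargs(start_key=None):
    """Arguments for Table.query against the sparse GSI."""
    kwargs = {
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "#gsipk = :pk",
        "ExpressionAttributeNames": {"#gsipk": "GSIPK"},
        "ExpressionAttributeValues": {":pk": GSI_PK_VALUE},
    }
    if start_key is not None:
        kwargs["ExclusiveStartKey"] = start_key
    return kwargs


def update_kwargs(file_id):
    """Arguments for Table.update_item: mark processed and drop the item from the GSI."""
    return {
        "Key": {"file_id": file_id},
        "UpdateExpression": "SET #processed = :true REMOVE #gsipk",
        "ExpressionAttributeNames": {"#processed": "processed", "#gsipk": "GSIPK"},
        "ExpressionAttributeValues": {":true": True},
    }


def process_unprocessed(query, update_item, handle):
    """Page through the GSI, handle each item, then mark it processed.

    `query` and `update_item` should behave like boto3's Table.query and
    Table.update_item. Returns the number of items handled.
    """
    count = 0
    start_key = None
    while True:
        page = query(**query_kwargs(start_key))
        for item in page.get("Items", []):
            handle(item)  # the actual reprocessing work
            update_item(**update_kwargs(item["file_id"]))
            count += 1
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return count
```

In a Lambda this might be called as `process_unprocessed(my_table.query, my_table.update_item, reprocess)`, where `reprocess` is your own (hypothetical) retry function. Reads from a GSI are eventually consistent, so an item you just updated may briefly still be returned. Keeping the retry work idempotent avoids trouble if that happens.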

Assuming that your file_id is already unique (it should be, given that you have set it as the partition key), then with the record format you have shared and that table schema, a GSI without a sort key would not make any difference.

A different approach you could consider is enabling DynamoDB Streams on the table in question and setting the stream as the trigger for the second Lambda function. With this approach you would essentially capture all activity on the table. In your logic you could filter out all events that are not INSERT and process the ones you are interested in at your own pace.

This way you would avoid querying the table entirely.

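Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.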

 