
Most efficient way to query and update DynamoDB

I have a DynamoDB table that will be used for storing failed requests. At a later time, another Lambda will read the requests and reprocess them.

At the minute I am creating the table like this using the TypeScript CDK:

const myTable = new dynamodb.Table(this, "my-table", {
  tableName: "my-table-name",
  partitionKey: { name: "file_id", type: dynamodb.AttributeType.STRING },
});

I am sending data into the table like this in a Python Lambda:

import boto3

dynamodb = boto3.resource("dynamodb", region_name=region)
my_table = dynamodb.Table("my-table-name")

# "processed" is stored as a string flag alongside the raw payload.
failedRecord = {
    "file_id": str(file_id),
    "processed": "false",
    "payload": str(payload),
}

my_table.put_item(Item=failedRecord)

Now, from another Lambda, I want to read all the entries in the table with processed = false, do something with them, and then update them to processed = true.

Do I need to add a secondary index here to be efficient? An example of how to do this would be great.

Thanks

Consider creating a global secondary index that contains only unprocessed items. You would add/remove items from the GSI by adding/removing the GSI Primary Key. For example, consider the following table structure:

[image: enter image description here]

Notice that only file_id 3 and 4 have a GSIPK defined. The GSI would logically look like this:

[image: enter image description here]

DynamoDB would only project items into the index where the GSIPK exists on that item. Your lambda could read from the GSI, do some work, set the processed attribute to true and remove the GSIPK value. This would effectively remove the item from the secondary index.
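On the write side, this means the failing Lambda attaches the index key when it stores the record. Here is a minimal Python sketch of that, assuming the index partition key attribute is called GSIPK as in the example above. The constant value "UNPROCESSED" and the helper name build_failed_record are made up for illustration; they are not from the question.

```python
# Hypothetical marker value: any constant works, since the index only needs
# the attribute to exist on unprocessed items.
UNPROCESSED_MARKER = "UNPROCESSED"


def build_failed_record(file_id, payload):
    """Build the item for a failed request, including the sparse-index key."""
    return {
        "file_id": str(file_id),
        "processed": "false",
        "payload": str(payload),
        # Only items that carry this attribute appear in the GSI.
        "GSIPK": UNPROCESSED_MARKER,
    }


# Usage with the table object from the question:
# my_table.put_item(Item=build_failed_record(file_id, payload))
```

With a single constant value, every unprocessed item lands under the same index partition key. That is usually fine for a modest backlog of failures, but it is a design choice worth revisiting if the write rate is high.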

The update call to DynamoDB to do this would look something like this:

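const params = {
  TableName: YOUR_TABLE_NAME_HERE,
  Key: {
    PK: FILE_ID_HERE, // partition key attribute (file_id in the question's table)
  },
  // Mark the item as processed and drop it from the sparse index.
  UpdateExpression: "SET #processed = :true REMOVE #gsipk",
  ExpressionAttributeNames: {
    "#processed": "processed",
    "#gsipk": "GSIPK", // placeholder must match the one used in UpdateExpression
  },
  ExpressionAttributeValues: {
    ":true": true,
  },
};

ddbClient.update(params);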
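Since the question's Lambdas are written in Python, the whole read-then-update loop might look roughly like the sketch below. It assumes a boto3 Table resource is passed in, that the index is named "unprocessed-index" and keyed on GSIPK with the constant value "UNPROCESSED" (both hypothetical), and that processed is stored as the string "true" to match the string "false" written in the question. The handle callback stands in for whatever reprocessing is needed.

```python
def mark_processed_params(file_id):
    """Arguments for update_item: flag the item and remove it from the index."""
    return {
        "Key": {"file_id": file_id},
        "UpdateExpression": "SET #processed = :done REMOVE #gsipk",
        "ExpressionAttributeNames": {"#processed": "processed", "#gsipk": "GSIPK"},
        "ExpressionAttributeValues": {":done": "true"},
    }


def reprocess_unprocessed(table, handle,
                          index_name="unprocessed-index",  # hypothetical name
                          marker="UNPROCESSED"):           # hypothetical value
    """Query the sparse GSI page by page, handle each item, then mark it done."""
    query_args = {
        "IndexName": index_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "GSIPK"},
        "ExpressionAttributeValues": {":pk": marker},
    }
    handled = 0
    while True:
        page = table.query(**query_args)
        for item in page.get("Items", []):
            handle(item)
            # The table's key (file_id) is always projected into the index.
            table.update_item(**mark_processed_params(item["file_id"]))
            handled += 1
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        # More results remain: continue from where the previous page stopped.
        query_args["ExclusiveStartKey"] = last_key
    return handled


# Usage inside a Lambda (my_reprocess_function is a placeholder):
# import boto3
# table = boto3.resource("dynamodb").Table("my-table-name")
# reprocess_unprocessed(table, handle=my_reprocess_function)
```

Because the query only touches the index, the cost scales with the number of unprocessed items rather than with the size of the table.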

Assuming that your file_id is already unique (it should be, given that you have set it as the partition key), with the record format and table schema that you have shared, a GSI without a sort key would not make any difference.

A different approach that you could consider is enabling a DynamoDB Stream for the table in question and setting it as a trigger for the second Lambda function. With this approach you'd essentially be capturing all activity on the table, and in your logic you could filter out all events that are not INSERT and process the ones you are interested in at your own pace.

This way you'd avoid querying the table entirely.
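A rough Python sketch of such a stream-triggered handler is shown here. It assumes the stream is configured with a view type that includes the new image (NEW_IMAGE or NEW_AND_OLD_IMAGES), so that each INSERT record carries the item's attributes; the reprocess function is a placeholder for the real work.

```python
def new_inserts(event):
    """Return the new item image of every INSERT record in a stream event."""
    return [
        record["dynamodb"].get("NewImage", {})
        for record in event.get("Records", [])
        if record.get("eventName") == "INSERT"
    ]


def reprocess(file_id, payload):
    """Placeholder for the actual reprocessing logic."""
    print(f"reprocessing {file_id}")


def handler(event, context):
    """Lambda entry point: ignore MODIFY and REMOVE events, handle inserts."""
    images = new_inserts(event)
    for image in images:
        # Stream images use DynamoDB's typed format, e.g. {"S": "value"}.
        reprocess(image["file_id"]["S"], image["payload"]["S"])
    return len(images)
```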
