
Batch write more than 25 items on DynamoDB using Lambda

Edit x1: Replaced the snippet with the full file

I'm currently in the process of seeding 1.8K rows into DynamoDB. When a user is created, these rows need to be generated and inserted. They don't need to be readable immediately (let's say, within 3-5 seconds). I'm currently using AWS Lambda and I'm getting hit by a timeout exception (probably because more WCUs are consumed than provisioned; I have 5, with Auto-Scaling disabled).

I've tried searching around Google and StackOverflow, and this seems to be a gray area (which is kind of strange, considering that DynamoDB is marketed as an incredible solution for handling massive amounts of data per second) in which no clear path exists.

We know that DynamoDB limits batch writes to 25 items per request to prevent HTTP overhead, meaning that we could make an unlimited number of batchWrite calls and increase the WCUs.

I've tried calling batchWrite an unlimited number of times by just firing the calls and not awaiting them (will this count? I've read that since JS is single-threaded, the requests will be handled one by one anyway, except that I wouldn't have to wait for the response if I don't use a promise... currently using Node 10 and Lambda), and nothing seems to happen. If I promisify the calls and await them, I get a Lambda timeout exception (probably because it ran out of WCUs).
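To illustrate what I mean by firing without awaiting, this is roughly the shape of the attempt (a simplified, hypothetical sketch, not my exact code; client and data are the same as in the full file below):

import * as AWS from "aws-sdk";

const client = new AWS.DynamoDB.DocumentClient({ region: "us-east-2" });
declare const data: { PutRequest: { Item: any } }[]; // same shape as in the full file below

for (let i = 0; i < data.length; i += 25) {
  // Fired without a callback and without .send() or .promise(), so the
  // request is built but never awaited (and may never be dispatched at all).
  client.batchWrite({
    RequestItems: { schon: data.slice(i, i + 25) }
  });
}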

I currently have 5 WCUs and 5 RCUs (are these too small for these random, spiked operations?).

I'm kind of stuck, as I don't want to randomly increase the WCUs for short periods of time. In addition, I've read that Auto-Scaling doesn't kick in automatically, and that Amazon will only resize the capacity units 4 times a day.

What should I do about it?

Here's the full file I'm using to insert into DynamoDB:

import * as aws from "aws-sdk";

export async function batchWrite(
  data: {
    PutRequest: {
      Item: any;
    };
  }[]
) {
  const client = new aws.DynamoDB.DocumentClient({
    region: "us-east-2"
  });
  // 25 is the limit imposed by DynamoDB's batchWrite:
  // Member must have length less than or equal to 25.
  // This verifies whether the data is shaped correctly and has no duplicates.
  const sortKeyList: string[] = [];
  data.forEach((put, index) => {
    const item = put.PutRequest.Item;
    const has = Object.prototype.hasOwnProperty; // cache the lookup (could also be hoisted to module scope)
    const hasPk = has.call(item, "pk");
    const hasSk = has.call(item, "sk");
    // Every item must have both a partition key (pk) and a sort key (sk).
    if (!hasPk || !hasSk) {
      throw new Error(`hasPk is ${hasPk} and hasSk is ${hasSk} at index ${index}`);
    }

    if (typeof item.pk !== "string" || typeof item.sk !== "string") {
      throw new Error(`Item at index ${index}: pk or sk is not a string`);
    }

    if (sortKeyList.indexOf(item.sk) !== -1) {
      throw new Error(`The item at index ${index} with sort key ${item.sk} is a duplicate`);
    }

    if (item.sk.indexOf("undefined") !== -1) {
      throw new Error(`There's an "undefined" in the sort key at index ${index}: ${item.sk}`);
    }

    sortKeyList.push(item.sk);
  });

  // DynamoDB only accepts 25 items at a time.
  for (let i = 0; i < data.length; i += 25) {
    const upperLimit = Math.min(i + 25, data.length);
    const newItems = data.slice(i, upperLimit);
    try {
      await client
        .batchWrite({
          RequestItems: {
            schon: newItems // "schon" is the table name
          }
        })
        .promise();
    } catch (e) {
      console.log("Total Batches: " + Math.ceil(data.length / 25));
      console.error("There was an error while processing the request");
      console.log(e.message);
      console.log("Total data to insert", data.length);
      console.log("New items is", newItems);
      console.log("index is ", i);
      console.log("top index is", upperLimit);
      return; // stop processing; the error details were logged above
    }
  }
  console.log("All batches written; creation in DynamoDB was successful");
}

There are two issues that you're facing; I'll attempt to address both.

A full example of the items being written, and of the actual batchWrite request containing them, has not been provided, so it is unclear whether the request is properly formatted. Based on the information provided and the issue being faced, it appears that it may not be.

The documentation for the batchWrite operation in the AWS JavaScript SDK can be found here, and a previous answer here shows how to correctly build and format a batchWrite request.
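For reference, a correctly formatted DocumentClient batchWrite call has the following shape (the table name schon is taken from the question; the item attributes are illustrative):

import * as AWS from "aws-sdk";

const client = new AWS.DynamoDB.DocumentClient({ region: "us-east-2" });

// RequestItems maps each table name to an array of write requests. With the
// DocumentClient, Item is a plain object; no attribute-type annotations needed.
async function writeExample() {
  await client
    .batchWrite({
      RequestItems: {
        schon: [
          { PutRequest: { Item: { pk: "user#1", sk: "profile#settings" } } },
          { PutRequest: { Item: { pk: "user#1", sk: "profile#billing" } } }
        ]
      }
    })
    .promise();
}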

Nonetheless, even if the request is formatted correctly, a second issue remains: provisioning sufficient capacity to handle the write requests needed to insert 1800 records within the required time, which has an upper limit of 5 seconds.
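It's also worth noting that when capacity runs short, batchWrite does not necessarily throw; throttled writes are returned in the UnprocessedItems field of the response and must be retried by the caller. A minimal retry sketch (the table name schon is taken from the question; the backoff delays and attempt limit are assumptions, not AWS guidance):

import * as AWS from "aws-sdk";

async function batchWriteWithRetry(
  client: AWS.DynamoDB.DocumentClient,
  items: { PutRequest: { Item: any } }[],
  maxAttempts = 5
): Promise<void> {
  // "schon" is assumed to be the table name, as in the question.
  let requestItems: any = { schon: items };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await client.batchWrite({ RequestItems: requestItems }).promise();
    // Throttled writes come back here rather than as an error.
    if (!res.UnprocessedItems || Object.keys(res.UnprocessedItems).length === 0) {
      return;
    }
    requestItems = res.UnprocessedItems;
    // Simple exponential backoff before retrying the leftover writes.
    await new Promise(resolve => setTimeout(resolve, 100 * 2 ** attempt));
  }
  throw new Error("UnprocessedItems remained after all retry attempts");
}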

TL;DR: the quick and easy solution to the capacity issue is to switch from provisioned capacity to on-demand capacity. As the math below shows, unless you have consistent and/or predictable capacity requirements, on-demand capacity will most of the time not only remove the management overhead of provisioned capacity but also be substantially less expensive.
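Switching an existing table can be done from the console or programmatically; here is a minimal sketch using the low-level SDK client (the table name schon is taken from the question; note that AWS limits how frequently a table's billing mode can be changed):

import * as AWS from "aws-sdk";

const dynamodb = new AWS.DynamoDB({ region: "us-east-2" });

// Flip the table's billing mode from provisioned to on-demand.
async function switchToOnDemand() {
  await dynamodb
    .updateTable({ TableName: "schon", BillingMode: "PAY_PER_REQUEST" })
    .promise();
}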

As per the AWS DynamoDB documentation for provisioned capacity here, a Write Capacity Unit (WCU) is billed, and thus defined, as follows:

Each API call to write data to your table is a write request. For items up to 1 KB in size, one WCU can perform one standard write request per second.

The AWS documentation for the batchWrite / batchWriteItem API here indicates that a batchWrite request supports up to 25 items, and that individual items can be up to 400 KB. Further, the number of WCUs required to process the request depends on the size of the items in it. The AWS documentation for managing capacity in DynamoDB here advises that the number of WCUs required to process a batchWrite request is calculated as follows:

BatchWriteItem — Writes up to 25 items to one or more tables. DynamoDB processes each item in the batch as an individual PutItem or DeleteItem request (updates are not supported). So DynamoDB first rounds up the size of each item to the next 1 KB boundary, and then calculates the total size. The result is not necessarily the same as the total size of all the items. For example, if BatchWriteItem writes a 500-byte item and a 3.5 KB item, DynamoDB calculates the size as 5 KB (1 KB + 4 KB), not 4 KB (500 bytes + 3.5 KB).

The size of the items in the batchWrite request has not been provided, but for the sake of this answer the assumption is made that they are <1 KB each. With 25 items of <1 KB each per request, a minimum provisioned capacity of 25 WCUs is required to process one batchWrite request per second. At exactly 25 provisioned WCUs, only one 25-item request can be made per second, which totals 125 items within the 5-second time limit. To achieve the goal of inserting 1800 items in 5 seconds, 360 WCUs are needed.
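Spelled out as a quick back-of-the-envelope calculation (assuming every item rounds up to 1 KB, i.e. one WCU per item written):

// Capacity math under the assumption that each item is <= 1 KB (1 WCU per write).
const totalItems = 1800;
const wcuPerItem = 1;          // writes of items up to 1 KB consume 1 WCU each
const deadlineSeconds = 5;
const wcusNeeded = (totalItems * wcuPerItem) / deadlineSeconds;
console.log(`${wcusNeeded} WCUs needed to meet the deadline`); // 360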

Based on the current pricing for provisioned capacity found here, 360 WCUs of provisioned capacity would cost approximately $175/month (not considering free-tier credits).

There are two options for handling this issue:

  1. Increase provisioned capacity. To insert 1800 items in 5 seconds, you will need to provision 360 WCUs.
  2. The better option is to simply switch to on-demand capacity. The question mentions that the write requests are "random-spiked operations". If write requests are not predictable and consistent, the usual outcome is an over-provisioned table and paying for idle capacity. On-demand capacity solves this and adheres to the serverless philosophy of being billed only for what you consume. Currently, on-demand pricing is $1.25 per 1 million WCUs consumed. Based on this, if every new user generates 1800 new items to insert, it would take 97,223 new users being created per month before provisioned capacity becomes competitive with on-demand capacity. Put another way, until a new user registers on average every 26 seconds (see the sketch after this list), the math suggests sticking with on-demand capacity. Note that this does not consider RCUs, other items in the table, or other access patterns.
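To sanity-check the figure in option 2, the break-even user count can be converted into an average signup interval (a rough sketch assuming a 30-day month and the pricing quoted above):

// Convert the break-even user count into an average interval between signups.
const breakEvenUsersPerMonth = 97223;       // figure from option 2 above
const secondsPerMonth = 30 * 24 * 60 * 60;  // 2,592,000 seconds in a 30-day month
const secondsBetweenSignups = secondsPerMonth / breakEvenUsersPerMonth;
console.log(`~${secondsBetweenSignups.toFixed(1)}s between signups`); // ~26.7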


 