
DynamoDB - very slow write operations

I have a DynamoDB table running in the AWS cloud and I populate it with data on a regular (scheduled) basis. Once every hour, I receive a file that needs to be processed, and the results have to be saved in the database.

I am using the following class to handle the DB connection and perform the batch writes:

public class DynamoDBService {

  private final AmazonDynamoDB amazonDynamoDB = new AmazonDynamoDBClient();
  private final DynamoDBMapper mapper = new DynamoDBMapper(amazonDynamoDB);

  @Value("${aws_region}")
  private String region;

  @PostConstruct
  public void init() {
    log.info("Region: {}", region);
    amazonDynamoDB.setRegion(RegionUtils.getRegion(region));
  }

  /**
   * Batch-writes the given records to DynamoDB.
   *
   * @param records the records to persist
   */
  public void saveRecord(final Collection<Record> records) {
    log.info("Saving records...");

    // create table if necessary here

    List<Record> recordsToSave = new ArrayList<Record>(records);

    // save the records (the second argument is the list of objects to delete)
    List<FailedBatch> failedBatch = mapper.batchWrite(recordsToSave, new ArrayList<Record>());
    // process failed writes here

    log.info("All records have been saved.");
  }
}

The problem is that the writes are painfully slow. I read the documentation and increased the provisioned throughput, so the table should now support over 300000 writes per hour, but it still takes over 15 minutes to process one List containing approximately 8000 records.

I read that the optimal number of writes in one batch operation is 25, with each record below 1 KB. I tested it both on my local machine (which I expect to be slower because of network overhead) and in the AWS worker environment, and the results were quite slow in both cases. Is there any way this process can be optimized?

First, make both the DynamoDBMapper and the AmazonDynamoDB client static, so that you do not end up with multiple instances across threads.

Second, self-throttle using Guava's RateLimiter or similar. Set the rate equal to the number of writes per second you provisioned on your table, and acquire 25 permits before each batchWrite call, as long as your items are under 1 KB.

Third, you can run mapper.batchWrite calls in parallel. 300000 writes per hour is around 83 writes per second, which means your table probably has a single partition, as long as the amount of data stored in it is less than 10 GB (I am assuming this is true).

Fourth, you can reduce the dynamodb.timeout in the client configuration. This may help, because a BatchWrite operation is only as fast as the slowest individual PutRequest in the batch. You can also try reducing or turning off SDK retries.
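The batching and self-throttling described above can be sketched as follows. This is a minimal, hedged illustration: TokenBucket is a simple stand-in for Guava's RateLimiter, and writeBatch() is a placeholder for mapper.batchWrite(), so the control flow can run without the AWS SDK on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class ThrottledBatchWriter {

    static final int BATCH_SIZE = 25; // DynamoDB batch-write item limit

    /** Minimal token bucket: refills ratePerSecond permits each second. */
    static class TokenBucket {
        private final double ratePerSecond;
        private double available;
        private long lastRefill = System.nanoTime();

        TokenBucket(double ratePerSecond) {
            this.ratePerSecond = ratePerSecond;
            this.available = ratePerSecond;
        }

        synchronized void acquire(int permits) throws InterruptedException {
            refill();
            while (available < permits) {
                Thread.sleep(10);
                refill();
            }
            available -= permits;
        }

        private void refill() {
            long now = System.nanoTime();
            available = Math.min(ratePerSecond,
                    available + (now - lastRefill) / 1e9 * ratePerSecond);
            lastRefill = now;
        }
    }

    /** Split records into chunks of at most batchSize items. */
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(new ArrayList<T>(
                    records.subList(i, Math.min(i + batchSize, records.size()))));
        }
        return batches;
    }

    /** Acquire one permit per item (items assumed under 1 KB) before each batch. */
    static <T> void saveAll(List<T> records, TokenBucket limiter) throws InterruptedException {
        for (List<T> batch : partition(records, BATCH_SIZE)) {
            limiter.acquire(batch.size());
            writeBatch(batch);
        }
    }

    static <T> void writeBatch(List<T> batch) {
        // placeholder: in the real service this would be
        // mapper.batchWrite(batch, new ArrayList<T>())
    }
}
```

With a table provisioned at 83 WCU, you would construct `new TokenBucket(83)` and call saveAll with it; each batch then waits until enough write capacity has accumulated before being sent.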

Note that the maximum number of writes per second supported on a single partition is 1000. It is possible that you up-provisioned so much in the past that you caused your table to split for IOPS. If you have a hash+range schema and you write many items to the same hash key with different range keys, all of those writes go to the same partition. So even though the total write capacity on your table might be 83 writes per second, you may have many partitions, and the partition-level write provisioning may not be enough to support your load.

In this case, two approaches are possible. You can shard your hash keys: use key1, key2, key3, etc. as hash keys for the same logical "key", and use a hash-and-modulo on the range key of each item to decide which hash-key shard it should be written to. The second, and preferable, option is to evaluate your schema to ensure that your writes are evenly distributed across the hash-range key space.
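The hash-key sharding approach can be sketched as a small pure helper. All names here (shardedHashKey, NUM_SHARDS) are illustrative, not part of the AWS SDK, and the shard count is an assumption you would tune to your write rate.

```java
public class ShardedKey {

    // Assumption: 4 shards; pick a count based on your per-partition write rate.
    static final int NUM_SHARDS = 4;

    /**
     * Derives a physical hash key from the logical hash key and the item's
     * range key, e.g. ("user42", "2016-03-01T12:00") -> "user42#<shard>".
     * Items with the same range key always land on the same shard, so reads
     * by full key remain deterministic; reading all items for a logical key
     * requires querying all NUM_SHARDS physical keys.
     */
    static String shardedHashKey(String logicalHashKey, String rangeKey) {
        int shard = Math.floorMod(rangeKey.hashCode(), NUM_SHARDS);
        return logicalHashKey + "#" + shard;
    }
}
```

The trade-off is on the read side: a query for one logical key now fans out into NUM_SHARDS queries, which is why evaluating the schema for naturally even key distribution is the preferable option.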
