
How to fetch large data in DynamoDB?

I need to review all items in a specific table in DynamoDB.

My table contains 10 million items. I tried to fetch them all, but I cannot insert them into a list because the result is too large. My purpose is to go over all the items and see whether I can delete them.

Here is sample code for scanning a table, in case you do not already have it.

The Scan API does not return all the records in one go. You have to call Scan repeatedly, passing each response's LastEvaluatedKey as the next request's ExclusiveStartKey, until LastEvaluatedKey comes back null. Think of it as paginated output: you never need to handle all the items (i.e. 10 million items) in a single scan. Paginating does not add to the cost either: the read capacity units consumed depend on the amount of data scanned, not on the number of pages it is split into.

If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A scan can result in no table data meeting the filter criteria.

Scan API
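To get a feel for what the 1 MB limit means for a table this size, here is a rough estimate, not a measurement. The average item size of 400 bytes is purely an assumption (the question does not give one), and `ScanEstimate` and its methods are hypothetical names. The read-unit figure relies on a read unit covering up to 4 KB, with eventually consistent reads costing half; DynamoDB rounds per request, so the real numbers will differ somewhat.

```java
// Back-of-the-envelope estimate for a full table pass.
// All inputs are assumptions; plug in your own item count and average size.
class ScanEstimate {
    static final long PAGE_LIMIT_BYTES = 1024L * 1024L; // 1 MB cap per Scan response
    static final long READ_UNIT_BYTES = 4L * 1024L;     // one read unit covers up to 4 KB

    /** Lower bound on the number of Scan calls needed at 1 MB per call. */
    static long estimatePages(long itemCount, long avgItemBytes) {
        long totalBytes = itemCount * avgItemBytes;
        return (totalBytes + PAGE_LIMIT_BYTES - 1) / PAGE_LIMIT_BYTES; // ceiling division
    }

    /** Approximate read capacity units consumed by reading the whole table once. */
    static long estimateReadUnits(long itemCount, long avgItemBytes, boolean stronglyConsistent) {
        long totalBytes = itemCount * avgItemBytes;
        long units = (totalBytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES;
        // Eventually consistent reads cost half as much, rounded up.
        return stronglyConsistent ? units : (units + 1) / 2;
    }

    public static void main(String[] args) {
        long items = 10_000_000L;
        long avgBytes = 400L; // assumed average item size
        System.out.println("Scan calls (at least): " + estimatePages(items, avgBytes));
        System.out.println("Read units (eventually consistent, approx.): "
                + estimateReadUnits(items, avgBytes, false));
    }
}
```

Under these assumptions the pass takes a few thousand Scan calls, which is why the loop has to keep following LastEvaluatedKey rather than expect one response.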

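import java.util.List;
import java.util.Map;

import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;

public class ScanTable {

    public static void main(String[] args) {

        // Client pointed at a local DynamoDB endpoint
        AmazonDynamoDB amazonDynamoDB = AmazonDynamoDBClientBuilder.standard()
                .withEndpointConfiguration(new EndpointConfiguration("http://localhost:8000", "us-east-1")).build();

        ScanRequest scanRequest = new ScanRequest().withTableName("Movies");

        Map<String, AttributeValue> lastKey = null;

        do {
            // Each call returns at most one page (up to 1 MB) of items
            ScanResult scanResult = amazonDynamoDB.scan(scanRequest);

            List<Map<String, AttributeValue>> results = scanResult.getItems();

            // Process the current page here instead of accumulating every item
            results.forEach(System.out::println);

            // A null LastEvaluatedKey means there are no more pages
            lastKey = scanResult.getLastEvaluatedKey();
            scanRequest.setExclusiveStartKey(lastKey);
        } while (lastKey != null);

    }
}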
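Since the goal is to delete items, one possible follow-up is to collect the keys you decide to remove from each page and send them in small groups. BatchWriteItem accepts at most 25 put or delete requests per call, so the sketch below only shows the grouping step. `DeleteBatcher` and `chunk` are hypothetical names, the keys are plain strings for illustration, and the actual delete call is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a list of keys into groups small enough for one batch request each.
class DeleteBatcher {
    static final int MAX_BATCH = 25; // BatchWriteItem limit on requests per call

    /** Returns consecutive sublists of at most `size` elements, in order. */
    static <T> List<List<T>> chunk(List<T> keys, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += size) {
            int end = Math.min(i + size, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end))); // copy, not a view
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            keys.add("movie-" + i); // made-up keys
        }
        for (List<String> batch : chunk(keys, MAX_BATCH)) {
            // A real program would issue one batch delete request here.
            System.out.println("would delete " + batch.size() + " items");
        }
    }
}
```

Grouping per page keeps the pending key list small, so memory use stays bounded by one page of results rather than the whole table.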

What is not clear:

I understand that you want to retrieve all the items and do some processing on them. However, I am not sure why you want to insert them into a list.

If you process each page of scan results separately (i.e. up to 1 MB of data at a time), you do not need to insert the items into a list and hold them all in heap memory. Collecting everything in one list will obviously require far more memory, whichever way you fetch the data.
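To make the page-at-a-time idea concrete, here is a minimal sketch that runs without DynamoDB. `PageSource`, `Page`, and `fakeTable` are hypothetical stand-ins for the client, the scan result, and the table; only the loop shape mirrors a real scan. Each item goes to a handler as soon as its page arrives, so no more than one page is held in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class PagedScanSketch {
    /** One page of results plus the cursor for the next page (null when done). */
    static final class Page {
        final List<String> items;
        final Integer nextKey;
        Page(List<String> items, Integer nextKey) {
            this.items = items;
            this.nextKey = nextKey;
        }
    }

    /** Hypothetical stand-in for the DynamoDB client: one page per call. */
    interface PageSource {
        Page fetch(Integer startKey);
    }

    /** Walks every page, hands each item to the handler, returns the item count. */
    static long forEachItem(PageSource source, Consumer<String> handler) {
        long seen = 0;
        Integer key = null;
        do {
            Page page = source.fetch(key); // only this page is in memory
            for (String item : page.items) {
                handler.accept(item);
                seen++;
            }
            key = page.nextKey;
        } while (key != null);
        return seen;
    }

    /** In-memory fake table that serves pageSize rows per call. */
    static PageSource fakeTable(List<String> rows, int pageSize) {
        return startKey -> {
            int from = (startKey == null) ? 0 : startKey;
            int to = Math.min(from + pageSize, rows.size());
            Integer next = (to < rows.size()) ? to : null;
            return new Page(new ArrayList<>(rows.subList(from, to)), next);
        };
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("item-" + i);
        }
        long[] deletable = {0};
        long total = forEachItem(fakeTable(rows, 4), item -> {
            // Placeholder rule; a real check would inspect the item's attributes.
            if (item.endsWith("0") || item.endsWith("5")) {
                deletable[0]++;
            }
        });
        System.out.println(total + " scanned, " + deletable[0] + " deletable");
    }
}
```

The handler is where the "can I delete this?" decision would go; it only needs to keep counters or keys, not the items themselves.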

