
DynamoDB Table.scan with and without pagination

I am trying to understand the difference between the following two code segments. One uses pages to get the scan results, and the second one doesn't. Would the second approach still work if the total number of items in the table is very large? The AWS docs say that a scan result is limited to 1 MB. How does this affect version 2? Will it only get the first 1 MB of results, or will it still make further database calls after each page?

Note that I am using the table.scan API, which is different from the DynamoDBClient.scan API. See http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/Table.html for API details.

Version 1 (using pages):

            ItemCollection<ScanOutcome> items = table.scan(spec);
            items.pages().forEach(page -> {
                for (Item item : page) {
                    response.add(item);
                }
            });

Version 2 (iterating over items without pages):

            ItemCollection<ScanOutcome> items = table.scan(spec);
            for (Item item : items) {
                response.add(item);
            }

Tofig is correct. There is no difference between those two methods. The statement about the Scan result being limited to 1 MB is only true for the low-level API, not for the Document API, whose ItemCollection can fetch further pages as you iterate.
From the documentation of ItemCollection:

A collection of Item's. An ItemCollection object maintains a cursor pointing to its current pages of data. Initially the cursor is positioned before the first page. The next method moves the cursor to the next row, and because it returns false when there are no more rows in the ItemCollection object, it can be used in a while loop to iterate through the collection. Network calls can be triggered when the collection is iterated across page boundaries.

I conducted an experiment in which I created 1000 records of 5 KB each. I then used version 2 to scan the table and still got all 1000 records, although the total size is clearly > 1 MB. Both versions scanned the whole table, so it seems there is no difference. It seems that ItemCollection handles pagination for you and there is no need to use pages, unless you want to control network calls and page size.
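
To make the behaviour observed in that experiment concrete, here is a minimal plain-Java sketch of the idea. It does not use the AWS SDK and is not how ItemCollection is actually implemented: FakeTable, ItemIterator, scanByItems and scanByPages are hypothetical names invented for illustration, and the rule that a page holds as many items as fit in 1024 KB is a simplifying assumption. It only models an item-level iterator that asks for the next page when it crosses a page boundary, and compares it with an explicit page loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedScanSketch {

    /** Hypothetical stand-in for a remote table; each fetchPage call models one network call. */
    static class FakeTable {
        final int itemCount;
        final int itemSizeKb;
        final int pageLimitKb;
        int fetchCalls = 0;

        FakeTable(int itemCount, int itemSizeKb, int pageLimitKb) {
            this.itemCount = itemCount;
            this.itemSizeKb = itemSizeKb;
            this.pageLimitKb = pageLimitKb;
        }

        /** Returns the item ids starting at startIndex that fit within the page size limit. */
        List<Integer> fetchPage(int startIndex) {
            fetchCalls++;
            List<Integer> page = new ArrayList<>();
            int usedKb = 0;
            for (int i = startIndex; i < itemCount && usedKb + itemSizeKb <= pageLimitKb; i++) {
                page.add(i);
                usedKb += itemSizeKb;
            }
            return page;
        }
    }

    /** Item-level iterator: fetches a new page only when the current page is used up. */
    static class ItemIterator implements Iterator<Integer> {
        private final FakeTable table;
        private List<Integer> page = new ArrayList<>();
        private int posInPage = 0;
        private int nextStart = 0;
        private boolean morePages = true;

        ItemIterator(FakeTable table) {
            this.table = table;
        }

        @Override
        public boolean hasNext() {
            while (posInPage >= page.size() && morePages) {
                page = table.fetchPage(nextStart); // crossing a page boundary
                posInPage = 0;
                nextStart += page.size();
                morePages = !page.isEmpty() && nextStart < table.itemCount;
            }
            return posInPage < page.size();
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.get(posInPage++);
        }
    }

    /** Analogue of version 2: iterate over items and let the iterator page behind the scenes. */
    static List<Integer> scanByItems(FakeTable table) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = new ItemIterator(table);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    /** Analogue of version 1: walk the pages explicitly. */
    static List<Integer> scanByPages(FakeTable table) {
        List<Integer> out = new ArrayList<>();
        int start = 0;
        while (start < table.itemCount) {
            List<Integer> page = table.fetchPage(start);
            if (page.isEmpty()) {
                break; // a single item larger than the page limit; stop instead of looping forever
            }
            out.addAll(page);
            start += page.size();
        }
        return out;
    }

    public static void main(String[] args) {
        // Same numbers as the experiment: 1000 items of 5 KB, pages capped at 1 MB (1024 KB).
        FakeTable byItems = new FakeTable(1000, 5, 1024);
        FakeTable byPages = new FakeTable(1000, 5, 1024);
        System.out.println(scanByItems(byItems).size() + " items, " + byItems.fetchCalls + " fetches");
        System.out.println(scanByPages(byPages).size() + " items, " + byPages.fetchCalls + " fetches");
    }
}
```

In this model a page holds 204 items (1020 KB), so both loops return all 1000 items and each makes five simulated fetches. The only difference is whether the caller sees the page boundaries, which matches the conclusion above that pages matter mainly when you want to control network calls and page size.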
