
How to fetch paginated data from BigQuery in parallel

I am fetching paginated data from BigQuery. Because the data set is huge, processing it takes a long time.

    // results, count, i and logger are assumed to be defined earlier (not shown).
    while (results.hasNextPage()) {
        results = results.getNextPage();
        count += results.getValues().spliterator().getExactSizeIfKnown();
        results.getValues().forEach(row -> {
            // Some operations.
        });
        logger.info("Grouping completed in iteration {}. Progress: {} / {}", i, count, results.getTotalRows());
        i++;
    }

I profiled my program with VisualVM and found that most of the time is spent on the results.getNextPage() call, which fetches the next page of data. Is there any way to parallelize this? I mean fetching each batch of data (20K in my case) in a separate thread. I am using the Java client com.google.cloud.bigquery.

Each query writes to a destination table. If no destination table is provided, the BigQuery API automatically populates the destination table property with a reference to a temporary anonymous table.

Once you have that table, you can use the tabledata.list API call to read data from it. Among its optional parameters is startIndex, which you can set to any row offset and use in your pagination script.
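To make the startIndex idea concrete, here is a minimal sketch that computes the starting row offset of each page from a total row count and a page size. The class and method names are hypothetical, and the numbers are only illustrative (the 20,000 page size echoes the batch size mentioned in the question). With the Java client, each offset would presumably be passed to listTableData through BigQuery.TableDataListOption.startIndex, alongside pageSize.

```java
import java.util.ArrayList;
import java.util.List;

public class PageOffsets {

    /** Returns the zero-based starting row index of every page. */
    public static List<Long> startIndexes(long totalRows, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Long> offsets = new ArrayList<>();
        for (long start = 0; start < totalRows; start += pageSize) {
            offsets.add(start);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 45,000 rows read in pages of 20,000: the last page is partial.
        System.out.println(startIndexes(45_000, 20_000)); // prints [0, 20000, 40000]
    }
}
```

Because every offset is known up front, the pages no longer depend on the page token returned by the previous call, which is what makes independent requests possible.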

You can run parallel API calls with different offsets, which will speed up your request.
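The parallel calls could be organized roughly as follows. This is only a sketch under stated assumptions: PageSource is a hypothetical interface standing in for one tabledata.list request, and the pool size of 4 is arbitrary. In a real program the PageSource would wrap something like bigquery.listTableData(tableId, TableDataListOption.startIndex(offset), TableDataListOption.pageSize(pageSize)); here a stand-in source returns row numbers so the example runs without BigQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPageFetcher {

    /** Hypothetical abstraction over a single paged read starting at startIndex. */
    @FunctionalInterface
    public interface PageSource<T> {
        List<T> fetch(long startIndex, long pageSize) throws Exception;
    }

    /** Submits one task per page and returns the pages in offset order. */
    public static <T> List<List<T>> fetchAll(long totalRows, long pageSize, int threads,
                                             PageSource<T> source)
            throws InterruptedException, ExecutionException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (long start = 0; start < totalRows; start += pageSize) {
                final long offset = start;
                Callable<List<T>> task = () -> source.fetch(offset, pageSize);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so pages stay sorted by offset.
            List<List<T>> pages = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                pages.add(future.get());
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in source: produces row numbers instead of calling BigQuery.
        PageSource<Long> fake = (start, size) -> {
            List<Long> rows = new ArrayList<>();
            for (long r = start; r < Math.min(start + size, 45_000); r++) {
                rows.add(r);
            }
            return rows;
        };
        List<List<Long>> pages = fetchAll(45_000, 20_000, 4, fake);
        long total = pages.stream().mapToLong(List::size).sum();
        System.out.println(pages.size() + " pages, " + total + " rows"); // prints 3 pages, 45000 rows
    }
}
```

Collecting the futures in submission order keeps the pages sorted by offset even when the requests finish out of order. If per-row processing is also expensive, it could be moved into the submitted task instead of being done after future.get().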

You can refer to the documentation on paging through results using the API.
