
How to fetch paginated data from BigQuery in parallel

I am fetching paginated data from BigQuery. Because the result set is huge, it takes a long time to process.

while (results.hasNextPage()) {
    results = results.getNextPage();
    count += results.getValues().spliterator().getExactSizeIfKnown();
    results.getValues().forEach(row -> {
        // Some operations.
    });
    logger.info("Grouping completed in iteration {}. Progress: {} / {}", i, count, results.getTotalRows());
    i++;
}

I profiled my program with VisualVM and found that most of the time is spent on the results.getNextPage() call, which fetches the next page of data. Is there any way to parallelize it, that is, to fetch each batch of data (20K rows in my case) in a separate thread? I am using the Java client com.google.cloud.bigquery.

Each query writes to a destination table. If no destination table is provided, the BigQuery API automatically populates the destination table property with a reference to a temporary anonymous table.

Once you have that table, you can use the tabledata.list API call to read the data from it. Among its optional parameters is startIndex, the zero-based index of the first row to return. You can set it to any row offset and use it in your pagination script.

You can run several of these API calls in parallel, each with a different startIndex offset, which should speed up the overall fetch.
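The offset-based approach can be sketched roughly as follows. This is a minimal sketch, not the client library's own API: ParallelPageFetcher, PageFetcher, startIndexes and fetchAll are hypothetical names introduced for illustration, and the page fetch is abstracted behind a callback so the example runs with the JDK alone. It assumes the total row count is known up front (for example from results.getTotalRows()) and that all rows fit in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: fetches fixed-size pages concurrently by row offset. */
class ParallelPageFetcher {

    /** Hypothetical callback that loads one page of rows starting at startIndex. */
    interface PageFetcher<T> {
        List<T> fetch(long startIndex, long pageSize) throws Exception;
    }

    /** Computes the zero-based start index of every page. */
    static List<Long> startIndexes(long totalRows, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Long> starts = new ArrayList<>();
        for (long start = 0; start < totalRows; start += pageSize) {
            starts.add(start);
        }
        return starts;
    }

    /** Fetches all pages on a fixed thread pool and returns the rows in table order. */
    static <T> List<T> fetchAll(long totalRows, long pageSize, int threads,
                                PageFetcher<T> fetcher) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per page; each task fetches an independent offset.
            List<Future<List<T>>> futures = new ArrayList<>();
            for (long start : startIndexes(totalRows, pageSize)) {
                Callable<List<T>> task = () -> fetcher.fetch(start, pageSize);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the output order matches the table.
            List<T> rows = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                rows.addAll(future.get());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the com.google.cloud.bigquery client, the callback would presumably wrap a call such as bigquery.listTableData(tableId, BigQuery.TableDataListOption.startIndex(start), BigQuery.TableDataListOption.pageSize(size)), where tableId is the query job's destination table. For very large results it may be preferable to process each page inside its task instead of collecting every row into one list.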

For details, see the BigQuery documentation on paging through results using the API.
