I am fetching paginated data from BigQuery. Because the result set is huge, processing it takes a long time.
while (results.hasNextPage()) {
    results = results.getNextPage();
    count += results.getValues().spliterator().getExactSizeIfKnown();
    results.getValues().forEach(row -> {
        // Some operations.
    });
    logger.info("Grouping completed in iteration {}. Progress: {} / {}", i, count, results.getTotalRows());
    i++;
}
I profiled my program with VisualVM and found that the majority of the time is spent on the results.getNextPage() line, which fetches the next page of data.
Is there any way to parallelize this, that is, to fetch each batch of data (20K rows in my case) in a separate thread? I am using the Java client com.google.cloud.bigquery.
Each query writes to a destination table. If no destination table is provided, the BigQuery API automatically populates the destination table property with a reference to a temporary anonymous table.
Once you have that table, you can use the tabledata.list API call to read the data from it. Among its optional parameters is startIndex, the zero-based index of the row to start reading from, which you can use in your pagination script.
Because each call can start at its own offset, you can issue several API calls in parallel with different startIndex values, which should speed up the overall fetch.
For details, see the BigQuery documentation on paging through results using the API.
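The offset-based approach can be sketched roughly as follows. This is a minimal illustration, not the client library's API: PageFetcher, startIndexes and fetchAll are hypothetical names, and PageFetcher stands in for a single tabledata.list call with a given startIndex and page size. With the Java client, that call would presumably go through BigQuery.listTableData with TableDataListOption.startIndex and TableDataListOption.pageSize, but check that against the client documentation before relying on it. The demo in main uses a fake in-memory fetcher so the sketch runs without BigQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPageFetch {

    /** Hypothetical stand-in for one tabledata.list call starting at startIndex. */
    interface PageFetcher<T> {
        List<T> fetch(long startIndex, long maxResults) throws Exception;
    }

    /** Start offsets 0, pageSize, 2 * pageSize, ... covering totalRows rows. */
    static List<Long> startIndexes(long totalRows, long pageSize) {
        List<Long> offsets = new ArrayList<>();
        for (long start = 0; start < totalRows; start += pageSize) {
            offsets.add(start);
        }
        return offsets;
    }

    /** Fetches every page on a fixed thread pool and returns the rows in offset order. */
    static <T> List<T> fetchAll(PageFetcher<T> fetcher, long totalRows, long pageSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per offset; the tasks run concurrently.
            List<Future<List<T>>> pending = new ArrayList<>();
            for (long start : startIndexes(totalRows, pageSize)) {
                Callable<List<T>> task = () -> fetcher.fetch(start, pageSize);
                pending.add(pool.submit(task));
            }
            // Collect in submission order so rows stay ordered by offset.
            List<T> rows = new ArrayList<>();
            for (Future<List<T>> page : pending) {
                rows.addAll(page.get());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long totalRows = 45;
        // Fake fetcher: returns the row indexes themselves, clamped to totalRows.
        PageFetcher<Long> fake = (start, max) -> {
            List<Long> page = new ArrayList<>();
            for (long i = start; i < Math.min(start + max, totalRows); i++) {
                page.add(i);
            }
            return page;
        };
        List<Long> rows = fetchAll(fake, totalRows, 20, 4);
        System.out.println("Fetched " + rows.size() + " rows");
    }
}
```

In a real job, totalRows would come from results.getTotalRows(), and the per-row work from the original loop could run inside each task instead of after collection, which avoids holding every page in memory at once. The thread count is worth tuning, since too many concurrent requests may run into API quotas.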