How to improve BigQuery read performance

We're using BigQuery to retrieve the full content of a big table, the publicly available publicdata:samples.natality.

Our code follows Google's instructions as described in their Java API documentation.

We're able to retrieve this table at around 1,300 rows/sec, which is amazingly slow. Is there a faster way to retrieve the full result of a query, or is this as fast as it gets?

Paging through a full table with tabledata.list, as that example does, is not the recommended way to retrieve a large amount of data from a BigQuery table. That example is optimized for reading a small number of rows from the results of a query.

Instead, you should run an extract job that exports the entire content of the table to Google Cloud Storage, from which you can then download the full content.

https://cloud.google.com/bigquery/exporting-data-from-bigquery
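As a rough illustration of the extract-job approach, here is a minimal sketch using the google-cloud-bigquery Java client. The bucket name `my-export-bucket`, the `natality` file prefix, the Avro format, and the `shardedUri` helper are all assumptions made for the example, not part of the original answer. It also assumes application default credentials and a bucket whose location is compatible with the dataset.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExtractJobConfiguration;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.TableId;

final class ExtractTableToGcs {
    // Hypothetical helper: builds a wildcard URI so that BigQuery can
    // split a large export across several files.
    static String shardedUri(String bucket, String prefix) {
        return "gs://" + bucket + "/" + prefix + "-*.avro";
    }

    public static void main(String[] args) throws InterruptedException {
        // Uses application default credentials and the default project.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        TableId source = TableId.of("publicdata", "samples", "natality");
        // "my-export-bucket" is a placeholder; use a bucket you own.
        String destination = shardedUri("my-export-bucket", "natality");

        ExtractJobConfiguration config =
            ExtractJobConfiguration.newBuilder(source, destination)
                .setFormat("AVRO")
                .build();

        // Submit the extract job and block until it finishes.
        Job job = bigquery.create(JobInfo.of(config)).waitFor();
        if (job == null) {
            throw new IllegalStateException("Extract job no longer exists");
        }
        if (job.getStatus().getError() != null) {
            throw new IllegalStateException(job.getStatus().getError().toString());
        }
        System.out.println("Exported to " + destination);
    }
}
```

The wildcard in the destination URI matters for a table of this size: BigQuery exports at most 1 GB to a single file, so larger tables need a URI pattern that lets it write several files. The exported files can then be fetched with any Cloud Storage client.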

To download a table fast, you can use the Google BigQuery Storage Client for Java.

It lets you download tables in an efficient binary format such as Avro or Arrow. Using the basic Arrow example in the documentation, I managed to download ~1 million rows per second.
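To put the two throughput figures side by side, here is a small back-of-the-envelope calculation. The row count of about 137.8 million for the natality table is an assumed figure used only for illustration (check the table's metadata for the real value), and `estimate` is a hypothetical helper.

```java
import java.time.Duration;

final class TransferTimeEstimate {
    // Time needed to move rowCount rows at rowsPerSecond,
    // rounded up to whole seconds.
    static Duration estimate(long rowCount, long rowsPerSecond) {
        long seconds = (rowCount + rowsPerSecond - 1) / rowsPerSecond;
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        // Assumed table size, not taken from the question.
        long assumedRows = 137_800_000L;

        // Rate reported in the question for paging with tabledata.list.
        System.out.println("1,300 rows/s:     " + estimate(assumedRows, 1_300));
        // Rate reported in this answer for the Storage client with Arrow.
        System.out.println("1,000,000 rows/s: " + estimate(assumedRows, 1_000_000));
    }
}
```

Under that assumption, the first rate works out to roughly 29 hours and the second to a little over two minutes.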

I think you can also use it to download a query result, by reading the temporary table that the query writes its result into.

The code to get the temporary table of the result looks like this:

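import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Client built from application default credentials.
private static final BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

// Runs the query and returns the temporary table that holds its result.
public static TableId getTemporaryTable(String query) throws InterruptedException {
    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(query).setUseLegacySql(false).build();
    Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).build());
    queryJob = queryJob.waitFor(); // Wait for the query to complete.
    return ((QueryJobConfiguration) queryJob.getConfiguration()).getDestinationTable();
}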
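Reading the resulting table through the Storage Read API could then look roughly like the sketch below. It is modeled on the client library's v1 API, not on the code the answer actually ran. It opens a single stream and only counts rows without decoding the Arrow batches. The `tablePath` and `countRows` helpers and the command-line arguments are assumptions for illustration. The project, dataset and table values would come from the `TableId` returned above via `getProject()`, `getDataset()` and `getTable()`.

```java
import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.bigquery.storage.v1.BigQueryReadClient;
import com.google.cloud.bigquery.storage.v1.CreateReadSessionRequest;
import com.google.cloud.bigquery.storage.v1.DataFormat;
import com.google.cloud.bigquery.storage.v1.ReadRowsRequest;
import com.google.cloud.bigquery.storage.v1.ReadRowsResponse;
import com.google.cloud.bigquery.storage.v1.ReadSession;
import java.io.IOException;

final class ReadTableWithStorageApi {
    // Hypothetical helper: formats the resource name the Storage API expects.
    static String tablePath(String project, String dataset, String table) {
        return String.format("projects/%s/datasets/%s/tables/%s", project, dataset, table);
    }

    // Streams the whole table over one stream and returns the number of rows received.
    static long countRows(String billingProject, String tablePath) throws IOException {
        try (BigQueryReadClient client = BigQueryReadClient.create()) {
            ReadSession.Builder sessionBuilder =
                ReadSession.newBuilder().setTable(tablePath).setDataFormat(DataFormat.ARROW);
            CreateReadSessionRequest request =
                CreateReadSessionRequest.newBuilder()
                    .setParent("projects/" + billingProject)
                    .setReadSession(sessionBuilder)
                    .setMaxStreamCount(1) // one stream keeps the sketch simple
                    .build();
            ReadSession session = client.createReadSession(request);
            if (session.getStreamsCount() == 0) {
                return 0; // no streams are returned for an empty table
            }
            ReadRowsRequest readRequest =
                ReadRowsRequest.newBuilder()
                    .setReadStream(session.getStreams(0).getName())
                    .build();
            long rows = 0;
            ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRequest);
            for (ReadRowsResponse response : stream) {
                rows += response.getRowCount(); // Arrow batch decoding omitted
            }
            return rows;
        }
    }

    public static void main(String[] args) throws IOException {
        // args: billing project, table project, dataset, table (all supplied by the caller)
        String path = tablePath(args[1], args[2], args[3]);
        System.out.println(countRows(args[0], path) + " rows read from " + path);
    }
}
```

Raising the maximum stream count and reading each stream on its own thread is the usual way to push throughput higher, at the cost of more code.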
