Set maximumBillingTier when reading from BigQuery in Dataflow
I'm running a GCP Dataflow job that reads data from BigQuery as a query result. I'm using google-cloud-dataflow-java-sdk-all version 1.9.0. The code fragment that sets up the pipeline looks like this:
PCollection<TableRow> myRows = pipeline.apply(BigQueryIO.Read
.fromQuery(query)
.usingStandardSql()
.withoutResultFlattening()
.named("Input " + tableId)
);
The query is quite complex, which results in this error message:
Query exceeded resource limits for tier 1. Tier 8 or higher required., error: Query exceeded resource limits for tier 1. Tier 8 or higher required.
I'd like to set maximumBillingTier, as can be done in the Web UI or in a bq script. I can't find any way to do so except setting the default for the entire project, which is unfortunately not an option.
I tried to set it through these without success:
BigQueryIO.Read.Bound - I expected it to be right next to usingStandardSql and similar methods, but obviously it is not there.
Is there any way to pass this setting from within a Dataflow job?
Maybe a Googler will correct me, but it looks like you are right. I can't see this parameter exposed either. I checked both the Dataflow and the Beam APIs.
Under the hood, Dataflow uses JobConfigurationQuery from the BigQuery API, but it simply doesn't expose that parameter through its own API.
One workaround I see is to first run your complex query using the BigQuery API directly, before dropping into your pipeline. That way you can set the max billing tier through the JobConfigurationQuery class. Write the results of that query to another table in BigQuery.
Then finally, in your pipeline, just read in the table which was created from the complex query.
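To make the first step of that workaround concrete, here is a minimal sketch of the request body for a BigQuery REST `jobs.insert` call that runs the query at a higher billing tier and writes the result to a table. It builds the JSON by hand only to show where `maximumBillingTier` sits (under `configuration.query`, the resource that `JobConfigurationQuery` models). The class name, helper names, and the project, dataset, and table names are all made up for illustration, and the `WRITE_TRUNCATE` disposition is an assumption about how you would want to refresh the table.

```java
/**
 * Sketch: builds the JSON body for a BigQuery REST jobs.insert request that
 * runs a standard SQL query with a raised billing tier and stores the result
 * in a destination table. All identifiers passed in are placeholders.
 */
final class BillingTierQueryJob {

    /** Escapes a string so it can be embedded in a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Returns the jobs.insert body; maximumBillingTier lives in configuration.query. */
    static String buildJobBody(String projectId, String datasetId, String tableId,
                               String sql, int maximumBillingTier) {
        return "{\"configuration\":{\"query\":{"
            + "\"query\":\"" + jsonEscape(sql) + "\","
            + "\"useLegacySql\":false,"
            + "\"maximumBillingTier\":" + maximumBillingTier + ","
            // Assumption: overwrite the intermediate table on every run.
            + "\"writeDisposition\":\"WRITE_TRUNCATE\","
            + "\"destinationTable\":{"
            + "\"projectId\":\"" + jsonEscape(projectId) + "\","
            + "\"datasetId\":\"" + jsonEscape(datasetId) + "\","
            + "\"tableId\":\"" + jsonEscape(tableId) + "\"}"
            + "}}}";
    }

    public static void main(String[] args) {
        // Placeholder names; tier 8 matches the tier named in the error message.
        System.out.println(buildJobBody(
            "my-project", "my_dataset", "complex_query_result", "SELECT 1 AS x", 8));
    }
}
```

If you use the BigQuery Java client rather than raw REST, the same fields are set through `JobConfigurationQuery`, for example `setMaximumBillingTier` and `setDestinationTable`. Once the job has finished, the pipeline should be able to read the materialized table with `BigQueryIO.Read.from("my-project:my_dataset.complex_query_result")` in place of `fromQuery(query)`.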