
Exception when reading BigQuery from Dataflow template using ValueProvider

I'm trying to create a template to read from BigQuery, but unfortunately I get an exception when trying to build the template:

An exception occured while executing the Java class. Cannot call validate if table is dynamically set.

Reading the documentation, it seems that there's a special method to call when reading from BigQuery in a batch template:

Note: If you want to run a batch pipeline that reads from BigQuery, you must use .withTemplateCompatibility() on all BigQuery reads.

So, here's my code snippet:

PCollection<Discount> discountFromBigQuery = p.apply("Parse Discounts from BigQuery",
    BigQueryIO.read((SerializableFunction<SchemaAndRecord, Discount>) record -> {
        GenericRecord row = record.getRecord();
        return new Discount(row);
    })
    .withTemplateCompatibility()
    .from(options.getBigQueryDiscountPath())
    .withCoder(SerializableCoder.of(Discount.class)));

Obviously, options.getBigQueryDiscountPath() is a ValueProvider<String>.
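
For reference, a ValueProvider option like this is normally declared in the pipeline options interface so that the template can receive it at runtime. Below is a minimal sketch of such a declaration; only the getter name matches the snippet above, while the interface name and description text are illustrative assumptions:

    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.ValueProvider;

    // Hypothetical options interface matching the getter used in the snippet above.
    public interface DiscountPipelineOptions extends PipelineOptions {

        // Runtime parameter: resolved when the template is executed, not when it is built.
        @Description("BigQuery table spec to read discounts from, e.g. project:dataset.table")
        ValueProvider<String> getBigQueryDiscountPath();

        void setBigQueryDiscountPath(ValueProvider<String> value);
    }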

So, how can I get rid of this error and make the BigQuery read work as part of a template?

Here are the Maven dependencies I use:

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.dataflow</groupId>
    <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
    <version>2.5.0</version>
</dependency>

I believe the error you are facing is defined here. Please note the explanation, which mentions:

Note that a table or query check can fail if the table or dataset are created by earlier stages of the pipeline or if a query depends on earlier stages of a pipeline.

To overcome this, try adding the withoutValidation method to your BigQueryIO.read call.

By the way, withoutValidation() needs to be added at the end of the chain, as shown below.

    // queryString is of type ValueProvider<String>
    PCollection<TableRow> rowsFromBigQuery = pipeline.apply(
                BigQueryIO.readTableRows()
                        .fromQuery(queryString)
                        .usingStandardSql()
                        .withMethod(options.getReadMethod())
                        .withoutValidation());
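
Applied to the read from the question, a possible combination would look like the sketch below. This is only an assumption of how the calls fit together: the Discount class, the pipeline p, and options.getBigQueryDiscountPath() come from the question, and withoutValidation() is appended at the end of the chain so the table spec is not validated while the template is being built:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.coders.SerializableCoder;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.PCollection;

    // Sketch: same read as in the question, with withoutValidation() added last
    // so that validation is skipped for the dynamically provided table.
    PCollection<Discount> discountFromBigQuery = p.apply("Parse Discounts from BigQuery",
        BigQueryIO.read((SerializableFunction<SchemaAndRecord, Discount>) record -> {
            GenericRecord row = record.getRecord();
            return new Discount(row);
        })
        .withTemplateCompatibility()
        .from(options.getBigQueryDiscountPath())
        .withCoder(SerializableCoder.of(Discount.class))
        .withoutValidation());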
