Exception when reading BigQuery from a Dataflow template using ValueProvider

Question

I'm trying to create a template that reads from BigQuery, but unfortunately I get an exception when building the template.

An exception occured while executing the Java class. Cannot call validate if table is dynamically set.

Reading the documentation, it seems there's a special method to call when reading from BigQuery in a batch template:

Note: If you want to run a batch pipeline that reads from BigQuery, you must use .withTemplateCompatibility() on all BigQuery reads.

So, here's my code snippet:

PCollection<Discount> discountFromBigQuery = p.apply("Parse Discounts from BigQuery", BigQueryIO.read((SerializableFunction<SchemaAndRecord, Discount>) record -> {
        GenericRecord row = record.getRecord();
        return new Discount(row);
    }).withTemplateCompatibility().from(options.getBigQueryDiscountPath()).withCoder(SerializableCoder.of(Discount.class)));

Obviously, options.getBigQueryDiscountPath() is a ValueProvider<String>.
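
For context, here is a minimal sketch of how an option like options.getBigQueryDiscountPath() is typically declared for a template, assuming Beam 2.8.0. The interface name DiscountOptions and the description text are hypothetical; only the getter name comes from the snippet above.

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;

/**
 * Hypothetical options interface. Only getBigQueryDiscountPath() comes from the
 * question; the interface name and description are assumptions.
 */
public interface DiscountOptions extends PipelineOptions {

  // Declared as ValueProvider<String> so the table spec can be supplied when the
  // template is launched instead of when it is built.
  @Description("BigQuery table to read discounts from, e.g. project:dataset.table")
  ValueProvider<String> getBigQueryDiscountPath();

  void setBigQueryDiscountPath(ValueProvider<String> value);
}
```

If --bigQueryDiscountPath is passed when the template is built, the getter returns a static, accessible value. If it is omitted at build time, the value only exists once the template is launched, so any check that needs it while building (such as BigQueryIO's table validation) cannot read it.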

So, how can I get rid of this error and make the BigQuery read part of the template?

Here are the Maven dependencies I use:

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.8.0</version>
</dependency>
<dependency>
    <groupId>com.google.cloud.dataflow</groupId>
    <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
    <version>2.5.0</version>
</dependency>

Answer 1

I believe the error you are facing is defined here. Please note the explanation, which says:

Note that a table or query check can fail if the table or dataset are created by earlier stages of the pipeline or if a query depends on earlier stages of a pipeline.

To work around this, try adding the withoutValidation() method to your BigQueryIO.read call.
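
Applied to the read from the question, the suggestion might look like the following minimal sketch, assuming Beam 2.8.0. Discount is a hypothetical stand-in for the asker's class (which the question doesn't show), and DiscountReads and discountRead are illustrative names.

```java
import java.io.Serializable;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.SerializableFunction;

public class DiscountReads {

  /** Hypothetical stand-in for the question's Discount class. */
  public static class Discount implements Serializable {
    private final String raw;

    public Discount(GenericRecord row) {
      // Placeholder mapping: keep the record's string form.
      this.raw = String.valueOf(row);
    }

    public String getRaw() {
      return raw;
    }
  }

  /**
   * Builds the question's read with validation turned off, so the table spec can
   * stay a ValueProvider that is only resolved when the template runs.
   */
  public static BigQueryIO.TypedRead<Discount> discountRead(ValueProvider<String> tablePath) {
    return BigQueryIO.read(
            (SerializableFunction<SchemaAndRecord, Discount>)
                record -> new Discount(record.getRecord()))
        .withTemplateCompatibility()
        .from(tablePath)
        // Skips the validation that requires the table ValueProvider to be
        // accessible while the template is being built.
        .withoutValidation()
        .withCoder(SerializableCoder.of(Discount.class));
  }
}
```

It would be applied as before, e.g. p.apply("Parse Discounts from BigQuery", DiscountReads.discountRead(options.getBigQueryDiscountPath())). With validation turned off, a misspelled or missing table is no longer reported when the template is built; it only surfaces when the job runs.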

Answer 2

By the way, withoutValidation() goes on the BigQueryIO read chain itself, for example at the end, like below:

    // queryString is of type ValueProvider<String>
    PCollection<TableRow> rowsFromBigQuery = pipeline.apply(
                BigQueryIO.readTableRows()
                        .fromQuery(queryString)
                        .usingStandardSql()
                        .withMethod(options.getReadMethod())
                        .withoutValidation());

2 answers

Answer 1: score 2, accepted, 2018-11-28 20:52:15

Answer 2: score 1, 2020-02-07 03:09:00
