Apache-beam Bigquery .fromQuery ClassCastException

Question

I'm trying to execute a query against a BigQuery table, extract one column, and write it to a file. The code below throws an exception. I could be wrong, but it seems the process writes temporary results to the temp location in Avro format, reads the data back from it, and then throws a cast exception.

pipeLine.apply(
        BigQueryIO.read(
                (SchemaAndRecord elem) -> {
                  GenericRecord record = elem.getRecord();
                  return (String) record.get("column");
                })
                .fromQuery("SELECT column FROM `project.dataset.table`")
                .usingStandardSql()
                .withCoder(AvroCoder.of(String.class)))
        .apply(TextIO.write().to("gs://bucket/test/result/data")
                .withSuffix(TXT_EXT)
                .withCompression(Compression.GZIP));

Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.String
    at xxxxx.xxx.xxx.sampling.dataflow.samplingextractor.service.BigQueryExportService.lambda$export$43268ee4$1(BigQueryExportService.java:137)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:242)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:235)
    at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:597)
    at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:209)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:484)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)

Answer 1

I think it is suggesting that you use .withCoder(AvroCoder.of(org.apache.avro.util.Utf8.class)), since a String cannot be cast directly from the Avro Utf8 class.

Answer 2

From looking at the documentation, it seems you simply want to use the StringUtf8Coder class.

pipeLine.apply(
    BigQueryIO.read(
            (SchemaAndRecord elem) -> {
              GenericRecord record = elem.getRecord();
              return (String) record.get("column");
            })
            .fromQuery("SELECT column FROM `project.dataset.table`")
            .usingStandardSql()
            .withCoder(StringUtf8Coder.of()))
        .apply(TextIO.write().to("gs://bucket/test/result/data")
            .withSuffix(TXT_EXT)
            .withCompression(Compression.GZIP));
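
The stack trace in the question points at the lambda itself (BigQueryExportService.java:137), so the (String) cast there may still fail whichever coder is configured. A minimal sketch of the difference between casting and converting follows. It uses only the standard library: StringBuilder stands in for org.apache.avro.util.Utf8, on the assumption that both are CharSequence implementations that are not String. The class name Utf8CastSketch and the helper asString are hypothetical, chosen for illustration only. In the pipeline, the equivalent change would presumably be returning record.get("column").toString() from the lambda, with a null check if the column is nullable.

```java
public class Utf8CastSketch {

    // Hypothetical helper: convert a value to String by calling toString()
    // instead of casting. Returns null for a null input (e.g. a NULL column).
    public static String asString(Object value) {
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the Avro Utf8 value returned by record.get("column"):
        // a CharSequence that is not a java.lang.String.
        Object value = new StringBuilder("some value");

        try {
            // Same shape as the cast in the question's lambda.
            String s = (String) value;
            System.out.println("cast succeeded: " + s);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }

        // Converting rather than casting works for any CharSequence.
        System.out.println("converted: " + asString(value));
    }
}
```

With the lambda returning a real String, StringUtf8Coder.of() matches the element type of the resulting collection.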

2 answers

Answer 1
0 2020-02-25 23:09:54

Answer 2
0 2020-02-28 06:29:05
