
Apache-beam Bigquery .fromQuery ClassCastException

I'm trying to execute a query against a BigQuery table, extract one column, and write it to a file. The code below throws an exception. I could be wrong, but it seems the process writes the temporary results to a temp location in Avro format, reads the data back from it, and then throws a cast exception.

pipeLine.apply(
        BigQueryIO.read(
                (SchemaAndRecord elem) -> {
                  GenericRecord record = elem.getRecord();
                  return (String) record.get("column");
                })
                .fromQuery("SELECT column FROM `project.dataset.table`")
                .usingStandardSql()
                .withCoder(AvroCoder.of(String.class)))
        .apply(TextIO.write().to("gs://bucket/test/result/data")
                .withSuffix(TXT_EXT)
                .withCompression(Compression.GZIP));

Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.String
    at xxxxx.xxx.xxx.sampling.dataflow.samplingextractor.service.BigQueryExportService.lambda$export$43268ee4$1(BigQueryExportService.java:137)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:242)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:235)
    at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:597)
    at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:209)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:484)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)

I think the exception suggests you should use .withCoder(AvroCoder.of(org.apache.avro.util.Utf8.class)), since a String cannot be cast directly from the Avro Utf8 class.
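The failing cast can be reproduced without Beam or Avro on the classpath. The sketch below is only an illustration under stated assumptions: `Utf8CastDemo`, `canCastToString`, and `fieldAsString` are hypothetical names, and a `StringBuilder` stands in for `org.apache.avro.util.Utf8`, because both implement `CharSequence` and neither is a `java.lang.String`. It shows that the cast fails at runtime while `toString()` yields the text, which appears to be the situation the stack trace reports inside the lambda.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Beam or Avro.
class Utf8CastDemo {

    // Returns true only if the value really is a java.lang.String,
    // i.e. if the (String) cast would succeed at runtime.
    static boolean canCastToString(Object value) {
        try {
            String ignored = (String) value;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    // Converts a field value to String without assuming its concrete class.
    static String fieldAsString(Object value) {
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Assumption: StringBuilder plays the role of Avro's Utf8 here.
        Map<String, Object> record = new HashMap<>();
        record.put("column", new StringBuilder("hello"));

        Object raw = record.get("column");
        System.out.println("cast works: " + canCastToString(raw));
        System.out.println("toString gives: " + fieldAsString(raw));
    }
}
```

In the pipeline from the question, the analogous change would be to call `toString()` on the value returned by `record.get("column")` instead of casting it. The coder only governs how the resulting `String` elements are serialized between steps, so it is unlikely to affect the cast inside the parse function.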

Judging from the documentation, it seems you simply want to use the StringUtf8Coder class.

pipeLine.apply(
    BigQueryIO.read(
            (SchemaAndRecord elem) -> {
              GenericRecord record = elem.getRecord();
              // Avro returns string fields as org.apache.avro.util.Utf8, so convert instead of casting.
              return record.get("column").toString();
            })
            .fromQuery("SELECT column FROM `project.dataset.table`")
            .usingStandardSql()
            .withCoder(StringUtf8Coder.of()))
        .apply(TextIO.write().to("gs://bucket/test/result/data")
            .withSuffix(TXT_EXT)
            .withCompression(Compression.GZIP));
