Apache Beam BigQueryIO .fromQuery ClassCastException
I'm trying to run a query against a BigQuery table, extract one column, and write it to a file. The code below throws an exception. I could be wrong, but it seems the process writes the temporary query results to the temp location in Avro format, reads the data back from there, and then throws the cast exception.
pipeLine.apply(
    BigQueryIO.read(
            (SchemaAndRecord elem) -> {
                GenericRecord record = elem.getRecord();
                return (String) record.get("column");
            })
        .fromQuery("SELECT column FROM `project.dataset.table`")
        .usingStandardSql()
        .withCoder(AvroCoder.of(String.class)))
    .apply(TextIO.write().to("gs://bucket/test/result/data")
        .withSuffix(TXT_EXT)
        .withCompression(Compression.GZIP));
Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.lang.String
    at xxxxx.xxx.xxx.sampling.dataflow.samplingextractor.service.BigQueryExportService.lambda$export$43268ee4$1(BigQueryExportService.java:137)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:242)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:235)
    at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:597)
    at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:209)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:484)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
I think it is suggesting that you use .withCoder(AvroCoder.of(org.apache.avro.util.Utf8.class)), since a String cannot be cast directly from Avro's Utf8 class.
From looking at the documentation here, it seems you simply want to use the StringUtf8Coder class.
pipeLine.apply(
    BigQueryIO.read(
            (SchemaAndRecord elem) -> {
                GenericRecord record = elem.getRecord();
                return (String) record.get("column");
            })
        .fromQuery("SELECT column FROM `project.dataset.table`")
        .usingStandardSql()
        .withCoder(StringUtf8Coder.of()))
    .apply(TextIO.write().to("gs://bucket/test/result/data")
        .withSuffix(TXT_EXT)
        .withCompression(Compression.GZIP));