How to set AvroCoder with KafkaIO and Apache Beam with Java
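I am trying to create a pipeline that streams data from a Kafka topic to Google's BigQuery. The data in the topic is in Avro format.
I call the apply function three times: once to read from Kafka, once to extract the record, and once to write to BigQuery. Here is the main part of the code: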
pipeline
    .apply("Read from Kafka",
        KafkaIO
            .<byte[], GenericRecord>read()
            .withBootstrapServers(options.getKafkaBrokers().get())
            .withTopics(Utils.getListFromString(options.getKafkaTopics()))
            .withKeyDeserializer(
                ConfluentSchemaRegistryDeserializerProvider.of(
                    options.getSchemaRegistryUrl().get(),
                    options.getSubject().get())
            )
            .withValueDeserializer(
                ConfluentSchemaRegistryDeserializerProvider.of(
                    options.getSchemaRegistryUrl().get(),
                    options.getSubject().get()))
            .withoutMetadata()
    )
    .apply("Extract GenericRecord",
        MapElements.into(TypeDescriptor.of(GenericRecord.class)).via(KV::getValue)
    )
    .apply(
        "Write data to BQ",
        BigQueryIO
            .<GenericRecord>write()
            .optimizedWrites()
            .useBeamSchema()
            .useAvroLogicalTypes()
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
            // Temporary location to save files in GCS before loading to BQ
            .withCustomGcsTempLocation(options.getGcsTempLocation())
            .withNumFileShards(options.getNumShards().get())
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withMethod(FILE_LOADS)
            .withTriggeringFrequency(Utils.parseDuration(options.getWindowDuration().get()))
            .to(new TableReference()
                .setProjectId(options.getGcpProjectId().get())
                .setDatasetId(options.getGcpDatasetId().get())
                .setTableId(options.getGcpTableId().get()))
    );
When I run it, I get the following error:
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for Extract GenericRecord/Map/ParMultiDo(Anonymous).output [PCollection]. Correct one of the following root causes: No Coder has been manually specified; you may do so using .setCoder().
Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder for org.apache.avro.generic.GenericRecord.
Building a Coder using a registered CoderProvider failed.
How do I set the coder so that the Avro data is read correctly?
There are at least three approaches. One is to set the coder inline on the PCollection:
pipeline.apply("Read from Kafka", ....)
    .apply("Dropping key", Values.create())
    // schemaOfGenericRecord is the org.apache.avro.Schema of the records in the topic
    .setCoder(AvroCoder.of(schemaOfGenericRecord))
    .apply("Write data to BQ", ....);
Note that the key is dropped because it is unused, so you no longer need the MapElements step.
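As a rough, self-contained sketch of the inline approach: the class below replaces the Kafka read with an in-memory `Create` so it can be built without a broker. The `Event` schema, its `id` and `ts` fields, and the class and method names are all hypothetical stand-ins for the real topic schema. The import assumes a Beam release that ships the separate Avro extension module; in older releases the same class lives at `org.apache.beam.sdk.coders.AvroCoder`.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class InlineCoderSketch {
  // Hypothetical schema; in the real pipeline this is the schema of the Kafka topic.
  static final Schema SCHEMA =
      SchemaBuilder.record("Event").fields()
          .requiredString("id")
          .requiredLong("ts")
          .endRecord();

  // Builds KV<String, GenericRecord> elements, drops the keys, and sets the coder inline.
  static PCollection<GenericRecord> valuesWithCoder(Pipeline pipeline) {
    GenericRecord record =
        new GenericRecordBuilder(SCHEMA).set("id", "a").set("ts", 1L).build();
    return pipeline
        // Stand-in for the KafkaIO read; it is not part of the original pipeline.
        .apply("Fake Kafka read",
            Create.of(KV.of("key", record))
                .withCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(SCHEMA))))
        .apply("Dropping key", Values.<GenericRecord>create())
        // Explicit coder for the GenericRecord values.
        .setCoder(AvroCoder.of(SCHEMA));
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    PCollection<GenericRecord> records = valuesWithCoder(pipeline);
    System.out.println(records.getCoder());
  }
}
```

The explicit type witness on `Values.<GenericRecord>create()` is not required by the approach itself; it only makes the element type obvious in a chained call.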
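Another is to register the coder in the pipeline's CoderRegistry, where genericSchema is the Avro Schema of the records:
pipeline.getCoderRegistry().registerCoderForClass(GenericRecord.class, AvroCoder.of(genericSchema));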
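A minimal sketch of the registry approach, under the same assumptions as before: the `Event` schema and the class and method names are hypothetical, and the import targets the Avro extension module of recent Beam releases. It registers the coder and then looks it up again to show that the registry can now resolve `GenericRecord`.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;

public class RegisterCoderSketch {
  // Hypothetical schema; replace with the schema of the records in the topic.
  static final Schema SCHEMA =
      SchemaBuilder.record("Event").fields()
          .requiredString("id")
          .requiredLong("ts")
          .endRecord();

  // Registers an AvroCoder for GenericRecord and returns what the registry now resolves.
  static Coder<GenericRecord> registerAndLookup(Pipeline pipeline) {
    pipeline.getCoderRegistry()
        .registerCoderForClass(GenericRecord.class, AvroCoder.of(SCHEMA));
    try {
      return pipeline.getCoderRegistry().getCoder(GenericRecord.class);
    } catch (CannotProvideCoderException e) {
      // Not expected once the coder has been registered above.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    System.out.println(registerAndLookup(pipeline));
  }
}
```

Because the registration is keyed by class, every `GenericRecord` collection in the pipeline that relies on inference gets this one schema. If the pipeline carries records with different schemas, setting the coder per PCollection is probably the safer choice.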
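A third is to get the coder from the schema registry through the deserializer provider's getCoder method:
ConfluentSchemaRegistryDeserializerProvider.getCoder(CoderRegistry registry)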
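One way this might be wired up is sketched below. It assumes `getCoder` is called on a provider instance created with the same URL and subject used for the Kafka read, and that the call fetches the schema from the registry while the pipeline is being constructed, so the registry must be reachable at that point. The class name, method name, and parameters are hypothetical.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class RegistryCoderSketch {
  // Drops the Kafka keys and sets the value coder obtained from the schema registry.
  static PCollection<GenericRecord> dropKeys(
      PCollection<KV<byte[], GenericRecord>> kafkaRecords,
      String schemaRegistryUrl,
      String subject) {
    ConfluentSchemaRegistryDeserializerProvider<GenericRecord> provider =
        ConfluentSchemaRegistryDeserializerProvider.of(schemaRegistryUrl, subject);
    // Assumption: this looks up the subject's schema in the registry.
    Coder<GenericRecord> coder =
        provider.getCoder(kafkaRecords.getPipeline().getCoderRegistry());
    return kafkaRecords
        .apply("Dropping key", Values.<GenericRecord>create())
        .setCoder(coder);
  }
}
```

Compared with the other two approaches, this avoids loading or hard-coding the Avro schema in the pipeline code, at the cost of a registry lookup when the pipeline is built.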