How to set AvroCoder with KafkaIO and Apache Beam with Java

I am trying to create a pipeline that streams data from a Kafka topic to Google's BigQuery. The data in the topic is in Avro.

I call the apply function three times: once to read from Kafka, once to extract the record, and once to write to BigQuery. This is the main part of the code:

        pipeline
            .apply("Read from Kafka",
                    KafkaIO
                            .<byte[], GenericRecord>read()
                            .withBootstrapServers(options.getKafkaBrokers().get())
                            .withTopics(Utils.getListFromString(options.getKafkaTopics()))
                            .withKeyDeserializer(
                                    ConfluentSchemaRegistryDeserializerProvider.of(
                                            options.getSchemaRegistryUrl().get(),
                                            options.getSubject().get())
                            )
                            .withValueDeserializer(
                                    ConfluentSchemaRegistryDeserializerProvider.of(
                                            options.getSchemaRegistryUrl().get(),
                                            options.getSubject().get()))
                            .withoutMetadata()
            )

            .apply("Extract GenericRecord",
                    MapElements.into(TypeDescriptor.of(GenericRecord.class)).via(KV::getValue)
            )
            .apply(
                    "Write data to BQ",
                    BigQueryIO
                            .<GenericRecord>write()
                            .optimizedWrites()
                            .useBeamSchema()
                            .useAvroLogicalTypes()
                            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                            .withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
                            //Temporary location to save files in GCS before loading to BQ
                            .withCustomGcsTempLocation(options.getGcsTempLocation())
                            .withNumFileShards(options.getNumShards().get())
                            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
                            .withMethod(FILE_LOADS)
                            .withTriggeringFrequency(Utils.parseDuration(options.getWindowDuration().get()))
                            .to(new TableReference()
                                    .setProjectId(options.getGcpProjectId().get())
                                    .setDatasetId(options.getGcpDatasetId().get())
                                    .setTableId(options.getGcpTableId().get()))

            );

When I run it, I get the following error:

    Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for Extract GenericRecord/Map/ParMultiDo(Anonymous).output [PCollection]. Correct one of the following root causes:
      No Coder has been manually specified;  you may do so using .setCoder().
      Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder for org.apache.avro.generic.GenericRecord.
      Building a Coder using a registered CoderProvider failed.

How do I set the coder so that Avro is read correctly?

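There are at least three ways to do this:

  1. Set the coder inline:

         pipeline.apply("Read from Kafka", ....)
             .apply("Dropping key", Values.create())
             .setCoder(AvroCoder.of(schemaOfGenericRecord))
             .apply("Write data to BQ", ....);

     Here schemaOfGenericRecord is the Avro Schema of the records. Note that the key is dropped because it is not used, so you no longer need MapElements.

  2. Register the coder in the pipeline's CoderRegistry instance, where genericSchema is again the Avro Schema of the records:

         pipeline.getCoderRegistry().registerCoderForClass(GenericRecord.class, AvroCoder.of(genericSchema));

  3. Get the coder from the schema registry through the deserializer provider's getCoder method:

         ConfluentSchemaRegistryDeserializerProvider.getCoder(CoderRegistry registry)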

https://beam.apache.org/releases/javadoc/2.22.0/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProvider.html#getCoder-org.apache.beam.sdk.coders.CoderRegistry-
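
As a rough illustration of the approaches above, here is a minimal sketch, not a definitive implementation. The class name KafkaAvroCoderSketch, both method names, and all parameters are hypothetical and do not come from the question. It assumes the Kafka keys are raw bytes (hence ByteArrayDeserializer) and that AvroCoder lives in org.apache.beam.sdk.coders, as in the 2.22.0 release linked above; newer Beam releases appear to ship it in a separate Avro extension module instead. As far as I can tell, getCoder looks up the subject's latest schema, so the schema registry probably has to be reachable while the pipeline is being constructed.

```java
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Hypothetical helper class, for illustration only.
public class KafkaAvroCoderSketch {

  // Approaches 1 and 2: build an AvroCoder from a schema you already have as JSON.
  public static AvroCoder<GenericRecord> coderFromJson(String schemaJson) {
    Schema schema = new Schema.Parser().parse(schemaJson);
    // AvroCoder.of(Schema) yields a coder whose element type is GenericRecord.
    return AvroCoder.of(schema);
  }

  // Approach 3: reuse the deserializer provider to obtain the coder.
  public static PCollection<GenericRecord> readValues(
      Pipeline pipeline,
      String brokers,
      List<String> topics,
      String schemaRegistryUrl,
      String subject) {

    // Keep the provider in a variable so it can serve both as the
    // value deserializer and as the source of the coder.
    ConfluentSchemaRegistryDeserializerProvider<GenericRecord> valueProvider =
        ConfluentSchemaRegistryDeserializerProvider.of(schemaRegistryUrl, subject);

    Coder<GenericRecord> valueCoder =
        valueProvider.getCoder(pipeline.getCoderRegistry());

    return pipeline
        .apply("Read from Kafka",
            KafkaIO.<byte[], GenericRecord>read()
                .withBootstrapServers(brokers)
                .withTopics(topics)
                // Assumption: keys are plain bytes and are not used downstream.
                .withKeyDeserializer(ByteArrayDeserializer.class)
                .withValueDeserializer(valueProvider)
                .withoutMetadata())
        // Drop the unused key, then attach the coder explicitly so Beam
        // does not have to infer one for GenericRecord.
        .apply("Dropping key", Values.<GenericRecord>create())
        .setCoder(valueCoder);
  }
}
```

The coder returned by coderFromJson can be passed either to setCoder on the PCollection or to pipeline.getCoderRegistry().registerCoderForClass(GenericRecord.class, ...). The PCollection returned by readValues already has its coder set, so the BigQuery write can be applied to it directly.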
