Apache Beam Pipeline KafkaIO - Commit offset manually

I have a Beam pipeline that consumes streaming events and processes them in multiple stages (PTransforms). See the following code:

pipeline.apply("Read Data from Stream", StreamReader.read())
        .apply("Decode event and extract relevant fields", ParDo.of(new DecodeExtractFields()))
        .apply("Deduplicate process", ParDo.of(new Deduplication()))
        .apply("Conversion, Mapping and Persisting", ParDo.of(new DataTransformer()))
        .apply("Build Kafka Message", ParDo.of(new PrepareMessage()))
        .apply("Publish", ParDo.of(new PublishMessage()))
        .apply("Commit offset", ParDo.of(new CommitOffset()));

The streaming events are read using KafkaIO, and the StreamReader.read() method is implemented like this:

public static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
            .withBootstrapServers(Constants.BOOTSTRAP_SERVER)
            .withTopics(Constants.KAFKA_TOPICS)
            .withConsumerConfigUpdates(Constants.CONSUMER_PROPERTIES)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class);
}

After a streamed event/message has been read through KafkaIO, its offset can be committed. What I need to do is commit the offset manually, inside the last "Commit offset" PTransform, once all the previous PTransforms have executed.

The reason is that I do some conversion, mapping and persisting in the middle of the pipeline, and I only want to commit the offset once all of that has completed without failing. That way, if processing fails partway through, I can consume the same record/event again and reprocess it.

My question is: how do I commit the offset manually? I would appreciate it if you could share resources or sample code.

Well, there is the Read.commitOffsetsInFinalize() method, which commits offsets when checkpoints are finalized, and there is the AUTO_COMMIT consumer config option (enable.auto.commit), which makes the Kafka consumer auto-commit the records it has read.

However, neither will work in your case, because neither is tied to the completion of your downstream transforms. You need to do it manually: group the offsets by topic/partition/window, and create a new Kafka client instance in your CommitOffset DoFn that commits these offsets. Grouping by partition matters: otherwise different workers could commit offsets for the same partition at the same time, which is a race condition.
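The grouping step can be sketched in plain Java without any Beam or Kafka dependency. This is only an illustration under assumptions: OffsetCommitPlanner, ProcessedRecord and nextOffsetsToCommit are hypothetical names that do not come from the question or from any library. The sketch reduces a batch of successfully processed records to one offset per topic/partition. Kafka expects the committed offset to be the offset of the next record to read, so the value is the highest processed offset plus one. That is only safe if every lower offset in the same partition has also been processed successfully.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OffsetCommitPlanner {

    /** Hypothetical value type: the origin of one successfully processed record. */
    public static final class ProcessedRecord {
        final String topic;
        final int partition;
        final long offset;

        public ProcessedRecord(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    /**
     * Returns topic -> partition -> offset to commit.
     * The offset to commit is (highest processed offset + 1), i.e. the next
     * record the consumer group should read from that partition.
     * Assumption: all lower offsets in the partition were processed too.
     */
    public static Map<String, Map<Integer, Long>> nextOffsetsToCommit(
            List<ProcessedRecord> processed) {
        Map<String, Map<Integer, Long>> next = new TreeMap<>();
        for (ProcessedRecord r : processed) {
            next.computeIfAbsent(r.topic, t -> new TreeMap<>())
                .merge(r.partition, r.offset + 1, Math::max);
        }
        return next;
    }
}
```

In a real CommitOffset DoFn, each entry of the result would be converted to a TopicPartition and an OffsetAndMetadata and passed to KafkaConsumer.commitSync(Map<TopicPartition, OffsetAndMetadata>), using a consumer configured with the same group.id as the reader. The offsets themselves have to be carried through the pipeline, for example by reading KafkaRecord elements instead of dropping the metadata.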
