
Exactly-once processing in Kafka with replay

I am using Kafka for my event log/processing. I am looking for (as close to) exactly-once processing as I can get, whilst supporting "replays" during partition (re)assignment, notifying the event handler of a replay so it can rebuild its state.

Here is my code:

private final KafkaConsumer<String, String> consumer;
private final KafkaProducer<String, String> producer;
private final BiFunction<String, Boolean, String> eventHandler;
private final long[] startingCommitOffsets;

public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
  // Remember where the committed position was, then replay each partition from the start.
  partitions.forEach(p -> startingCommitOffsets[p.partition()] = consumer.position(p));
  consumer.seekToBeginning(partitions);
}

public void run() {
  while (true) {
    var records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
    var commitRecords = new HashMap<TopicPartition, OffsetAndMetadata>();

    producer.beginTransaction();

    records.forEach(r -> {
      var isReplay = r.offset() < startingCommitOffsets[r.partition()];
      var resultEvent = eventHandler.apply(r.value(), isReplay);
      producer.send(new ProducerRecord<>(r.topic(), r.key(), resultEvent));

      if (!isReplay) {
        commitRecords.put(new TopicPartition(r.topic(), r.partition()), new OffsetAndMetadata(r.offset()));
      }
    });

    producer.commitTransaction();

    if (!commitRecords.isEmpty()) {
      consumer.commitSync(commitRecords);
    }
  }
}

My questions:

  1. When the partition is assigned, I save the current position and seek to the beginning. This doesn't change the committed position, does it? (The docs weren't clear.)
  2. producer.commitTransaction() and consumer.commitSync() are two separate operations. If the latter fails, we would have already committed some new events, which will be duplicated the next time the events are processed. Is there any way to combine these into one operation?

When the partition is assigned, I save the current position and seek to the beginning. This doesn't change the committed position, does it?

The committed position doesn't change until you explicitly call commitSync() or commitAsync(), or have enable.auto.commit=true.
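As a quick check (a minimal sketch; the topic/partition name is hypothetical and an already-assigned consumer is assumed), you can compare consumer.committed() before and after seeking; only the fetch position returned by position() moves:

import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

static void verifySeekDoesNotCommit(KafkaConsumer<String, String> consumer) {
  TopicPartition tp = new TopicPartition("inputTopic", 0); // hypothetical partition

  OffsetAndMetadata before = consumer.committed(Set.of(tp)).get(tp);
  consumer.seekToBeginning(Set.of(tp)); // moves the fetch position only
  OffsetAndMetadata after = consumer.committed(Set.of(tp)).get(tp);

  // The committed offset is untouched; only position(tp) has changed.
  assert (before == null && after == null) || before.offset() == after.offset();
}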

producer.commitTransaction() and consumer.commitSync() are two separate operations. If the latter fails, we would have already committed some new events, which will be duplicated the next time the events are processed. Is there any way to combine these into one operation?

producer.sendOffsetsToTransaction()

This method might be the one you are looking for to achieve exactly-once processing.

From the documentation:

Sends a list of specified offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered committed only if the transaction is committed successfully. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.

More importantly,

Note, that the consumer should have enable.auto.commit=false and should also not commit offsets manually (via sync or async commits).
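For example (a minimal sketch; the broker address and group id are placeholders), a consumer set up for this pattern would be configured along these lines:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
props.put("group.id", "my-group-id");              // placeholder group id
props.put("enable.auto.commit", "false");          // offsets are committed only via the transaction
props.put("isolation.level", "read_committed");    // don't consume records from aborted transactions
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);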


You can deduce the TopicPartition and offset from each ConsumerRecord you get as a result of poll().

Just store them (new TopicPartition(record.topic(), record.partition()) and new OffsetAndMetadata(record.offset() + 1)) in a map and pass it when you want to commit.

The following code snippet can give you an idea (reference):

// createKafkaProducer, createKafkaConsumer and producerRecord are helper
// methods from the referenced example.
KafkaProducer<String, String> producer = createKafkaProducer(
  "bootstrap.servers", "localhost:9092",
  "transactional.id", "my-transactional-id");

producer.initTransactions();

KafkaConsumer<String, String> consumer = createKafkaConsumer(
  "bootstrap.servers", "localhost:9092",
  "group.id", "my-group-id",
  "isolation.level", "read_committed");

consumer.subscribe(singleton("inputTopic"));

while (true) {
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
  producer.beginTransaction();
  Map<TopicPartition, OffsetAndMetadata> offsetMap = new LinkedHashMap<>();
  for (ConsumerRecord<String, String> record : records) {
    producer.send(producerRecord("outputTopic", record));
    // Commit the *next* offset to consume: lastProcessedMessageOffset + 1
    offsetMap.put(new TopicPartition(record.topic(), record.partition()),
                  new OffsetAndMetadata(record.offset() + 1));
  }
  // The offsets join the transaction: they are committed if and only if
  // the transaction commits.
  producer.sendOffsetsToTransaction(offsetMap, "my-group-id");
  producer.commitTransaction();
}

After sending the offsets to the transaction, we commit it; the offsets and the produced records are then committed atomically.
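Applied to the run() loop from your question, it could look roughly like this (a sketch only: it reuses your consumer, producer, eventHandler and startingCommitOffsets fields, and "my-group-id" is a placeholder for your consumer group id). The separate consumer.commitSync() disappears, so the duplicated-events window between the two commits is gone:

public void run() {
  while (true) {
    var records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
    var commitRecords = new HashMap<TopicPartition, OffsetAndMetadata>();

    producer.beginTransaction();

    records.forEach(r -> {
      var isReplay = r.offset() < startingCommitOffsets[r.partition()];
      var resultEvent = eventHandler.apply(r.value(), isReplay);
      producer.send(new ProducerRecord<>(r.topic(), r.key(), resultEvent));

      if (!isReplay) {
        // Commit the next offset to consume, per the documentation above.
        commitRecords.put(new TopicPartition(r.topic(), r.partition()),
                          new OffsetAndMetadata(r.offset() + 1));
      }
    });

    // The offsets ride in the same transaction as the produced events:
    // either both are committed, or neither is.
    if (!commitRecords.isEmpty()) {
      producer.sendOffsetsToTransaction(commitRecords, "my-group-id"); // placeholder group id
    }
    producer.commitTransaction();
  }
}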
