
Exactly-once processing in Kafka with replay

I am using Kafka for my event log/processing. I want to get as close to exactly-once processing as I can, while supporting "replays" during partition (re)assignment: the event handler is notified of a replay so it can rebuild its state.

Here is my code:

private final KafkaConsumer<String, String> consumer;
private final KafkaProducer<String, String> producer;
private final BiFunction<String, Boolean, String> eventHandler;
private final long[] startingCommitOffsets;

public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
  partitions.forEach(p -> startingCommitOffsets[p.partition()] = consumer.position(p));
  consumer.seekToBeginning(partitions);
}

public void run() {
  while (true) {
    var records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
    var commitRecords = new HashMap<TopicPartition, OffsetAndMetadata>();

    producer.beginTransaction();

    records.forEach(r -> {
      var isReplay = r.offset() < startingCommitOffsets[r.partition()];
      var resultEvent = eventHandler.apply(r.value(), isReplay);
      producer.send(new ProducerRecord<>(r.topic(), r.key(), resultEvent));

      if (!isReplay) {
        commitRecords.put(new TopicPartition(r.topic(), r.partition()), new OffsetAndMetadata(r.offset()));
      }
    });

    producer.commitTransaction();

    if (!commitRecords.isEmpty()) {
      consumer.commitSync(commitRecords);
    }
  }
}

My questions:

  1. When the partition is assigned, I save the current position and seek to the beginning. This doesn't change the committed position, does it? (The docs weren't clear.)
  2. producer.commitTransaction() and consumer.commitSync() are two separate operations. If the latter fails, we will have already committed some new events, which will be duplicated the next time the events are processed. Is there any way to combine these into one operation?

When the partition is assigned, I save the current position and seek to the beginning. This doesn't change the committed position, does it?

The committed position doesn't change until you explicitly call commitSync() or commitAsync(), or unless enable.auto.commit=true (in which case the consumer commits automatically). Seeking only moves the fetch position.
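Because the committed offset survives the seek, you can use the position saved at assignment time as the replay boundary. A minimal sketch of that bookkeeping in plain Java (no broker involved; the partition numbers and offsets are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class ReplayBoundary {
    // Saved at partition assignment: the position the consumer would have
    // resumed from (the committed offset), captured before seekToBeginning().
    private final Map<Integer, Long> startingCommitOffsets = new HashMap<>();

    void onPartitionAssigned(int partition, long positionAtAssignment) {
        startingCommitOffsets.put(partition, positionAtAssignment);
        // consumer.seekToBeginning(...) would happen here; it moves the fetch
        // position only, not the committed offset.
    }

    // Records below the saved position were committed before the rebalance,
    // so re-reading them is a replay.
    boolean isReplay(int partition, long offset) {
        return offset < startingCommitOffsets.getOrDefault(partition, 0L);
    }
}
```

With a saved position of 100, offsets 0–99 are flagged as replays and offset 100 is the first "new" record, matching the `isReplay` check in the question's loop.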

producer.commitTransaction() and consumer.commitSync() are two separate operations. If the latter fails, we will have already committed some new events, which will be duplicated the next time the events are processed. Is there any way to combine these into one operation?

producer.sendOffsetsToTransaction()

This method might be the one you are looking for to achieve exactly-once processing.

From the documentation:

Sends a list of specified offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered committed only if the transaction is committed successfully. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.

More importantly,

Note, that the consumer should have enable.auto.commit=false and should also not commit offsets manually (via sync or async commits).
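To satisfy that note, the consumer and producer need a few specific settings. A hedged sketch of the relevant configuration (the broker address, group id, and transactional id are placeholder values; `isolation.level=read_committed` is needed only if downstream consumers must not see aborted transactions):

```java
import java.util.Properties;

public class ExactlyOnceConfigs {
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");   // placeholder
        p.put("group.id", "my-group-id");               // placeholder
        p.put("enable.auto.commit", "false");           // required: no auto commits
        p.put("isolation.level", "read_committed");     // only read committed txns
        return p;
    }

    static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");           // placeholder
        p.put("transactional.id", "my-transactional-id");       // enables fencing
        // Setting transactional.id implies enable.idempotence=true.
        return p;
    }
}
```

A stable transactional.id across restarts is what lets the broker fence off "zombie" producer instances from a previous run.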

You can deduce the TopicPartition and offset from each ConsumerRecord returned by poll().

Just store them (new TopicPartition(record.topic(), record.partition()) and new OffsetAndMetadata(record.offset() + 1)) in a map and pass it when you want to commit.

The following code snippet can give you an idea (reference):

KafkaProducer producer = createKafkaProducer(
  "bootstrap.servers", "localhost:9092",
  "transactional.id", "my-transactional-id");

producer.initTransactions();

KafkaConsumer consumer = createKafkaConsumer(
  "bootstrap.servers", "localhost:9092",
  "group.id", "my-group-id",
  "isolation.level", "read_committed");

consumer.subscribe(singleton("inputTopic"));

while (true) {
  ConsumerRecords records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
  producer.beginTransaction();
  Map<TopicPartition, OffsetAndMetadata> offsetMap = new LinkedHashMap<>();
  for (ConsumerRecord record : records) {
    producer.send(producerRecord("outputTopic", record));
    // Commit the offset of the NEXT record to consume: lastProcessedMessageOffset + 1
    offsetMap.put(new TopicPartition(record.topic(), record.partition()),
                  new OffsetAndMetadata(record.offset() + 1));
  }
  producer.sendOffsetsToTransaction(offsetMap, "my-group-id");
  producer.commitTransaction();
}

After sending the offsets to the transaction, committing the transaction commits them atomically with the produced records: either both become visible, or neither does.
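The one easy mistake here is the off-by-one in the quoted docs: the committed offset must be the next record to consume, not the last one processed. A tiny broker-free sketch of that arithmetic (using plain String keys in place of TopicPartition for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommitOffsets {
    // Build the map passed to sendOffsetsToTransaction(): for each partition,
    // commit the offset of the NEXT record to consume (last processed + 1).
    static Map<String, Long> commitMap(Map<String, Long> lastProcessed) {
        Map<String, Long> out = new LinkedHashMap<>();
        lastProcessed.forEach((tp, offset) -> out.put(tp, offset + 1));
        return out;
    }
}
```

If the last record processed on a partition was at offset 41, the committed value is 42; committing 41 instead would reprocess that record after a restart.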
