Exactly-once processing in Kafka with replay
I am using Kafka for my event log/processing. I am looking for (as close to) exactly-once processing as I can get, whilst supporting "replays" during partition (re)assignment and notifying the event handler of a replay so it can rebuild its state.
Here is my code:
private final KafkaConsumer<String, String> consumer;
private final KafkaProducer<String, String> producer;
private final BiFunction<String, Boolean, String> eventHandler;
private final long[] startingCommitOffsets;

public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    // Remember the current position of each partition, then rewind to replay from the start.
    partitions.forEach(p -> startingCommitOffsets[p.partition()] = consumer.position(p));
    consumer.seekToBeginning(partitions);
}

public void run() {
    while (true) {
        var records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
        var commitRecords = new HashMap<TopicPartition, OffsetAndMetadata>();
        producer.beginTransaction();
        records.forEach(r -> {
            // Records below the saved starting position are replays.
            var isReplay = r.offset() < startingCommitOffsets[r.partition()];
            var resultEvent = eventHandler.apply(r.value(), isReplay);
            producer.send(new ProducerRecord<>(r.topic(), r.key(), resultEvent));
            if (!isReplay) {
                commitRecords.put(new TopicPartition(r.topic(), r.partition()), new OffsetAndMetadata(r.offset()));
            }
        });
        producer.commitTransaction();
        if (!commitRecords.isEmpty()) {
            consumer.commitSync(commitRecords);
        }
    }
}
My questions:
producer.commitTransaction() and consumer.commitSync() are two separate operations. If the latter fails, we will have already committed some new events, which will be duplicated the next time the events are processed. Is there any way to combine these into one operation?
When the partition is assigned, I save the current position and seek to the beginning. This doesn't change the committed position, does it?
The committed position doesn't change until you explicitly call commitAsync() or commitSync(), or unless enable.auto.commit=true is set.
producer.commitTransaction() and consumer.commitSync() are two separate operations. If the latter fails, we will have already committed some new events, which will be duplicated the next time the events are processed. Is there any way to combine these into one operation?
producer.sendOffsetsToTransaction()
This method might be the one you are looking for to achieve exactly-once processing.
From the documentation:
Sends a list of specified offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered committed only if the transaction is committed successfully. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset+1.
More importantly,
Note, that the consumer should have enable.auto.commit=false and should also not commit offsets manually (via sync or async commits).
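As a minimal sketch of the client settings this note implies, the following uses plain java.util.Properties so it runs without a broker. The keys are the config names mentioned in this answer; the address, group id, and transactional id are placeholder values, and the class and method names are hypothetical. Serializer and deserializer settings, which real clients also need, are left out.

```java
import java.util.Properties;

public class TransactionalClientConfig {

    // Consumer side: no auto-commit, and only read records from committed transactions.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "my-group-id");             // placeholder group id
        props.setProperty("enable.auto.commit", "false");
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    // Producer side: a transactional.id is what enables the transactional API.
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");     // placeholder address
        props.setProperty("transactional.id", "my-transactional-id"); // placeholder id
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
        System.out.println(producerProps());
    }
}
```

With these settings, the only place offsets get committed is inside the producer's transaction, so there is no separate consumer commit that can fail on its own.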
You can deduce the TopicPartition and offset from the ConsumerRecord objects that poll() returns.
Just store them (new TopicPartition(record.topic(), record.partition()) and new OffsetAndMetadata(record.offset() + 1), since the committed offset is that of the next message to consume) in a map and pass it when you want to commit.
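The bookkeeping described above can be sketched without any Kafka dependency. In this rough illustration, Rec is a hypothetical stand-in for the three ConsumerRecord fields involved, and a "topic-partition" string stands in for TopicPartition; the point is only the lastProcessedMessageOffset+1 rule quoted from the documentation, applied per partition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NextOffsets {

    // Hypothetical stand-in for the ConsumerRecord fields used here.
    static final class Rec {
        final String topic;
        final int partition;
        final long offset;

        Rec(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    // For each partition in the batch, returns the offset of the next record
    // to consume: the highest processed offset plus one.
    static Map<String, Long> nextOffsets(List<Rec> batch) {
        Map<String, Long> next = new HashMap<>();
        for (Rec r : batch) {
            String key = r.topic + "-" + r.partition;
            next.merge(key, r.offset + 1, Long::max);
        }
        return next;
    }

    public static void main(String[] args) {
        List<Rec> batch = List.of(
                new Rec("inputTopic", 0, 41),
                new Rec("inputTopic", 0, 42),
                new Rec("inputTopic", 1, 7));
        System.out.println(nextOffsets(batch));
    }
}
```

In a real consumer loop the same values would go into a Map<TopicPartition, OffsetAndMetadata> and be handed to sendOffsetsToTransaction() before the transaction is committed.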
The following code snippet can give you an idea (reference):
// createKafkaProducer, createKafkaConsumer and producerRecord are helper methods from the reference.
KafkaProducer<String, String> producer = createKafkaProducer(
    "bootstrap.servers", "localhost:9092",
    "transactional.id", "my-transactional-id");
producer.initTransactions();

KafkaConsumer<String, String> consumer = createKafkaConsumer(
    "bootstrap.servers", "localhost:9092",
    "group.id", "my-group-id",
    "isolation.level", "read_committed");
consumer.subscribe(singleton("inputTopic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
    producer.beginTransaction();
    Map<TopicPartition, OffsetAndMetadata> offsetMap = new LinkedHashMap<>();
    for (ConsumerRecord<String, String> record : records) {
        producer.send(producerRecord("outputTopic", record));
        // Commit the offset of the next record to consume, i.e. last processed + 1.
        offsetMap.put(new TopicPartition(record.topic(), record.partition()), new OffsetAndMetadata(record.offset() + 1));
    }
    // Newer clients take consumer.groupMetadata() here instead of the group id string.
    producer.sendOffsetsToTransaction(offsetMap, "my-group-id");
    producer.commitTransaction();
}
After sending the offsets, we commit the transaction, which commits the offsets together with the produced records.