
Reset to custom offset in Kafka partition

I am researching Kafka for a specific use case I am working on. I have a continuous stream of data that I want to process and publish to intermediary stages.

At each of these stages (initial and intermediary), Samza tasks would do the processing and republishing. One of my requirements is to be able to re-trigger the whole processing pipeline from a specific point in time whenever I want.

I know that Kafka maintains offsets for each of its logs (the incoming data). However, does Kafka provide any functionality with which I can map partition offsets to some custom identifier (say, a timestamp) and use it to re-trigger the whole pipeline from that point onwards?

I have read in multiple places that I can replay the Kafka commit log by resetting it to the beginning, or by going back some N offsets. But is there a way to map these offsets to my own identifiers, such as timestamps, and use that as a mechanism to decide from which offset to replay?

Best,
Shabir

You can use the command-line tool kafka-consumer-groups to reset a consumer group's offsets based on a timestamp (--to-datetime). See more on the documentation page: https://kafka.apache.org/documentation/#basic_ops_consumer_group
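As a minimal sketch, the Python snippet below only builds the argument list for such a reset; it does not contact a broker. The script name `kafka-consumer-groups.sh` (as shipped in the Kafka `bin/` directory), the broker address and the group and topic names are assumptions for illustration. The timestamp is formatted as `YYYY-MM-DDTHH:mm:SS.sss`, the format the Kafka documentation gives for `--to-datetime`, and `--dry-run` is used unless `--execute` is explicitly requested. The reset is only applied while the group has no active members, so its consumers must be stopped first.

```python
import shlex
from datetime import datetime


def to_kafka_datetime(ts: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:mm:SS.sss (the format documented for --to-datetime)."""
    millis = ts.microsecond // 1000
    return ts.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}"


def reset_to_datetime_args(bootstrap: str, group: str, topic: str,
                           ts: datetime, execute: bool = False) -> list[str]:
    """Build (but do not run) a kafka-consumer-groups offset reset to a timestamp.

    The script name is an assumption; adjust it to your installation.
    """
    return [
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap,
        "--group", group,
        "--topic", topic,
        "--reset-offsets",
        "--to-datetime", to_kafka_datetime(ts),
        # --dry-run only prints the planned offsets; --execute commits them.
        "--execute" if execute else "--dry-run",
    ]


if __name__ == "__main__":
    # Hypothetical group/topic names; pass the list to subprocess.run(...)
    # to actually invoke the tool.
    args = reset_to_datetime_args("localhost:9092", "stage-1-group", "raw-events",
                                  datetime(2020, 1, 1, 12, 30, 0, 250000))
    print(shlex.join(args))
```

Running the dry run first and checking the printed offsets before switching to `--execute` avoids committing an unintended position.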

The same can, of course, also be achieved in code.
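In the Java client, the programmatic route is `KafkaConsumer.offsetsForTimes`, which returns for each partition the earliest offset whose timestamp is greater than or equal to the requested one (or null if there is none), followed by `seek` to that offset. The standard-library Python sketch below only simulates that lookup on an in-memory list of records to illustrate the semantics; the `Record` type and the function names are hypothetical and not part of any Kafka API. It binary-searches a running maximum of the timestamps, so the lookup stays correct even when producer timestamps are slightly out of order.

```python
from bisect import bisect_left
from dataclasses import dataclass
from itertools import accumulate
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical in-memory stand-in for one record in a partition."""
    offset: int
    timestamp_ms: int  # epoch milliseconds, like Kafka record timestamps
    value: str


def offset_for_time(partition: list[Record], target_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= target_ms, or None if there is none.

    Simulates the per-partition result of KafkaConsumer.offsetsForTimes.
    """
    # The running max is non-decreasing, so it can be binary-searched; the
    # first index where it reaches target_ms is the first record whose own
    # timestamp is >= target_ms.
    running_max = list(accumulate((r.timestamp_ms for r in partition), max))
    i = bisect_left(running_max, target_ms)
    return partition[i].offset if i < len(partition) else None


def replay_from(partition: list[Record], start_offset: int) -> list[str]:
    """Values a consumer would read after seeking to start_offset (offset order)."""
    return [r.value for r in partition if r.offset >= start_offset]


if __name__ == "__main__":
    log = [Record(0, 1_000, "a"), Record(1, 2_000, "b"),
           Record(2, 1_900, "c"), Record(3, 3_000, "d")]
    start = offset_for_time(log, 1_950)
    print(start, replay_from(log, start))
```

Replaying from the returned offset also re-delivers later records that carry older timestamps (record `c` above), because consumption follows offset order, not time order.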
