Kafka: track offsets in all partitions

I will try to explain what I am trying to achieve.

All I know is the topic name, and from that I have to drill down to its partitions. First I tried

consumer.Subscribe(topics) 

And

consumer.Assignment

But if there is no delay between the two calls, it returns an empty list.

I could use consumer.Assign(..), but I don't know the exact partitions or offsets yet.

Next, once I can get down to the partition level, I need to get the low and high offsets for a time range.

For example, topic "test" has 5 partitions, and I need to extract the message info (partition, offsets) for all messages inserted from 10:00 to 10:05.

If any additional info is needed, just let me know.

Thanks

I'm not 100% clear on what you are aiming for, but some information about assignment may help here.

The assignment() method returns an empty list until the poll method has been called for the first time after the consumer joins the group, or after a rebalance. This is because, when partitions are assigned automatically, the consumer only learns its assignment as one of the steps of the poll method, before it fetches any records.

You can find out the actual assigned partitions either by calling poll at least once before calling assignment() (I think that is what you have discovered) or by passing a ConsumerRebalanceListener when calling subscribe(). The onPartitionsAssigned method is called during the poll, essentially as a callback, with the collection of newly assigned partitions as its argument. This lets your code discover the current assignment before any records are fetched.
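The names above are from the Java client. In the .NET client that the question appears to use, the closest counterpart I know of is a handler registered with SetPartitionsAssignedHandler on ConsumerBuilder. A minimal sketch, assuming the Confluent.Kafka package, a broker at localhost:9092 and a made-up group id (none of these come from the question):

```csharp
using System;
using Confluent.Kafka;

class AssignmentDemo
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // assumption: local broker
            GroupId = "offset-tracker",          // hypothetical group id
            AutoOffsetReset = AutoOffsetReset.Earliest
        };

        using var consumer = new ConsumerBuilder<Ignore, Ignore>(config)
            // Invoked from inside Consume() once the group coordinator
            // has handed this consumer its partitions.
            .SetPartitionsAssignedHandler((c, partitions) =>
            {
                Console.WriteLine($"Assigned: {string.Join(", ", partitions)}");
            })
            .Build();

        consumer.Subscribe("test");

        // Right after Subscribe the assignment is usually still empty,
        // because the group join only happens while consuming.
        Console.WriteLine($"Before Consume: {consumer.Assignment.Count} partition(s)");

        // Consume drives the group join; the handler above fires during this call
        // if the join completes within the timeout.
        consumer.Consume(TimeSpan.FromSeconds(10));
        Console.WriteLine($"After Consume: {consumer.Assignment.Count} partition(s)");

        consumer.Close();
    }
}
```

A single Consume call may not be enough if joining the group takes longer than the timeout, so real code would normally keep consuming in a loop and react inside the handler.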

Hope this helps a bit. I have written a blog post about this aspect of assignment but haven't published it yet; I'll add a link when I do, if this sounds like the issue you are facing.

I went for a slightly different approach.

  1. Load metadata from IAdminClient to get all available partitions for the topic.
  2. Create a TopicPartitionTimestamp for each partition with the start timestamp I need to consume from.
  3. Assign to the TopicPartitionTimestamp and consume from it.
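The three steps can be sketched roughly as follows. This is not the exact code I used, only an illustration under assumptions: the Confluent.Kafka package, a broker at localhost:9092, a made-up group id, and a window of 10:00 to 10:05 UTC today. As far as I know, Assign does not accept a TopicPartitionTimestamp directly, so the sketch first converts timestamps to offsets with OffsetsForTimes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Confluent.Kafka;

class TimeRangeOffsets
{
    static void Main()
    {
        const string topic = "test";
        const string brokers = "localhost:9092"; // assumption: local broker
        var timeout = TimeSpan.FromSeconds(10);

        // Hypothetical window: 10:00 to 10:05 UTC today.
        var from = DateTime.UtcNow.Date.AddHours(10);
        var to = from.AddMinutes(5);

        // Step 1: discover the partitions from the topic metadata.
        List<TopicPartition> partitions;
        using (var admin = new AdminClientBuilder(
                   new AdminClientConfig { BootstrapServers = brokers }).Build())
        {
            partitions = admin.GetMetadata(topic, timeout)
                .Topics.Single()
                .Partitions
                .Select(p => new TopicPartition(topic, new Partition(p.PartitionId)))
                .ToList();
        }

        var config = new ConsumerConfig
        {
            BootstrapServers = brokers,
            GroupId = "offset-tracker", // hypothetical group id
            EnableAutoCommit = false
        };
        using var consumer = new ConsumerBuilder<Ignore, Ignore>(config).Build();

        // Step 2: for each partition, look up the first offset whose
        // timestamp is at or after the given time.
        var low = consumer.OffsetsForTimes(
            partitions.Select(tp => new TopicPartitionTimestamp(tp, new Timestamp(from))),
            timeout);
        var high = consumer.OffsetsForTimes(
            partitions.Select(tp => new TopicPartitionTimestamp(tp, new Timestamp(to))),
            timeout);

        foreach (var lo in low)
        {
            var hi = high.Single(h => h.TopicPartition == lo.TopicPartition);
            // hi.Offset is the first offset outside the window; a special
            // end-of-partition value can appear if nothing was written after `to`.
            Console.WriteLine($"{lo.TopicPartition}: from {lo.Offset} up to, not including, {hi.Offset}");
        }

        // Step 3: start consuming every partition at the beginning of the window.
        consumer.Assign(low);
    }
}
```

After Assign, calls to Consume return messages starting at the first offset of the window for each partition; stopping at the high offset is left to the consuming loop.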

Also, I chose to consume each partition on a separate thread.
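One way to arrange that is sketched below. It assumes each thread owns its own consumer instance, and it takes the per-partition start offset and exclusive end offset as input. PerPartitionReader, ReadRange and ReadAll are hypothetical names, and the broker address and group id are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Confluent.Kafka;

static class PerPartitionReader
{
    // Reads a single partition from `start` until `endExclusive` is reached
    // or no message arrives within the timeout (a simplification).
    static void ReadRange(TopicPartitionOffset start, long endExclusive)
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // assumption: local broker
            GroupId = "offset-tracker",          // hypothetical group id
            EnableAutoCommit = false
        };
        using var consumer = new ConsumerBuilder<Ignore, Ignore>(config).Build();
        consumer.Assign(start);

        while (true)
        {
            var result = consumer.Consume(TimeSpan.FromSeconds(5));
            if (result == null || result.Offset.Value >= endExclusive) break;
            Console.WriteLine(
                $"{result.TopicPartitionOffset} at {result.Message.Timestamp.UtcDateTime:O}");
        }
    }

    // Starts one thread per partition range and waits for all of them.
    public static void ReadAll(
        IEnumerable<(TopicPartitionOffset Start, long EndExclusive)> ranges)
    {
        var threads = ranges
            .Select(r => new Thread(() => ReadRange(r.Start, r.EndExclusive)))
            .ToList();
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());
    }
}
```

Giving each thread its own consumer avoids sharing one consumer's Consume loop across threads, at the cost of one broker connection set per partition.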
