How should a Kafka HLC figure out the # of partitions for a topic?

I'm using the kafka-node HighLevelConsumer, and I always receive duplicate messages on startup.

To maintain processing order, my consumer simply appends messages to a work queue, and I process the events serially. I pause the consumer if I hit a queue high-water mark, I have auto-commit disabled, and I commit "manually" after my client code fully processes each event.
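For reference, the queueing scheme described above can be sketched roughly as follows. This is a minimal illustration, not my actual code: `createSerialQueue`, `handler`, `onProcessed`, and `highWater` are hypothetical names, and the only thing assumed of the consumer is that it exposes `pause()` and `resume()`.

```javascript
// Sketch of a serial work queue with a high-water mark.
// Assumptions: `consumer` has pause()/resume(); `handler(message, done)` is
// hypothetical client code that calls done(err) once processing finishes;
// `onProcessed(message)` is where a manual commit would be triggered.
function createSerialQueue(consumer, handler, onProcessed, highWater) {
  const pending = [];
  let busy = false;   // true while one message is being processed
  let paused = false; // true while the consumer is paused by this queue

  function drain() {
    if (busy || pending.length === 0) return;
    busy = true;
    const message = pending.shift();
    handler(message, function done(err) {
      busy = false;
      // On failure the message is skipped without a commit; a real consumer
      // would probably retry or stop instead.
      if (!err) onProcessed(message);
      if (paused && pending.length < highWater) {
        paused = false;
        consumer.resume();
      }
      drain();
    });
  }

  return {
    push(message) {
      pending.push(message);
      if (!paused && pending.length >= highWater) {
        paused = true;
        consumer.pause();
      }
      drain();
    },
    size() {
      return pending.length;
    }
  };
}
```

With kafka-node this would be wired up along the lines of `consumer.on('message', (m) => queue.push(m))`, with `onProcessed` doing the commit.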

Despite committing, on startup I always get the last (previously committed) message from one or more partitions (depending on how many other HLCs are running in my group). I was a little surprised that the HLC wouldn't give me (committed+1), but I decided to just "ignore" messages whose offset is not later than the committed offset. As a quick test, I called offset.fetchCommits:

offset.fetchCommits('fnord', [{topic:'test', partition: 0}, 
                              {topic:'test', partition: 1}, 
                              {topic:'test', partition: 2}, 
                              {topic:'test', partition: 3}], ...

This works if my payload list matches the number of partitions defined. If I exceed the number of partitions, I get a [BrokerNotAvailableError: Could not find the leader] error.
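The "ignore anything already committed" check could look something like the sketch below. Two assumptions are baked in: that the fetched commits arrive shaped like `{ topic: { partition: offset } }`, and that the stored value is the offset of the last message I processed (which is what the duplicates I see suggest). `isAlreadyCommitted` is a hypothetical helper name.

```javascript
// Returns true when `message` was already covered by a previous commit.
// Assumption: `committed` looks like { topic: { partition: offset } } and
// each offset is the offset of the last fully processed message.
function isAlreadyCommitted(committed, message) {
  const byPartition = committed[message.topic];
  if (!byPartition) return false;
  const offset = byPartition[message.partition];
  // A missing or negative value is treated as "nothing committed yet".
  if (typeof offset !== 'number' || offset < 0) return false;
  return message.offset <= offset;
}
```

If the group actually stores the next offset to read rather than the last one processed, the comparison would need to be `<` instead of `<=`.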

  1. Am I correct that I can't auto-commit if I want a stronger guarantee that I won't lose messages when my message processing is asynchronous and may fail (e.g. an ETL job)? kafka-node just emits a 'message' event; there's no way to confirm that it was successfully handled.
  2. Is it expected behavior that the HighLevelConsumer will read the message at the last committed offset (i.e. a duplicate) rather than the next offset?
  3. What is the best way to get the number of partitions for a topic?

I dug into the kafka-node source, and there's an undocumented call I was able to use to get the partition info:

client.loadMetadataForTopics(['test'], function(err, results) {..});

(I don't love calling something that doesn't appear to be a documented part of the public API, and I'm uncomfortable with the rather raw-feeling mixed-array nature of the returned results, but it solves my problem for the moment.)
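For what it's worth, this is roughly how I turn those results into a payload list. It is only a sketch: it assumes the results come back as a two-element array whose second element looks like `{ metadata: { topic: { partitionId: {...} } } }`, which is what I observed but, being undocumented, could change. `partitionPayloads` is a hypothetical helper name.

```javascript
// Builds a [{ topic, partition }, ...] payload list from metadata results.
// Assumption: results[1].metadata[topic] is an object keyed by partition id.
function partitionPayloads(results, topic) {
  const metadata = results && results[1] && results[1].metadata;
  const partitions = metadata && metadata[topic];
  if (!partitions) return []; // unknown topic or unexpected shape
  return Object.keys(partitions)
    .map(Number)
    .sort((a, b) => a - b)
    .map((partition) => ({ topic, partition }));
}
```

The length of the returned array is the partition count, and the array itself can be passed as the payload list so it never exceeds the partitions that actually exist.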
