How should a Kafka HLC figure out the number of partitions for a topic?

Question

I'm using the kafka-node HighLevelConsumer, and I always receive duplicate messages on startup.

To maintain processing order, my consumer simply appends messages to a work queue, and I process the events serially. I pause the consumer if I hit a queue high-water mark, I have auto-commit disabled, and I commit "manually" after my client code fully processes each event.
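
The setup described above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: `createSerialWorker`, `handleEvent`, `highWaterMark` and `onError` are hypothetical names, and the `consumer` argument is assumed to expose `on('message')`, `pause()`, `resume()` and `commit(cb)` the way kafka-node's HighLevelConsumer does.

```javascript
// Sketch: serial processing with a high-water mark and a manual commit
// after each event. All names here are illustrative assumptions.
function createSerialWorker(consumer, handleEvent, highWaterMark, onError) {
  const queue = [];   // messages received but not yet processed
  let busy = false;   // true while one event is being processed
  let paused = false; // true while the consumer is paused by this worker

  function processNext() {
    if (busy || queue.length === 0) return;
    busy = true;
    const message = queue.shift();
    handleEvent(message, function (err) {
      if (err) {
        // Processing failed: skip the commit and halt (busy stays true),
        // leaving recovery to the caller.
        return onError(err);
      }
      consumer.commit(function (commitErr) {
        if (commitErr) return onError(commitErr);
        busy = false;
        // Resume fetching once the backlog drops below the mark.
        if (paused && queue.length < highWaterMark) {
          paused = false;
          consumer.resume();
        }
        processNext();
      });
    });
  }

  consumer.on('message', function (message) {
    queue.push(message);
    // Stop fetching when the backlog reaches the high-water mark.
    if (!paused && queue.length >= highWaterMark) {
      paused = true;
      consumer.pause();
    }
    processNext();
  });
}
```

Whether `commit()` records exactly the offset of the event that just finished depends on the client's internal offset bookkeeping, which this sketch does not verify.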

Despite committing, on startup I always get the last (previously committed) message from one or more partitions (depending on how many other HLCs are running in my group). I was a little surprised that the HLC wouldn't give me (committed+1), but I decided to just "ignore" messages that had an offset earlier than the offset committed. As a quick test,

offset.fetchCommits('fnord', [{topic:'test', partition: 0}, 
                              {topic:'test', partition: 1}, 
                              {topic:'test', partition: 2}, 
                              {topic:'test', partition: 3}], ...

This works if my payload list matches the number of partitions defined. If I exceed the number of partitions, I get a [BrokerNotAvailableError: Could not find the leader] error.
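
A minimal sketch of the "ignore" idea, under stated assumptions: `buildPayloads` and `isAlreadyCommitted` are hypothetical helpers, and the result of `fetchCommits` is assumed to have the shape `{ topic: { partition: offset } }`. Because the question describes the committed message itself being redelivered, the check treats offsets at or below the committed value as already handled; adjust the comparison if your client commits "next offset" instead.

```javascript
// Build a fetchCommits payload list that names only partitions known to
// exist, which avoids asking about a partition that has no leader.
function buildPayloads(topic, partitionIds) {
  return partitionIds.map(function (partition) {
    return { topic: topic, partition: Number(partition) };
  });
}

// `committed` is assumed to look like { test: { '0': 41, '1': 17 } }.
// `message` is assumed to carry topic, partition and offset fields.
function isAlreadyCommitted(committed, message) {
  const byPartition = committed[message.topic] || {};
  const offset = byPartition[message.partition];
  // No usable commit recorded (missing or negative): process the message.
  if (typeof offset !== 'number' || offset < 0) return false;
  return message.offset <= offset;
}
```

A message for which `isAlreadyCommitted` returns true would simply be dropped before it reaches the work queue.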

Am I correct that I can't use auto-commit if I want a stronger guarantee against losing messages when my message processing is asynchronous and may fail (e.g. an ETL job)? kafka-node just emits a 'message' event; there's no way to confirm that the message was successfully handled.
Is it expected behavior that the HighLevelConsumer will read the message at the last committed offset (i.e. a duplicate) rather than the next offset?
What is the best way to get the number of partitions for a topic?

Answer 1

I dug into the kafka-node source, and there's an undocumented call I was able to use to get the partition info:

client.loadMetadataForTopics(['test'], function (err, results) {..});

(I don't love calling something that doesn't appear to be a documented part of the public API, and I'm uncomfortable with the rather raw-feeling, mixed-array nature of the returned results, but it solves my problem for the moment.)
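
For illustration, here is one way the partition list might be pulled out of that mixed array. Since the call is undocumented, the shape is an assumption: this sketch supposes the second element of `results` carries a `metadata` object keyed by topic and then by partition id. `partitionIdsFromMetadata` is a hypothetical helper name; check the shape against your kafka-node version before relying on it.

```javascript
// Assumed shape (not confirmed by documentation):
//   results[1].metadata[topic] = { '0': {...}, '1': {...}, ... }
function partitionIdsFromMetadata(results, topic) {
  const metadata = results && results[1] && results[1].metadata;
  const partitions = metadata && metadata[topic];
  if (!partitions) return []; // unknown topic or unexpected shape
  return Object.keys(partitions)
    .map(Number)
    .sort(function (a, b) { return a - b; });
}

// Possible usage inside the callback shown above:
//   const ids = partitionIdsFromMetadata(results, 'test');
//   const partitionCount = ids.length;
```

Returning the ids rather than just a count keeps the result usable for building a per-partition payload list.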

(Answer 1 dated 2015-06-12 22:36:57.)
