How should a Kafka HLC figure out the # of partitions for a topic?

Question

I'm using the kafka-node HighLevelConsumer, and am having problems where I always receive duplicate messages on startup.

In order to maintain processing sequence, my consumer simply appends messages to a work queue, and I process the events serially. I pause the consumer if I hit a queue high-water mark, I have auto-commit disabled, and I commit "manually" after my client code fully processes each event.

Despite committing, on startup, I always get the last (previously committed) message from one or more partitions (depending on how many other HLCs are running in my group). I was a little surprised that the HLC wouldn't give me (committed+1), but I decided to just "ignore" messages that had an offset earlier than the offset committed. As a quick test, I fetched the committed offsets:

offset.fetchCommits('fnord', [{topic:'test', partition: 0}, 
                              {topic:'test', partition: 1}, 
                              {topic:'test', partition: 2}, 
                              {topic:'test', partition: 3}], ...

This works if my payload list matches the number of partitions defined. If I exceed the number of partitions, I get a [BrokerNotAvailableError: Could not find the leader] error.
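
For illustration, the "ignore already-committed messages" check described above could look roughly like the sketch below. It assumes the fetchCommits result has the shape { topic: { partition: offset } } and that the committed value is the offset of the last message I fully processed (which is what the duplicate on startup suggests). The helper name isAlreadyProcessed is hypothetical, not part of kafka-node.

```javascript
// Sketch only: decide whether a message was already handled before restart.
// ASSUMPTION: `committed` looks like { topicName: { partitionId: offset } }
// and each offset is the last message that was fully processed.
function isAlreadyProcessed(message, committed) {
  const byPartition = committed[message.topic] || {};
  const last = byPartition[message.partition];
  // No commit recorded for this partition (undefined or negative): skip nothing.
  if (last === undefined || last < 0) {
    return false;
  }
  // Anything at or before the committed offset is a duplicate.
  return message.offset <= last;
}

// Made-up sample data, for demonstration only.
const committedSample = { test: { 0: 41, 1: 7 } };
console.log(isAlreadyProcessed({ topic: 'test', partition: 0, offset: 41 }, committedSample)); // true
console.log(isAlreadyProcessed({ topic: 'test', partition: 0, offset: 42 }, committedSample)); // false
console.log(isAlreadyProcessed({ topic: 'test', partition: 2, offset: 0 }, committedSample));  // false
```

In the 'message' handler, a message for which this returns true would simply be dropped instead of being appended to the work queue.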

1. Am I correct that I can't use auto-commit if I want a stronger guarantee that I won't lose messages when my message processing is asynchronous and may fail (e.g. an ETL job)? kafka-node just emits a 'message' event; there's no way to confirm that the message was handled successfully.
2. Is it expected behavior that the HighLevelConsumer re-reads the message at the last committed offset (i.e. a duplicate) rather than the next offset?
3. What is the best way to get the number of partitions for a topic?

Answer 1

I dug into the kafka-node source, and there's an undocumented call I was able to use to get the partition info:

client.loadMetadataForTopics(['test'], function (err, results) {
  // ..
});

(I don't love calling something that doesn't appear to be a documented part of the public API, and I'm uncomfortable with the rather raw-feeling mixed array nature of the returned results, but it solves my problem for the moment.)
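
To make the "mixed array" a bit easier to work with, here is a minimal sketch of pulling the partition ids out of the results. It assumes the shape I observed: the second array element carries a metadata object keyed by topic name and then by partition id. Since the call is undocumented, that shape may differ between kafka-node versions. The helper name partitionIds and the sample data are mine, for illustration only.

```javascript
// Sketch only: extract the partition ids for a topic from the results of
// client.loadMetadataForTopics.
// ASSUMPTION: results[1].metadata[topic] is an object keyed by partition id.
function partitionIds(results, topic) {
  const metadata = results && results[1] && results[1].metadata;
  const partitions = metadata && metadata[topic];
  if (!partitions) {
    return []; // unknown topic or unexpected shape
  }
  return Object.keys(partitions).map(Number).sort((a, b) => a - b);
}

// Made-up sample in the assumed shape: [brokers, { metadata }].
const sampleResults = [
  { 0: { nodeId: 0, host: 'localhost', port: 9092 } },
  {
    metadata: {
      test: {
        0: { topic: 'test', partition: 0, leader: 0, replicas: [0], isr: [0] },
        1: { topic: 'test', partition: 1, leader: 0, replicas: [0], isr: [0] }
      }
    }
  }
];

// Build a fetchCommits payload list that matches the real partition count.
const payloads = partitionIds(sampleResults, 'test').map(function (p) {
  return { topic: 'test', partition: p };
});
console.log(payloads.length); // 2
```

Inside the real callback, results would be passed to partitionIds in place of sampleResults, and the resulting payload list handed to offset.fetchCommits so it never exceeds the number of partitions defined.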
