Kafka consumer not getting a single message from partition
I just noticed that when I produce a single message into a partition, my consumer does not receive it. Only after I produce a few more messages into the same partition does the consumer receive them. My fetch.min.bytes is set to 1.
Is there some other config that could have an effect here?
I have a dedicated consumer for each partition.
The relevant part of the consumer code is below. My consumer starts several threads for different topics, which are defined by configs['stream']. It uses https://github.com/mmustala/rdkafka-ruby, which is a fork of the original consumer gem. I added a batch consuming method and a method to shut down the consumer in a managed way.
key = configs['app_key']
consumer = Rdkafka::Config.new(config(configs)).consumer
topic = "#{topic_prefix}#{app_env}_#{configs['stream']}"
consumer.subscribe(topic)
logger.info "#{rand}| Starting consumer for #{key} with topic #{topic}"
begin
  retry_counter = 0
  retries_started_at = nil
  current_assignment = nil
  partitions = []
  consumer.each_batch(configs['max_messages_per_partition'] || 5, 100, rand) do |messages|
    partitions = messages.collect {|m| m.partition}.uniq.sort
    logger.info "#{rand}| Batch started. Received #{messages.length} messages from partitions #{partitions} for app #{key}"
    current_assignment = consumer.assignment.to_h
    values = messages.collect {|m| JSON.parse(m.payload)}
    skip_commit = false
    begin
      values.each_slice((values.length / ((retry_counter * 2) + 1).to_f).ceil) do |slice|
        logger.info "#{rand}| Sending #{slice.length} messages to lambda"
        result = invoke_lambda(key, slice)
        if result.status_code != 200 || result.function_error
          logger.info "#{rand}| Batch finished with error #{result.function_error}"
          raise LambdaError, result.function_error.to_s
        end
      end
    rescue LambdaError => e
      logger.warn "#{rand}| #{e}"
      if consumer.running? && current_assignment == consumer.assignment.to_h
        retry_counter += 1
        retries_started_at ||= Time.now
        if retry_counter <= 5 && Time.now - retries_started_at < 600
          logger.warn "#{rand}| Retrying from: #{e.cause}, app_key: #{key}"
          Rollbar.warning("Retrying from: #{e.cause}", app_key: key, thread: rand, partitions: partitions.join(', '))
          sleep 5
          retry if consumer.running? && current_assignment == consumer.assignment.to_h
        else
          raise e # Raise to exit the retry loop so that consumers are rebalanced.
        end
      end
      skip_commit = true
    end
    retry_counter = 0
    retries_started_at = nil
    if skip_commit
      logger.info "#{rand}| Commit skipped"
    else
      consumer.commit
      logger.info "#{rand}| Batch finished"
    end
  end
  consumer.close
  logger.info "#{rand}| Stopped #{key}"
rescue Rdkafka::RdkafkaError => e
  logger.warn "#{rand}| #{e}"
  logger.info "#{rand}| assignment: #{consumer.assignment.to_h}"
  if e.to_s.index('No offset stored')
    retry
  else
    raise e
  end
end
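To make the retry behaviour in the code above easier to follow, here is a small standalone sketch of the slice-size expression. The helper names slice_size and slice_sizes are made up for illustration and are not part of the consumer; only the arithmetic is taken from the code above.

```ruby
# Hypothetical helper mirroring the expression used in the consumer:
#   (values.length / ((retry_counter * 2) + 1).to_f).ceil
def slice_size(message_count, retry_counter)
  (message_count / ((retry_counter * 2) + 1).to_f).ceil
end

# Lengths of the slices that each_slice would yield for a batch of
# message_count messages at the given retry count.
def slice_sizes(message_count, retry_counter)
  (1..message_count).each_slice(slice_size(message_count, retry_counter)).map(&:length)
end

(0..2).each do |retry_counter|
  puts "retry #{retry_counter}: #{slice_sizes(10, retry_counter).inspect}"
end
```

So each retry sends the same batch to the lambda in smaller pieces: one slice on the first attempt, then roughly a third of the batch per slice, then a fifth, and so on. One thing worth checking: if each_batch ever yielded an empty batch, the slice size would be 0 and each_slice(0) raises ArgumentError. I have not verified whether that can happen with this fork.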
Config:
def config(app_configs)
  {
    "bootstrap.servers": brokers,
    "group.id": app_configs['app_key'],
    "enable.auto.commit": false,
    "enable.partition.eof": false,
    "log.connection.close": false,
    "session.timeout.ms": 30*1000,
    "fetch.message.max.bytes": ['sources'].include?(app_configs['stream']) ? 102400 : 10240,
    "queued.max.messages.kbytes": ['sources'].include?(app_configs['stream']) ? 250 : 25,
    "queued.min.messages": (app_configs['max_messages_per_partition'] || 5) * 10,
    "fetch.min.bytes": 1,
    "partition.assignment.strategy": 'roundrobin'
  }
end
The producer code uses https://github.com/zendesk/ruby-kafka:
def to_kafka(stream_name, data, batch_size)
  stream_name_with_env = "#{Rails.env}_#{stream_name}"
  topic = [Rails.application.secrets.kafka_topic_prefix, stream_name_with_env].compact.join
  partitions_count = KAFKA.partitions_for(topic)
  Rails.logger.info "Partition count for #{topic}: #{partitions_count}"
  if @job.active? && @job.partition.blank?
    @job.connect_to_partition
  end
  partition = @job.partition&.number.to_i % partitions_count
  producer = KAFKA.producer
  if data.is_a?(Array)
    data.each_slice(batch_size) do |slice|
      producer.produce(JSON.generate(slice), topic: topic, partition: partition)
    end
  else
    producer.produce(JSON.generate(data), topic: topic, partition: partition)
  end
  producer.deliver_messages
  Rails.logger.info "records sent to topic #{topic} partition #{partition}"
  producer.shutdown
end
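For anyone reading the producer, the partition expression may be worth a closer look. Below is a minimal sketch of how it evaluates, using Struct stand-ins (Partition, Job and target_partition are hypothetical names for illustration, not the real models). In Ruby, &. only guards the first call, so a missing partition becomes nil.to_i, which is 0.

```ruby
# Hypothetical stand-ins for @job and its partition record; only `number` matters here.
Partition = Struct.new(:number)
Job = Struct.new(:partition)

# Same expression as in to_kafka, with @job passed in as an argument.
def target_partition(job, partitions_count)
  job.partition&.number.to_i % partitions_count
end

puts target_partition(Job.new(Partition.new(7)), 4) # job bound to partition number 7, topic has 4 partitions
puts target_partition(Job.new(nil), 4)              # job without a partition record
```

With 4 partitions, partition number 7 wraps around to 3, and a job without a partition record is sent to partition 0. That means a rarely used partition can still receive the occasional single message.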
UPDATE: It looks like the number of messages is irrelevant. I just produced over 100 messages into one partition, and the consumer has not yet started to consume them.
UPDATE2: It didn't start consuming the messages during the night. But when I produced a new set of messages into the same partition this morning, it woke up and started to consume the new messages I had just produced. It skipped over the messages produced last night.
I believe the issue was that the partition had not received messages for a while and apparently did not have a saved offset. When the offset was acquired, it was set to the largest value, which is the default. Since I set auto.offset.reset: 'smallest', I have not seen messages being skipped.
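To illustrate why the reset policy matters here, this is a simplified model of how the starting position is chosen. It is a sketch under my own assumptions, not librdkafka's actual implementation; starting_offset and its parameters are hypothetical names.

```ruby
# Simplified model (an assumption, not librdkafka's real code) of where a
# consumer starts reading a partition.
#   committed_offset: offset stored for the group, or nil if none exists
#   low_watermark:    offset of the oldest message still in the partition
#   high_watermark:   offset that the next produced message will get
def starting_offset(committed_offset, low_watermark, high_watermark, reset_policy)
  # A usable committed offset always wins; the reset policy is not consulted.
  if committed_offset && committed_offset.between?(low_watermark, high_watermark)
    return committed_offset
  end

  case reset_policy
  when 'smallest', 'earliest' then low_watermark
  when 'largest', 'latest' then high_watermark
  else raise ArgumentError, "unknown reset policy: #{reset_policy}"
  end
end

# Partition holding 100 messages (offsets 0..99), no offset saved for the group.
puts starting_offset(nil, 0, 100, 'largest')
puts starting_offset(nil, 0, 100, 'smallest')
```

With 'largest' the consumer starts at the end of the partition, so everything already there is skipped and only messages produced afterwards are delivered, which matches what I saw. With 'smallest' it starts from the oldest message still available. In the config hash above this is one extra entry, "auto.offset.reset": 'smallest'.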