Kafka consumer does not receive a single message from a partition

Question

I just noticed that when I produce a single message into a partition, my consumer does not receive it. Only after I produce a few more messages into the same partition does the consumer receive them. My fetch.min.bytes is set to 1.

Is there some other config that could affect this?

I have a dedicated consumer for each partition.

The relevant part of the consumer code is below. My consumer starts several threads for different topics, which are defined by configs['stream']. It uses https://github.com/mmustala/rdkafka-ruby, a fork of the original consumer gem, to which I added a batch-consuming method and a method to shut down the consumer in a managed way.

key = configs['app_key']
consumer = Rdkafka::Config.new(config(configs)).consumer
topic = "#{topic_prefix}#{app_env}_#{configs['stream']}"
consumer.subscribe(topic)

logger.info "#{rand}| Starting consumer for #{key} with topic #{topic}"
begin
  retry_counter = 0
  retries_started_at = nil
  current_assignment = nil
  partitions = []
  consumer.each_batch(configs['max_messages_per_partition'] || 5, 100, rand) do |messages|
    partitions = messages.collect {|m| m.partition}.uniq.sort
    logger.info "#{rand}| Batch started. Received #{messages.length} messages from partitions #{partitions} for app #{key}"
    current_assignment = consumer.assignment.to_h
    values = messages.collect {|m| JSON.parse(m.payload)}
    skip_commit = false
    begin
      values.each_slice((values.length / ((retry_counter * 2) + 1).to_f).ceil) do |slice|
        logger.info "#{rand}| Sending #{slice.length} messages to lambda"
        result = invoke_lambda(key, slice)
        if result.status_code != 200 || result.function_error
          logger.info "#{rand}| Batch finished with error #{result.function_error}"
          raise LambdaError, result.function_error.to_s
        end
      end
    rescue LambdaError => e
      logger.warn "#{rand}| #{e}"
      if consumer.running? && current_assignment == consumer.assignment.to_h
        retry_counter += 1
        retries_started_at ||= Time.now
        if retry_counter <= 5 && Time.now - retries_started_at < 600
          logger.warn "#{rand}| Retrying from: #{e.cause}, app_key: #{key}"
          Rollbar.warning("Retrying from: #{e.cause}", app_key: key, thread: rand, partitions: partitions.join(', '))
          sleep 5
          retry if consumer.running? && current_assignment == consumer.assignment.to_h
        else
          raise e # Raise to exit the retry loop so that consumers are rebalanced.
        end
      end
      skip_commit = true
    end
    retry_counter = 0
    retries_started_at = nil
    if skip_commit
      logger.info "#{rand}| Commit skipped"
    else
      consumer.commit
      logger.info "#{rand}| Batch finished"
    end
  end
  consumer.close
  logger.info "#{rand}| Stopped #{key}"
rescue Rdkafka::RdkafkaError => e
  logger.warn "#{rand}| #{e}"
  logger.info "#{rand}| assignment: #{consumer.assignment.to_h}"
  if e.to_s.index('No offset stored')
    retry
  else
    raise e
  end
end

Config

def config(app_configs)
  {
      "bootstrap.servers": brokers,
      "group.id": app_configs['app_key'],
      "enable.auto.commit": false,
      "enable.partition.eof": false,
      "log.connection.close": false,
      "session.timeout.ms": 30*1000,
      "fetch.message.max.bytes": ['sources'].include?(app_configs['stream']) ? 102400 : 10240,
      "queued.max.messages.kbytes": ['sources'].include?(app_configs['stream']) ? 250 : 25,
      "queued.min.messages": (app_configs['max_messages_per_partition'] || 5) * 10,
      "fetch.min.bytes": 1,
      "partition.assignment.strategy": 'roundrobin'
  }
end

The producer code uses https://github.com/zendesk/ruby-kafka

def to_kafka(stream_name, data, batch_size)
  stream_name_with_env = "#{Rails.env}_#{stream_name}"
  topic = [Rails.application.secrets.kafka_topic_prefix, stream_name_with_env].compact.join
  partitions_count = KAFKA.partitions_for(topic)
  Rails.logger.info "Partition count for #{topic}: #{partitions_count}"
  if @job.active? && @job.partition.blank?
    @job.connect_to_partition
  end
  partition = @job.partition&.number.to_i % partitions_count
  producer = KAFKA.producer
  if data.is_a?(Array)
    data.each_slice(batch_size) do |slice|
      producer.produce(JSON.generate(slice), topic: topic, partition: partition)
    end
  else
    producer.produce(JSON.generate(data), topic: topic, partition: partition)
  end
  producer.deliver_messages
  Rails.logger.info "records sent to topic #{topic} partition #{partition}"
  producer.shutdown
end

UPDATE: It looks like the number of messages is irrelevant. I just produced over 100 messages into one partition and the consumer has not yet started to consume them.

UPDATE2: It did not start consuming the messages during the night. But when I produced a new set of messages into the same partition this morning, it woke up and started to consume the new messages I had just produced. It skipped over the messages produced last night.

Answer 1

I believe the issue was that the partition had not received messages for a while and apparently did not have an offset saved. When the offset was then acquired, it was set to the largest value, which is the default for auto.offset.reset, so consumption started at the end of the partition and the earlier messages were skipped. Since I set auto.offset.reset: 'smallest', I have not seen messages being skipped.
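
To illustrate the behaviour described above, here is a minimal sketch in plain Ruby. It is a simplified model of how a start position is chosen when no valid offset is saved, not librdkafka's actual code. The helper names starting_offset and with_offset_reset are hypothetical and exist only for this example; the setting name auto.offset.reset and its 'smallest' and 'largest' values come from the configuration discussed here.

```ruby
# Simplified model (an assumption for illustration, not librdkafka internals):
# pick the offset a consumer would start reading from.
#   committed    - saved offset for the group, or nil if none is stored
#   low, high    - first available offset and end-of-log offset of the partition
#   reset_policy - value of auto.offset.reset
def starting_offset(committed, low, high, reset_policy)
  # A saved offset that still lies inside the log is used as-is.
  return committed if committed && committed.between?(low, high)

  # Otherwise the reset policy decides where to start.
  case reset_policy
  when "smallest", "earliest", "beginning" then low   # read everything available
  when "largest", "latest", "end" then high           # skip what is already there
  else raise ArgumentError, "unsupported policy: #{reset_policy}"
  end
end

# Hypothetical helper: returns a copy of an existing consumer config hash
# with auto.offset.reset set, leaving the original hash untouched.
def with_offset_reset(base_config, policy = "smallest")
  base_config.merge("auto.offset.reset": policy)
end

# No saved offset and 100 messages already in the partition:
puts starting_offset(nil, 0, 100, "largest")   # starts at the end of the log
puts starting_offset(nil, 0, 100, "smallest")  # starts at the beginning
```

In the consumer from the question, the equivalent change would be adding the "auto.offset.reset": 'smallest' entry to the hash returned by config before it is passed to Rdkafka::Config.new.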
