
Kafka leader election causes Kafka Streams crash

I have a Kafka Streams application consuming from and producing to a Kafka cluster with 3 brokers and a replication factor of 3. Other than the consumer offset topics (50 partitions), all other topics have only one partition each.

When the brokers attempt a preferred replica election, the Streams app (which is running on a completely different instance than the brokers) fails with the error:

Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_0] exception caught when producing
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:119)
    ...
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

Is it normal that the Streams app attempts to be the leader for the partition, given that it's running on a server that's not part of the Kafka cluster?

I can reproduce this behaviour on demand by:

  1. Killing one of the brokers (whereupon the other 2 take over as leader for all partitions that had the killed broker as their leader, as expected)
  2. Bringing the killed broker back up
  3. Triggering a preferred replica leader election with bin/kafka-preferred-replica-election.sh --zookeeper localhost
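
For reference, which broker currently leads each partition can be checked between these steps with something like the following (the topic name my-topic and the ZooKeeper address are placeholders, not from the original post):

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic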

My issue seems to be similar to this reported failure, so I'm wondering if this is a new Kafka Streams bug. My full stack trace is literally exactly the same as the gist linked in the reported failure (here).

Another potentially interesting detail is that during the leader election, I get these messages in the controller.log of the broker:

[2017-04-12 11:07:50,940] WARN [Controller-3-to-broker-3-send-thread], Controller 3's connection to broker BROKER-3-HOSTNAME:9092 (id: 3 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to BROKER-3-HOSTNAME:9092 (id: 3 rack: null) failed
    at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:94)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:232)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:185)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

I initially thought this connection error was to blame, but after the leader election crashes the Streams app, if I restart the Streams app, it works normally until the next election, without me touching the brokers at all.

All servers (3 Kafka brokers and the Streams app) are running on EC2 instances.

This is now fixed in 0.10.2.1. If you can't pick that up, make sure you have these two parameters set as follows in your Streams config:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

final Properties props = new Properties();
...
props.put(ProducerConfig.RETRIES_CONFIG, 10);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));
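
For context, here is a minimal sketch of how those two overrides might sit in a complete Streams application; the application id, bootstrap servers, and pass-through topology below are placeholder assumptions, not taken from the original question:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class LeaderElectionTolerantApp {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        // Placeholder application id and broker list.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "leader-election-tolerant-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "BROKER-1-HOSTNAME:9092");

        // Let the internal producer retry through a leader election
        // instead of failing the task on NotLeaderForPartitionException.
        props.put(ProducerConfig.RETRIES_CONFIG, 10);
        // Keep the consumer in the group while blocked sends are retried.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));

        // Placeholder pass-through topology: copy one single-partition topic to another.
        final KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        final KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

The idea behind the two settings: with retries raised, the internal producer retries through the brief window where the partition has no leader rather than surfacing a StreamsException, and the large max.poll.interval.ms keeps the consumer from being kicked out of the group while those retries are in flight.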
