
Kafka leader election causes Kafka Streams crash

I have a Kafka Streams application consuming from and producing to a Kafka cluster with 3 brokers and a replication factor of 3. Other than the consumer offset topics (50 partitions), all other topics have only one partition each.

When the brokers attempt a preferred replica election, the Streams app (which is running on a completely different instance than the brokers) fails with the error:

Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_0] exception caught when producing
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:119)
    ...
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

Is it normal that the Streams app attempts to be the leader for the partition, given that it's running on a server that's not part of the Kafka cluster?

I can reproduce this behaviour on demand by:

  1. Killing one of the brokers (whereupon the other 2 take over as leader for all partitions that had the killed broker as their leader, as expected)
  2. Bringing the killed broker back up
  3. Triggering a preferred replica leader election with bin/kafka-preferred-replica-election.sh --zookeeper localhost
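
For reference, which broker currently leads each partition can be checked between these steps with something like the following (the topic name my-topic and the ZooKeeper address are placeholders, not from the original post):

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic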

My issue seems to be similar to this reported failure, so I'm wondering if this is a new Kafka Streams bug. My full stack trace is literally exactly the same as the gist linked in the reported failure (here).

Another potentially interesting detail is that during the leader election, I get these messages in the controller.log of the broker:

[2017-04-12 11:07:50,940] WARN [Controller-3-to-broker-3-send-thread], Controller 3's connection to broker BROKER-3-HOSTNAME:9092 (id: 3 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to BROKER-3-HOSTNAME:9092 (id: 3 rack: null) failed
    at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:94)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:232)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:185)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

I initially thought this connection error was to blame, but after the leader election crashes the Streams app, if I restart the Streams app, it works normally until the next election, without me touching the brokers at all.

All servers (3 Kafka brokers and the Streams app) are running on EC2 instances.

This is now fixed in 0.10.2.1. If you can't pick that up, make sure you have these two parameters set as follows in your Streams config:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

final Properties props = new Properties();
...
props.put(ProducerConfig.RETRIES_CONFIG, 10);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));
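
For context, here is a minimal sketch of how those two overrides might sit in a complete Streams application; the application id, bootstrap servers, and pass-through topology below are placeholder assumptions, not taken from the original question:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class LeaderElectionTolerantApp {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        // Placeholder application id and broker list.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "leader-election-tolerant-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "BROKER-1-HOSTNAME:9092");

        // Let the internal producer retry through a leader election
        // instead of failing the task on NotLeaderForPartitionException.
        props.put(ProducerConfig.RETRIES_CONFIG, 10);
        // Keep the consumer in the group while blocked sends are retried.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));

        // Placeholder pass-through topology: copy one single-partition topic to another.
        final KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        final KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

The idea behind the two settings: with retries raised, the internal producer retries through the brief window where the partition has no leader rather than surfacing a StreamsException, and the large max.poll.interval.ms keeps the consumer from being kicked out of the group while those retries are in flight.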
