Error “This server is not the leader for that topic-partition” while running Kafka performance traffic
Update Aug 15, 2018: I ran strace to monitor the mprotect system call, and found that it was indeed blocked for several seconds.
strace -f -e trace=mprotect,mmap,munmap -T -t -p `pidof java` 2>&1 |tee mp1.txt
[pid 27007] 03:52:48 mprotect(0x7f9766226000, 4096, PROT_NONE) = 0 <3.631704>
But I have not yet identified the reason.
Update Aug 14, 2018: I found that it is a JVM stop-the-world (STW) event. I debugged the JVM with the options below:
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
-XX:+SafepointTimeout
-XX:SafepointTimeoutDelay=500
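For reference, a sketch of how these flags can be attached to the broker JVM (Kafka's start scripts pass KAFKA_OPTS through to java via kafka-run-class; the flag values are the ones listed above):

```shell
# Sketch: attach the safepoint diagnostics to the broker JVM.
# kafka-server-start honors KAFKA_OPTS through kafka-run-class.
export KAFKA_OPTS="-XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintSafepointStatistics \
  -XX:PrintSafepointStatisticsCount=1 \
  -XX:+SafepointTimeout \
  -XX:SafepointTimeoutDelay=500"
```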
This produced the log below:
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
488.188: no vm operation [ 73 1 1 ] [ 1 0 3301 0 0 ] 1
2018-08-13T22:16:09.744-0400: 491.491: Total time for which application threads were stopped: 3.3021375 seconds, Stopping threads took: 3.3018193 seconds
The strange thing is that the spin/block time is zero while the sync time is 3301 ms. I compiled a JVM from OpenJDK 1.8 sources, added some debug logging to it, and found that it was blocked in the code below:
void SafepointSynchronize::begin() {
  ... ...
  if (UseCompilerSafepoints && DeferPollingPageLoopCount < 0) {
    // Make polling safepoint aware
    guarantee (PageArmed == 0, "invariant") ;
    PageArmed = 1 ;
    os::make_polling_page_unreadable();
  }
  ... ...
}
Inside os::make_polling_page_unreadable, the call to ::mprotect depends on a kernel semaphore:
down_write(&current->mm->mmap_sem);
I suspect that contention on the mmap_sem semaphore leads to this STW event, but I don't know which function causes the contention. Any help here?
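One way to probe that hypothesis (a sketch, not from the original investigation): while a stall is in progress, dump the kernel stacks of all threads in the JVM; a writer stuck in down_write, or readers queued behind it, should show up as mmap_sem-related frames such as rwsem_down_write_failed. TARGET_PID below is a placeholder for the broker's pid, and reading /proc/<pid>/task/*/stack requires root.

```shell
# Sketch: dump kernel stacks of every thread in the target process.
# Frames like rwsem_down_write_failed point at mmap_sem contention.
pid="${TARGET_PID:-$$}"      # TARGET_PID: broker pid, e.g. $(pidof java)
for stack in /proc/"$pid"/task/*/stack; do
    echo "== $stack =="
    cat "$stack" 2>/dev/null || true   # root is needed to read these files
done
```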
Original Question
I am testing Kafka's performance. I created a topic with 36 partitions and 4 replicas in a cluster of 6 nodes; a single ZooKeeper node runs on a separate machine.
kafka-topics --create --topic kf.p36.r4 --zookeeper l2 --partitions 36 --replication-factor 4
[root@g9csf002-0-0-3 kafka]# kafka-topics --describe --zookeeper l2 --topic kf.p36.r4
Topic:kf.p36.r4 PartitionCount:36 ReplicationFactor:4 Configs:
Topic: kf.p36.r4 Partition: 0 Leader: 1 Replicas: 1,5,6,2 Isr: 5,2,6,1
Topic: kf.p36.r4 Partition: 1 Leader: 2 Replicas: 2,6,1,3 Isr: 1,3,6,2
Topic: kf.p36.r4 Partition: 2 Leader: 3 Replicas: 3,1,2,4 Isr: 3,4,2,1
Topic: kf.p36.r4 Partition: 3 Leader: 4 Replicas: 4,2,3,5 Isr: 3,2,4,5
Topic: kf.p36.r4 Partition: 4 Leader: 5 Replicas: 5,3,4,6 Isr: 3,6,4,5
Topic: kf.p36.r4 Partition: 5 Leader: 6 Replicas: 6,4,5,1 Isr: 4,5,6,1
Topic: kf.p36.r4 Partition: 6 Leader: 1 Replicas: 1,6,2,3 Isr: 3,6,2,1
Topic: kf.p36.r4 Partition: 7 Leader: 2 Replicas: 2,1,3,4 Isr: 3,4,2,1
Topic: kf.p36.r4 Partition: 8 Leader: 3 Replicas: 3,2,4,5 Isr: 3,2,4,5
Topic: kf.p36.r4 Partition: 9 Leader: 4 Replicas: 4,3,5,6 Isr: 3,6,4,5
Topic: kf.p36.r4 Partition: 10 Leader: 5 Replicas: 5,4,6,1 Isr: 4,5,6,1
Topic: kf.p36.r4 Partition: 11 Leader: 6 Replicas: 6,5,1,2 Isr: 5,2,6,1
Topic: kf.p36.r4 Partition: 12 Leader: 1 Replicas: 1,2,3,4 Isr: 3,4,2,1
Topic: kf.p36.r4 Partition: 13 Leader: 2 Replicas: 2,3,4,5 Isr: 3,2,4,5
Topic: kf.p36.r4 Partition: 14 Leader: 3 Replicas: 3,4,5,6 Isr: 3,6,4,5
Topic: kf.p36.r4 Partition: 15 Leader: 4 Replicas: 4,5,6,1 Isr: 4,5,6,1
Topic: kf.p36.r4 Partition: 16 Leader: 5 Replicas: 5,6,1,2 Isr: 5,2,6,1
Topic: kf.p36.r4 Partition: 17 Leader: 6 Replicas: 6,1,2,3 Isr: 3,2,6,1
Topic: kf.p36.r4 Partition: 18 Leader: 1 Replicas: 1,3,4,5 Isr: 3,4,5,1
Topic: kf.p36.r4 Partition: 19 Leader: 2 Replicas: 2,4,5,6 Isr: 6,2,4,5
Topic: kf.p36.r4 Partition: 20 Leader: 3 Replicas: 3,5,6,1 Isr: 3,5,6,1
Topic: kf.p36.r4 Partition: 21 Leader: 4 Replicas: 4,6,1,2 Isr: 4,2,6,1
Topic: kf.p36.r4 Partition: 22 Leader: 5 Replicas: 5,1,2,3 Isr: 3,5,2,1
Topic: kf.p36.r4 Partition: 23 Leader: 6 Replicas: 6,2,3,4 Isr: 3,6,2,4
Topic: kf.p36.r4 Partition: 24 Leader: 1 Replicas: 1,4,5,6 Isr: 4,5,6,1
Topic: kf.p36.r4 Partition: 25 Leader: 2 Replicas: 2,5,6,1 Isr: 1,6,2,5
Topic: kf.p36.r4 Partition: 26 Leader: 3 Replicas: 3,6,1,2 Isr: 3,2,6,1
Topic: kf.p36.r4 Partition: 27 Leader: 4 Replicas: 4,1,2,3 Isr: 3,4,2,1
Topic: kf.p36.r4 Partition: 28 Leader: 5 Replicas: 5,2,3,4 Isr: 3,2,4,5
Topic: kf.p36.r4 Partition: 29 Leader: 6 Replicas: 6,3,4,5 Isr: 3,6,4,5
Topic: kf.p36.r4 Partition: 30 Leader: 1 Replicas: 1,5,6,2 Isr: 5,2,6,1
Topic: kf.p36.r4 Partition: 31 Leader: 2 Replicas: 2,6,1,3 Isr: 1,3,6,2
Topic: kf.p36.r4 Partition: 32 Leader: 3 Replicas: 3,1,2,4 Isr: 3,4,2,1
Topic: kf.p36.r4 Partition: 33 Leader: 4 Replicas: 4,2,3,5 Isr: 3,2,4,5
Topic: kf.p36.r4 Partition: 34 Leader: 5 Replicas: 5,3,4,6 Isr: 3,6,4,5
Topic: kf.p36.r4 Partition: 35 Leader: 6 Replicas: 6,4,5,1 Isr: 4,5,6,1
I ran two producer instances with kafka-producer-perf-test:
kafka-producer-perf-test --topic kf.p36.r4 --num-records 600000000 --record-size 1024 --throughput 120000 --producer-props bootstrap.servers=b3:9092,b4:9092,b5:9092,b6:9092,b7:9092,b8:9092 acks=1
The total traffic is 240k TPS, and every message is 1024 bytes. When I run 240k TPS of traffic, everything is OK at first, but after some time errors start to appear:
[root@g9csf002-0-0-1 ~]# kafka-producer-perf-test --topic kf.p36.r4 --num-records 600000000 --record-size 1024 --throughput 120000 --producer-props bootstrap.servers=b3:9092,b4:9092,b5:9092,b6:9092,b7:9092,b8:9092 acks=1
599506 records sent, 119901.2 records/sec (117.09 MB/sec), 4.8 ms avg latency, 147.0 max latency.
600264 records sent, 120052.8 records/sec (117.24 MB/sec), 2.0 ms avg latency, 13.0 max latency.
599584 records sent, 119916.8 records/sec (117.11 MB/sec), 1.9 ms avg latency, 13.0 max latency.
600760 records sent, 120152.0 records/sec (117.34 MB/sec), 1.9 ms avg latency, 13.0 max latency.
599764 records sent, 119904.8 records/sec (117.09 MB/sec), 2.0 ms avg latency, 35.0 max latency.
276603 records sent, 21408.9 records/sec (20.91 MB/sec), 103.0 ms avg latency, 10743.0 max latency.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
I studied the Kafka broker logs and found that something was wrong with the communication between the brokers and ZooKeeper:
[2018-08-06 01:28:02,562] WARN Client session timed out, have not heard from server in 7768ms for sessionid 0x164f8ea86020062 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,562] INFO Client session timed out, have not heard from server in 7768ms for sessionid 0x164f8ea86020062, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
The ZooKeeper client is zookeeper-3.4.10.jar. I downloaded its sources, added some logging to src/java/main/org/apache/zookeeper/ClientCnxn.java,
and found that SendThread may sometimes be blocked when accessing the variable state:
[2018-08-06 01:27:54,793] INFO ROVER: start of loop. (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:27:54,793] INFO ROVER: state = CONNECTED (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:27:54,793] INFO ROVER: to = 4000 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:27:54,793] INFO ROVER: timeToNextPing = 2000 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:27:54,793] INFO ROVER: before clientCnxnSocket.doTransport, to = 2000 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:27:56,795] INFO ROVER: after clientCnxnSocket.doTransport (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: state = CONNECTED (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: start of loop. (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: state = CONNECTED (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: to = 1998 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: timeToNextPing = -1002 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: sendPing has done. (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: before clientCnxnSocket.doTransport, to = 1998 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: after clientCnxnSocket.doTransport (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: state = CONNECTED (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: start of loop. (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: state = CONNECTED (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,561] INFO ROVER: to = -3768 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,562] WARN Client session timed out, have not heard from server in 7768ms for sessionid 0x164f8ea86020062 (org.apache.zookeeper.ClientCnxn)
[2018-08-06 01:28:02,562] INFO Client session timed out, have not heard from server in 7768ms for sessionid 0x164f8ea86020062, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
You can see that between 2018-08-06 01:27:56 and 2018-08-06 01:28:02 the thread is blocked and does nothing. The instrumented code is shown below:
// If we are in read-only mode, seek for read/write server
if (state == States.CONNECTEDREADONLY) {
    long now = System.currentTimeMillis();
    int idlePingRwServer = (int) (now - lastPingRwServer);
    if (idlePingRwServer >= pingRwTimeout) {
        lastPingRwServer = now;
        idlePingRwServer = 0;
        pingRwTimeout = Math.min(2 * pingRwTimeout, maxPingRwTimeout);
        pingRwServer();
    }
    to = Math.min(to, pingRwTimeout - idlePingRwServer);
}

LOG.info("ROVER: before clientCnxnSocket.doTransport, to = " + to);
clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue, ClientCnxn.this);
LOG.info("ROVER: after clientCnxnSocket.doTransport");
LOG.info("ROVER: state = " + state);
} catch (Throwable e) {
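To make such stalls easier to spot than by hand-diffing timestamps, a small helper could scan the log for gaps. This is a sketch, not part of the original instrumentation; it assumes GNU date and the `[YYYY-MM-DD HH:MM:SS,mmm]` prefix used in the logs above.

```shell
# Sketch: read timestamped log lines on stdin and flag gaps longer than
# one second between consecutive lines.
flag_gaps() {
    awk -F'[][]' '{
        split($2, d, ",")                      # d[1]=date+time, d[2]=millis
        cmd = "date -d \"" d[1] "\" +%s"
        cmd | getline s; close(cmd)
        t = s * 1000 + d[2]
        if (prev != "" && t - prev > 1000)
            print "gap of " (t - prev) " ms before: " $0
        prev = t
    }'
}
```

For example, `flag_gaps < zookeeper-client.log` would report the 5766 ms hole between 01:27:56,795 and 01:28:02,561 above.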
The installed Kafka is confluent-kafka-2.11, and the Java version is:
[root@g9csf0002-0-0-12 kafka]# java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
Now I don't know how to fix this problem. Could anyone shed some light on this?
I ran into this problem before; sometimes the Kafka JVM would garbage-collect for a long time, or there was something weird going on internally network-wise. I noticed the timeouts were all around the 6- or 7-second mark in our case (which seems similar to yours). The thing is, Kafka freaks out if it can't talk to ZooKeeper within the allotted time period, and it starts reporting under-replicated partitions, bringing down the whole cluster every so often. So we increased the timeout to 15 seconds, if I recall correctly, and everything worked just fine after that, with zero errors.
These are the corresponding settings on the Kafka brokers:
zookeeper.session.timeout.ms Default: 6000ms
zookeeper.connection.timeout.ms
We changed both, IIRC, but you should try changing the session timeout config first.
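As a sketch, the change might look like this in each broker's server.properties (15000 ms is the value that worked for us; tune it for your environment):

```properties
# Give brokers more headroom before ZooKeeper expires their session.
zookeeper.session.timeout.ms=15000
zookeeper.connection.timeout.ms=15000
```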