Spark Streaming job gets stuck if the clocks on some Kafka nodes are not synchronized

Original post: 2016-09-03 23:48:17 · Tags: java, apache-spark, apache-kafka, spark-streaming, kafka-producer-api

We have a Spark Streaming job that reads from the Gnip API and sends tweets to a Kafka cluster.

The Kafka cluster was installed using Cloudera Manager.

Sometimes Cloudera Manager shows a bad health message for some Kafka nodes. The message is related to the NTP service: some nodes suddenly lose synchronization with the NTP server.

Once this happens, the Spark Streaming job gets stuck, and a lot of jobs sit in the queue for a long time without being processed.

Why does the synchronization of Kafka nodes with the NTP server affect the Kafka producer in the Spark Streaming job?

1 Answer

Every partition has a leader and followers among the Kafka brokers, and this replication is how Kafka provides fault tolerance. The mechanism is based on ZooKeeper, which uses the NTP service.

If you use the default configuration, the leader receives your data and tries its best to write it to the followers. It does not respond with a success message until the data has been written to every follower, so your Spark application blocks.

You can also change your Kafka configuration so that the producer gets a response immediately when the leader receives the data, or as soon as the leader has written the data to disk.
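As a rough illustration of the options above, here is a minimal sketch in Java of how the acknowledgement level might be set on the producer side. It assumes the Java producer client, where the setting is called `acks` (`0` = do not wait for the broker, `1` = wait for the leader only, `all` = wait for the in-sync replicas). The broker addresses, the helper name `producerProps`, and the two timeout values are hypothetical and not taken from the question. The default for `acks` differs between client versions, so check the documentation for the version you run.

```java
import java.util.Properties;

final class ProducerAckSettings {

    // Hypothetical helper: builds producer settings for a given acknowledgement level.
    // Valid values for "acks" in the Java producer are "0", "1" and "all".
    static Properties producerProps(String bootstrapServers, String acks) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // How many acknowledgements the producer waits for before a send counts as done.
        props.setProperty("acks", acks);
        // Assumed values: upper bounds on how long a request or a blocking call may wait,
        // so a slow or unhealthy broker does not stall the caller indefinitely.
        props.setProperty("request.timeout.ms", "30000");
        props.setProperty("max.block.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker addresses; replace with the real cluster.
        Properties props = producerProps("broker1:9092,broker2:9092", "1");
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

These properties would then be passed to the producer's constructor inside the Spark job. Lower acknowledgement levels reduce the time the job spends waiting on unhealthy followers, at the cost of weaker delivery guarantees.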

You can find more in the Kafka documentation.