
Spark Streaming job fails after new partitions are assigned (old are revoked) for Kafka topic: No current assignment for partition topic1

I am using Spark Streaming with Kafka, creating a direct stream with the code below:

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> conf.getString("kafka.brokers"),
  "zookeeper.connect" -> conf.getString("kafka.zookeeper"),
  "group.id" -> conf.getString("kafka.consumergroups"),
  "auto.offset.reset" -> args(1),
  "enable.auto.commit" -> (conf.getString("kafka.autoCommit").toBoolean: java.lang.Boolean),
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "security.protocol" -> "SASL_PLAINTEXT",
  "session.timeout.ms" -> args(2),
  "max.poll.records" -> args(3),
  "request.timeout.ms" -> args(4),
  "fetch.max.wait.ms" -> args(5))

val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

After some processing we commit the offsets using the commitAsync API:

try {
  messages.foreachRDD { rdd =>
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    messages.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  }
} catch {
  case e: Throwable => e.printStackTrace()
}

The job crashes with the error below:

            18/03/20 10:43:30 INFO ConsumerCoordinator: Revoking previously assigned partitions [TOPIC_NAME-3, TOPIC_NAME-5, TOPIC_NAME-4] for group 21_feb_reload_2
            18/03/20 10:43:30 INFO AbstractCoordinator: (Re-)joining group 21_feb_reload_2
            18/03/20 10:43:30 INFO AbstractCoordinator: (Re-)joining group 21_feb_reload_2
            18/03/20 10:44:00 INFO AbstractCoordinator: Successfully joined group 21_feb_reload_2 with generation 20714
            18/03/20 10:44:00 INFO ConsumerCoordinator: Setting newly assigned partitions [TOPIC_NAME-1, TOPIC_NAME-0, TOPIC_NAME-2] for group 21_feb_reload_2
            18/03/20 10:44:00 ERROR JobScheduler: Error generating jobs for time 1521557010000 ms
            java.lang.IllegalStateException: No current assignment for partition TOPIC_NAME-4
                at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251)
                at org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:315)
                at org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1170)
                at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:197)
                at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:214)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
                at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
                at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
                at scala.Option.orElse(Option.scala:289)
                at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
                at org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:36)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
                at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
                at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
                at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
                at scala.Option.orElse(Option.scala:289)
                at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
                at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
                at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:117)
                at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
                at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
                at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
                at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
                at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
                at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
                at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
                at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
                at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:249)
                at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
                at scala.util.Try$.apply(Try.scala:192)
                at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
                at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
                at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
                at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
                at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
            18/03/20 10:44:00 ERROR ApplicationMaster: User class threw exception: java.lang.IllegalStateException: No current assignment for partition 

My findings:

1 - A similar issue is discussed in the post Kafka Spark Stream throws Exception: No current assignment for partition, but it does not give much explanation of why to use Assign rather than Subscribe.
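For reference, the reason Assign avoids this exception is that it pins the consumer to fixed partitions and opts out of consumer-group rebalancing entirely, so partitions are never revoked mid-job. A minimal sketch (the partition count and empty starting-offsets map are assumptions here, not from the original post):

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010._

// Pin the stream to explicit partitions instead of subscribing to the topic.
// With Assign there is no group rebalancing, but you must list every partition
// yourself, and the job will not pick up partitions added to the topic later.
val topicPartitions = (0 until 6).map(new TopicPartition("TOPIC_NAME", _)) // assumes 6 partitions

// Optional explicit starting offsets per partition (e.g. restored from your own
// offset store); an empty map means "start from the committed/reset position".
val fromOffsets: Map[TopicPartition, Long] = Map.empty

val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Assign[String, String](topicPartitions, kafkaParams, fromOffsets))
```

The trade-off is operational: with Assign you take on partition management yourself in exchange for immunity to rebalances.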

2 - Trying to make sure there is no rebalancing, I increased session.timeout.ms to almost my batch duration, since my processing completes in under 2 minutes (the batch duration).

session.timeout.ms - the amount of time a consumer can be out of contact with the brokers while still considered alive (https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html)

3 - Came across rebalance listeners with two callbacks: a. onPartitionsRevoked, b. onPartitionsAssigned.

But I was unable to understand how to use the first one, which commits offsets just before rebalancing.
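For completeness, this is how a plain KafkaConsumer would commit from onPartitionsRevoked. Note this applies to a hand-rolled consumer only: the Spark direct stream manages its consumer internally and does not expose this hook. The `processedOffsets` map is a hypothetical structure you would maintain as records are processed:

```scala
import java.util.{Collection => JCollection}
import scala.collection.JavaConverters._
import scala.collection.concurrent.TrieMap
import org.apache.kafka.clients.consumer.{ConsumerRebalanceListener, KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition

// Sketch: commit processed offsets just before partitions are taken away.
// `consumer` and `processedOffsets` are assumed to exist in the surrounding code.
class CommitOnRebalance(
    consumer: KafkaConsumer[String, String],
    processedOffsets: TrieMap[TopicPartition, OffsetAndMetadata])
  extends ConsumerRebalanceListener {

  // Invoked while this consumer still owns the partitions being revoked --
  // the last safe moment to commit their offsets synchronously.
  override def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit =
    consumer.commitSync(processedOffsets.asJava)

  // Invoked once the new assignment is installed; nothing to do in this sketch.
  override def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit = ()
}

// Attached when subscribing:
// consumer.subscribe(java.util.Arrays.asList("TOPIC_NAME"),
//                    new CommitOnRebalance(consumer, processedOffsets))
```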

Any inputs will be much appreciated.

I have faced the same issue when two of my Spark jobs were using the same Kafka client.id, so I assigned a new Kafka client.id to the other job.

The Spark Streaming with Kafka guide proposes using a different group.id for each Spark Streaming job. After discussing it with my team, we now follow this approach of using a different group.id for all our Spark Streaming jobs.

Running for a week now, we have not seen this error.
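Concretely, this just means parameterizing group.id per job rather than sharing one across jobs; a minimal sketch, where `appName` is an assumed per-job identifier (e.g. taken from `spark.app.name`):

```scala
// Derive a group.id unique to this streaming job, so two jobs never join the
// same consumer group and trigger rebalances against each other.
def kafkaParamsFor(appName: String): Map[String, Object] = Map(
  "bootstrap.servers" -> conf.getString("kafka.brokers"),
  "group.id" -> s"${conf.getString("kafka.consumergroups")}_$appName",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer])
```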

https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html
