
Why does the Kafka consumer code freeze when I start the Spark stream?

I am new to Kafka and trying to implement Kafka consumer logic in Spark 2. When I run all my code in the shell and start the streaming, it shows nothing.

I have viewed many posts on StackOverflow, but nothing helped me. I have even downloaded all the dependency jars from Maven and tried to run, but it still shows nothing.

Spark version: 2.2.0, Scala version: 2.11.8. The jars I downloaded are kafka-clients-2.2.0.jar and spark-streaming-kafka-0-10_2.11-2.2.0.jar.

But I still face the same issue.

Please find the code snippet below:

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{StreamingContext, Seconds}
import org.apache.spark.streaming.kafka010.{KafkaUtils, ConsumerStrategies, LocationStrategies}

val brokers = "host1:port, host2:port"
val groupid = "default"
val topics = "kafka_sample"
val topicset = topics.split(",").toSet

val ssc = new StreamingContext(sc, Seconds(2))

val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
  ConsumerConfig.GROUP_ID_CONFIG -> groupid,
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)

val msg = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topicset, kafkaParams)
)

// Print each micro-batch on the driver
msg.foreachRDD { rdd =>
  rdd.collect().foreach(println)
}
ssc.start()

I am expecting Spark Streaming to start, but it doesn't do anything. What mistake have I made here? Or is this a known issue?

The driver will sit idle unless you call ssc.awaitTermination() at the end; a minimal sketch follows. If you're using spark-shell, it's not a good tool for streaming jobs. Use an interactive tool like Zeppelin or Spark Notebook to interact with streams, or build your app as a jar file and then deploy it.
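For illustration, a minimal sketch of the tail of the question's job with the blocking call added (this reuses the imports, kafkaParams, and stream definition exactly as they appear in the question):

msg.foreachRDD { rdd =>
  rdd.collect().foreach(println)
}

ssc.start()
// Block the driver until the streaming job is stopped; without this,
// the shell returns immediately and no batches appear to run.
ssc.awaitTermination()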

Also, if you're trying out Spark streaming, Structured Streaming would be a better fit, since it is quite easy to work with.

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
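As a rough sketch of the Structured Streaming equivalent (this assumes the spark-sql-kafka-0-10_2.11 package is on the classpath; the object and app names are made up for the example):

import org.apache.spark.sql.SparkSession

object StructuredKafkaSample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StructuredKafkaSample").getOrCreate()
    import spark.implicits._

    // Read the topic as an unbounded streaming DataFrame
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port,host2:port")
      .option("subscribe", "kafka_sample")
      .load()

    // Kafka values arrive as bytes; cast them to strings for printing
    val messages = df.selectExpr("CAST(value AS STRING)").as[String]

    // Print each micro-batch to the console and keep the driver alive
    val query = messages.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}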

  1. After ssc.start(), use ssc.awaitTermination() in your code.
  2. For testing, write your code in a main object and run it in any IDE, such as IntelliJ.
  3. You can use the command shell and publish messages from the Kafka producer (a minimal producer sketch follows this list).
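For step 3, the stock kafka-console-producer script works from the command shell; as an alternative, here is a minimal producer sketch in Scala using the kafka-clients API (the object name and message contents are made up for the example):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object SampleProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "host1:port")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Publish a few test messages to the topic the stream subscribes to
    (1 to 5).foreach { i =>
      producer.send(new ProducerRecord[String, String]("kafka_sample", s"test message $i"))
    }
    producer.close()
  }
}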

I have written up all these steps in a simple example in a blog post, with working code on GitHub. Please refer to: http://softwaredevelopercentral.blogspot.com/2018/10/spark-streaming-and-kafka-integration.html
