
Why does the Kafka consumer code freeze when I start the Spark stream?

I am new to Kafka and trying to implement Kafka consumer logic in Spark 2. When I run all my code in the shell and start the streaming, it shows nothing.

I have viewed many posts on StackOverflow, but nothing helped me. I have even downloaded all the dependency jars from Maven and tried to run, but it still shows nothing.

Spark version: 2.2.0, Scala version: 2.11.8. The jars I downloaded are kafka-clients-2.2.0.jar and spark-streaming-kafka-0-10_2.11-2.2.0.jar.

But I still face the same issue.

Please find the code snippet below:

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{StreamingContext, Seconds}
import org.apache.spark.streaming.kafka010.{KafkaUtils, ConsumerStrategies, LocationStrategies}

val brokers = "host1:port, host2:port"
val groupid = "default"
val topics = "kafka_sample"
val topicset = topics.split(",").toSet

val ssc = new StreamingContext(sc, Seconds(2))

val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
  ConsumerConfig.GROUP_ID_CONFIG -> groupid,
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)

val msg = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topicset, kafkaParams)
)

// Print each micro-batch on the driver
msg.foreachRDD { rdd =>
  rdd.collect().foreach(println)
}
ssc.start()

I am expecting Spark Streaming to start, but it doesn't do anything. What mistake have I made here? Or is this a known issue?

The driver will sit idle unless you call ssc.awaitTermination() at the end; a minimal sketch follows. If you're using spark-shell, it's not a good tool for streaming jobs. Use an interactive tool like Zeppelin or Spark Notebook to interact with streams, or build your app as a jar file and then deploy it.
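For illustration, a minimal sketch of the tail of the question's job with the blocking call added (this reuses the imports, kafkaParams, and stream definition exactly as they appear in the question):

msg.foreachRDD { rdd =>
  rdd.collect().foreach(println)
}

ssc.start()
// Block the driver until the streaming job is stopped; without this,
// the shell returns immediately and no batches appear to run.
ssc.awaitTermination()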

Also, if you're trying out Spark streaming, Structured Streaming would be a better fit, since it is quite easy to work with.

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
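As a rough sketch of the Structured Streaming equivalent (this assumes the spark-sql-kafka-0-10_2.11 package is on the classpath; the object and app names are made up for the example):

import org.apache.spark.sql.SparkSession

object StructuredKafkaSample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StructuredKafkaSample").getOrCreate()
    import spark.implicits._

    // Read the topic as an unbounded streaming DataFrame
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port,host2:port")
      .option("subscribe", "kafka_sample")
      .load()

    // Kafka values arrive as bytes; cast them to strings for printing
    val messages = df.selectExpr("CAST(value AS STRING)").as[String]

    // Print each micro-batch to the console and keep the driver alive
    val query = messages.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}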

  1. After ssc.start(), use ssc.awaitTermination() in your code.
  2. For testing, write your code in a main object and run it in any IDE, such as IntelliJ.
  3. You can use the command shell and publish messages from the Kafka producer (a minimal producer sketch follows this list).
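For step 3, the stock kafka-console-producer script works from the command shell; as an alternative, here is a minimal producer sketch in Scala using the kafka-clients API (the object name and message contents are made up for the example):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object SampleProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "host1:port")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Publish a few test messages to the topic the stream subscribes to
    (1 to 5).foreach { i =>
      producer.send(new ProducerRecord[String, String]("kafka_sample", s"test message $i"))
    }
    producer.close()
  }
}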

I have written up all these steps in a simple example in a blog post, with working code on GitHub. Please refer to: http://softwaredevelopercentral.blogspot.com/2018/10/spark-streaming-and-kafka-integration.html
