
Spark streaming for Kafka: How to get the topic name from a Kafka consumer DStream?

I have set up a Spark-Kafka consumer in Scala that receives messages from multiple topics:

val properties = readProperties()
val streamConf = new SparkConf().setMaster("local[*]").setAppName("Kafka-Stream")
val ssc = new StreamingContext(streamConf, Seconds(10))

val kafkaParams = Map("metadata.broker.list" ->  properties.getProperty("broker_connection_str"), 
                      "zookeeper.connect"    ->  properties.getProperty("zookeeper_connection_str"), 
                      "group.id"             ->  properties.getProperty("group_id"), 
                      "auto.offset.reset"    ->  properties.getProperty("offset_reset")
                    )

// Kafka integration with receiver
val msgStream = KafkaUtils.createStream[Array[Byte], String, DefaultDecoder, StringDecoder](
  ssc, kafkaParams,
  Map(properties.getProperty("topic1") -> 1,
      properties.getProperty("topic2") -> 2,
      properties.getProperty("topic3") -> 3),
  StorageLevel.MEMORY_ONLY_SER).map(_._2)

I need to develop corresponding action code for the messages (which will be in JSON format) from each topic.

I referred to the following question, but its answer didn't help me:

get topic from Kafka message in spark 从Spark中的Kafka消息获取主题

So, is there any method on the received DStream that can be used to fetch the topic name along with the message, so I can determine what action should take place?

Any help on this would be greatly appreciated. Thank you.

See the code below.

You can get the topic name and the message with a foreachRDD and a map operation on the DStream (this applies to a direct stream, where each element is a ConsumerRecord):

stream.foreachRDD(rdd => {
  // each record is a ConsumerRecord, exposing topic() and value()
  val pairRdd = rdd.map(record => (record.topic(), record.value()))
})
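Once you have (topic, message) pairs, each JSON payload can be routed to topic-specific logic. A minimal sketch of such a dispatcher (the topic names `orders` and `clicks`, and the handler bodies, are hypothetical placeholders):

```scala
// Hypothetical per-topic dispatcher: routes a JSON payload to the
// action for its source topic. Topic names are placeholders.
object TopicDispatcher {
  def dispatch(topic: String, json: String): String = topic match {
    case "orders" => s"processed order: $json"      // e.g. parse and persist
    case "clicks" => s"logged click: $json"         // e.g. aggregate counts
    case other    => s"ignored message from $other" // unknown topic
  }
}

// Inside the streaming job it would be wired up roughly as:
// pairRdd.foreach { case (topic, msg) => TopicDispatcher.dispatch(topic, msg) }
```

The match on the topic name keeps the per-topic actions in one place; with many topics, a `Map[String, String => Unit]` of handlers would serve the same purpose.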

The code below is an example of the createDirectStream setup that I am using.

val ssc = new StreamingContext(configLoader.sparkConfig, Seconds(conf.getInt(Conf.KAFKA_PULL_INTERVAL)))
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> conf.getString(Conf.KAFKA_BOOTSTRAP_SERVERS),
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> conf.getString(Conf.KAFKA_CONSUMER_GID),
  "auto.offset.reset" -> conf.getString(Conf.KAFKA_AUTO_OFFSET_RESET),
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics: Array[String] = conf.getString(Conf.KAFKA_TOPICS).split(",")
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)
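Each element of this direct stream is a `ConsumerRecord[String, String]`, so the topic name travels with every message and is read via `record.topic()`. A minimal sketch of turning records into (topic, value) pairs (the `Record` case class below is a stand-in for Kafka's `ConsumerRecord`, used only to show the shape of the transformation):

```scala
// Stand-in for org.apache.kafka.clients.consumer.ConsumerRecord,
// exposing the two accessors used here: topic() and value().
final case class Record(topicName: String, payload: String) {
  def topic(): String = topicName
  def value(): String = payload
}

// Same shape as: stream.map(record => (record.topic(), record.value()))
def toPairs(records: Seq[Record]): Seq[(String, String)] =
  records.map(r => (r.topic(), r.value()))
```

This is the key difference from the receiver-based `createStream` in the question: there, the `.map(_._2)` keeps only the message value, so the topic name is discarded before you can branch on it.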
