Spark streaming for Kafka: How to get the topic name from Kafka consumer DStream?

I have set up the Spark-Kafka Consumer in Scala that receives messages from multiple topics:

val properties = readProperties()
val streamConf = new SparkConf().setMaster("local[*]").setAppName("Kafka-Stream")
val ssc = new StreamingContext(streamConf, Seconds(10))

val kafkaParams = Map("metadata.broker.list" ->  properties.getProperty("broker_connection_str"), 
                      "zookeeper.connect"    ->  properties.getProperty("zookeeper_connection_str"), 
                      "group.id"             ->  properties.getProperty("group_id"), 
                      "auto.offset.reset"    ->  properties.getProperty("offset_reset")
                    )

// Kafka integration with receiver 
val msgStream = KafkaUtils.createStream[Array[Byte], String, DefaultDecoder, StringDecoder](
  ssc, kafkaParams, Map(properties.getProperty("topic1") -> 1,
                      properties.getProperty("topic2") -> 2,
                      properties.getProperty("topic3") -> 3),
                      StorageLevel.MEMORY_ONLY_SER).map(_._2)

I need to run a different action for the messages (which will be in JSON format) from each topic.

I referred to the following question, but the answer in it didn't help me:

get topic from Kafka message in spark

So, is there any method on the received DStream that I can use to fetch the topic name along with the message, so that I can determine which action should take place?

Any help on this would be greatly appreciated. Thank you.

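You can get the topic name and the message with a map operation inside foreachRDD on the DStream. This works when each element of the stream is a ConsumerRecord, as is the case for a stream created with createDirectStream from the Kafka 0.10 integration. See the code below.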

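// `stream` is the DStream of ConsumerRecord[String, String] created with createDirectStream below.
// The question's msgStream has already dropped everything except the value via .map(_._2).
stream.foreachRDD { rdd =>
  // Pair each message with the topic it came from.
  val pairRdd = rdd.map(record => (record.topic(), record.value()))
}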
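Once each record carries its topic, the per-topic action the question asks about can be a plain pattern match. The following is only a minimal sketch: the topic names `orders` and `payments`, the object `TopicDispatch`, and the actions themselves are hypothetical and do not come from the original post. It uses ordinary (topic, value) pairs so that it runs without Spark; in the streaming job the same `handle` function would be applied to the pairs produced inside foreachRDD.

```scala
// Hypothetical per-topic dispatcher; topic names and actions are illustrative only.
object TopicDispatch {
  // Decide what to do with a message based on the topic it came from.
  def handle(topic: String, json: String): String = topic match {
    case "orders"   => s"order-action: $json"
    case "payments" => s"payment-action: $json"
    case other      => s"ignored message from unknown topic '$other'"
  }

  // Stand-in for the (topic, value) pairs produced by
  // rdd.map(record => (record.topic(), record.value())).
  val sample: Seq[(String, String)] = Seq(
    "orders"   -> """{"id": 1}""",
    "payments" -> """{"id": 2}"""
  )

  // Apply the dispatcher to every pair.
  val results: Seq[String] = sample.map { case (topic, json) => handle(topic, json) }
}
```

Keeping the dispatch in a pure function like this makes it easy to unit-test the per-topic logic separately from the streaming setup.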

The code below shows how I create the stream with createDirectStream.

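import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// configLoader, conf and Conf are my own configuration helpers (not shown here).
val ssc = new StreamingContext(configLoader.sparkConfig, Seconds(conf.getInt(Conf.KAFKA_PULL_INTERVAL)))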
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> conf.getString(Conf.KAFKA_BOOTSTRAP_SERVERS),
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> conf.getString(Conf.KAFKA_CONSUMER_GID),
  "auto.offset.reset" -> conf.getString(Conf.KAFKA_AUTO_OFFSET_RESET),
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics: Array[String] = conf.getString(Conf.KAFKA_TOPICS).split(",")
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)
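One detail about the topic list in the snippet above: a bare `split(",")` keeps any whitespace around the names, and an accidental double comma yields an empty topic name. A small sketch of a stricter parser, assuming a hypothetical helper called `parseTopics` (it is not part of the original answer):

```scala
// Hypothetical helper for turning a comma-separated setting into topic names.
object TopicConfig {
  // Split on commas, trim whitespace around each name, and drop empty entries.
  def parseTopics(raw: String): Array[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty)
}
```

With this, the `topics` value could be built as `TopicConfig.parseTopics(conf.getString(Conf.KAFKA_TOPICS))` instead of calling `split` directly.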
