Spark Streaming + Kafka: how to check the name of the topic from a Kafka message

I am using Spark Streaming to read from a list of Kafka topics. I am following the official API. The method I am using is:

val kafkaParams = Map("metadata.broker.list" -> configuration.getKafkaBrokersList(), "auto.offset.reset" -> "largest")
val topics = Set(configuration.getKafkaInputTopic())
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, kafkaParams, topics)

I am wondering how the executors will read messages from the list of topics. What is their policy? Will they read one topic and, once its messages are exhausted, move on to the other topics?

And most importantly, after calling this method, how can I check which topic a message in the RDD came from?

stream.foreachRDD(rdd => rdd.map(t => {
        val key = t._1
        val json = t._2
        val topic = ??? // how do I get the topic name here?
}))

In the direct streaming approach, the driver is responsible for reading the offsets of the Kafka topics you want to consume. It creates a mapping between topics, partitions, and the offsets that need to be read. Once that is done, the driver assigns each worker a range of offsets to read from a specific Kafka topic partition. This means that if a single worker can run two tasks simultaneously (just for the sake of the example; it can usually run many more), it can potentially read from two separate Kafka topics concurrently.
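
The mapping the driver builds can be pictured with a small, dependency-free model. The sketch below is not Spark's implementation: `OffsetRange` here is a hypothetical stand-in, `planBatch` and `tagWithTopic` are names made up for illustration, and the offsets are supplied by the caller rather than fetched from Kafka. It assumes one task per topic partition, which is how the direct approach lays out its RDD partitions.

```scala
// Hypothetical stand-in for the unit of work handed to one task:
// a contiguous range of offsets in a single topic partition.
final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

object DirectStreamModel {
  // Build one range per topic partition, from the last consumed offset
  // (0 if the partition has never been read) up to the latest available offset.
  def planBatch(consumed: Map[(String, Int), Long],
                latest: Map[(String, Int), Long]): Seq[OffsetRange] =
    latest.toSeq
      .sortBy { case ((topic, partition), _) => (topic, partition) }
      .map { case ((topic, partition), until) =>
        OffsetRange(topic, partition, consumed.getOrElse((topic, partition), 0L), until)
      }

  // Task i reads ranges(i), so every record produced by task i
  // belongs to ranges(i).topic.
  def tagWithTopic[A](ranges: Seq[OffsetRange], recordsPerTask: Seq[Seq[A]]): Seq[(String, A)] =
    recordsPerTask.zipWithIndex.flatMap { case (records, i) =>
      records.map(r => (ranges(i).topic, r))
    }
}
```

In Spark itself, the ranges for a batch are available by casting the RDD produced by the direct stream to HasOffsetRanges and reading its offsetRanges; the index of each RDD partition matches the index into that array, which is the idea `tagWithTopic` imitates.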

how can I, after calling this method, check what is the topic of a message in the RDD?

You can use the overload of createDirectStream that takes a message handler, a function of type MessageAndMetadata[K, V] => R, and have it return the topic alongside the message:

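import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import org.apache.spark.streaming.dstream.DStream

val topicsToPartitions: Map[TopicAndPartition, Long] = ??? // starting offset per topic partition

val stream: DStream[(String, String)] =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
    ssc, kafkaParams, topicsToPartitions,
    (mam: MessageAndMetadata[String, String]) => (mam.topic, mam.message()))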
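
Because the handler decides what each element of the stream looks like, it can carry more than the topic. The following sketch uses a hypothetical `Record` case class standing in for Kafka's MessageAndMetadata, so that it runs without Spark or Kafka on the classpath. It shows a handler that keeps the topic, partition, and offset, followed by a simple per-topic count. `Tagged`, `HandlerSketch`, and `countByTopic` are illustrative names, not part of any library.

```scala
// Hypothetical stand-in for kafka.message.MessageAndMetadata[String, String].
final case class Record(topic: String, partition: Int, offset: Long, key: String, message: String)

// What the handler emits for every consumed record.
final case class Tagged(topic: String, partition: Int, offset: Long, payload: String)

object HandlerSketch {
  // Same shape as a message handler: record metadata in, stream element out.
  val handler: Record => Tagged =
    r => Tagged(r.topic, r.partition, r.offset, r.message)

  // Downstream code can branch on the topic carried by each element.
  def countByTopic(elements: Seq[Tagged]): Map[String, Int] =
    elements.groupBy(_.topic).map { case (topic, es) => topic -> es.size }
}
```

Returning a small case class instead of a tuple keeps the downstream code readable (`t.topic` rather than `t._1`), at the cost of one extra type.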

