Spark Streaming + Kafka: how to check name of topic from kafka message
I am using Spark Streaming to read from a list of Kafka topics. I am following the official API documentation. The method I am using is:
val kafkaParams = Map("metadata.broker.list" -> configuration.getKafkaBrokersList(), "auto.offset.reset" -> "largest")
val topics = Set(configuration.getKafkaInputTopic())
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)
I am wondering how the executors will read messages from the list of topics. What is their policy? Will they read one topic and, once they have finished its messages, move on to the other topics?
And most importantly, how can I, after calling this method, check which topic a message in the RDD came from?
stream.foreachRDD(rdd => rdd.map(t => {
  val key = t._1
  val json = t._2
  val topic = ???
}))
"I am wondering how the executors will read messages from the list of topics. What is their policy? Will they read one topic and, once they have finished its messages, move on to the other topics?"
In the direct streaming approach, the driver is responsible for reading the offsets of the Kafka topics you want to consume. It creates a mapping between topics, partitions and the offsets that need to be read. After that, the driver assigns each worker a range of offsets to read from a specific Kafka topic. This means that if a single worker can run 2 tasks simultaneously (just for the sake of the example; it can usually run many more), it can potentially read from two separate Kafka topics concurrently.
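The assignment described above can be pictured with a small, self-contained model. This is only a sketch in plain Scala, not Spark's implementation: `TopicPartition` and `OffsetRange` are local stand-in case classes, and `planBatch`, the topic names and the offsets are made up for illustration. It assumes one task per topic-partition.

```scala
// Local stand-in types for illustration only; not Spark's or Kafka's classes.
final case class TopicPartition(topic: String, partition: Int)
final case class OffsetRange(tp: TopicPartition, fromOffset: Long, untilOffset: Long)

object DirectStreamModel {
  // Driver side: pair the last consumed offset of each topic-partition with
  // the latest available offset, giving one range (one task) per partition.
  def planBatch(
      consumed: Map[TopicPartition, Long],
      latest: Map[TopicPartition, Long]): Seq[OffsetRange] =
    consumed.toSeq
      .sortBy { case (tp, _) => (tp.topic, tp.partition) }
      .map { case (tp, from) => OffsetRange(tp, from, latest.getOrElse(tp, from)) }

  def main(args: Array[String]): Unit = {
    // Hypothetical topics and offsets.
    val consumed = Map(TopicPartition("orders", 0) -> 10L, TopicPartition("clicks", 0) -> 5L)
    val latest   = Map(TopicPartition("orders", 0) -> 25L, TopicPartition("clicks", 0) -> 9L)
    // Each range is an independent task, so tasks for different topics can
    // run at the same time on whichever executors have free slots.
    planBatch(consumed, latest).foreach { r =>
      println(s"task: ${r.tp.topic}-${r.tp.partition} offsets [${r.fromOffset}, ${r.untilOffset})")
    }
  }
}
```

In this model there is no "finish one topic, then move to the next" policy: every topic-partition gets its own range in each batch, and the ranges are processed independently.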
"How can I, after calling this method, check what is the topic of a message in the RDD?"
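You can use the overload of createDirectStream which takes a message handler, a function of type MessageAndMetadata[K, V] => R. Note that this overload takes explicit starting offsets per topic and partition instead of a set of topic names: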
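import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

val topicsToPartitions: Map[TopicAndPartition, Long] = ???
val stream: DStream[(String, String)] =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
    ssc,
    kafkaParams,
    topicsToPartitions,
    (mam: MessageAndMetadata[String, String]) => (mam.topic, mam.message()))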
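Once the stream carries (topic, message) pairs, the topic is available as the first element of every record. The following is a minimal sketch, assuming the Kafka 0.8 direct API used above and Spark plus the Kafka client on the classpath. The topic names `topicA` and `topicB`, the starting offset `0L`, and the helper `printTopics` are hypothetical and only illustrate the shape of the arguments.

```scala
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.dstream.DStream

object TopicAwareProcessing {
  // Hypothetical starting offsets: partition 0 of two topics, from offset 0.
  val fromOffsets: Map[TopicAndPartition, Long] = Map(
    TopicAndPartition("topicA", 0) -> 0L,
    TopicAndPartition("topicB", 0) -> 0L
  )

  // Expects a stream built with the message handler overload, so that each
  // record is a (topic, message) pair.
  def printTopics(stream: DStream[(String, String)]): Unit =
    stream.foreachRDD { rdd =>
      rdd.foreach { case (topic, json) =>
        // Runs on the executors; output goes to the executor logs.
        println(s"topic=$topic message=$json")
      }
    }
}
```

A map like `fromOffsets` would be passed where `topicsToPartitions` appears above; in a real job the offsets would normally come from wherever the application stores its progress rather than being hard-coded.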