
Spark Streaming + Kafka: how to check name of topic from kafka message

I am using Spark Streaming to read from a list of Kafka topics. I am following the official API at this link. The method I am using is:

val kafkaParams = Map("metadata.broker.list" -> configuration.getKafkaBrokersList(), "auto.offset.reset" -> "largest")
val topics = Set(configuration.getKafkaInputTopic())
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, kafkaParams, topics)

I am wondering how the executors will read the messages from the list of topics. What will be their policy? Will they read one topic and then, when they finish its messages, pass to the other topics?

And most importantly, after calling this method, how can I check what the topic of a message in the RDD is?

stream.foreachRDD(rdd => rdd.map(t => {
        val key = t._1
        val json = t._2
        val topic = ???
}))

I am wondering how the executors will read the messages from the list of topics. What will be their policy? Will they read one topic and then, when they finish its messages, pass to the other topics?

In the direct streaming approach, the driver is responsible for reading the offsets of the Kafka topics you want to consume. What it does is create a mapping between topics, partitions, and the offsets that need to be read. After that happens, the driver assigns each worker a range to read from a specific Kafka topic. This means that if a single worker can run 2 tasks simultaneously (just for the sake of the example; it can usually run many more), then it can potentially read from two separate Kafka topics concurrently.
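
Note that each batch RDD produced by the direct stream exposes exactly this mapping. As a minimal sketch (assuming the spark-streaming-kafka 0.8 artifact, and that the cast is applied to the RDD exactly as handed to foreachRDD, before any transformation), the per-batch topic/partition/offset assignment can be inspected via HasOffsetRanges:

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // The cast only works on the RDD produced directly by the stream
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"topic=${r.topic} partition=${r.partition} offsets=[${r.fromOffset}, ${r.untilOffset})")
  }
}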

how can I, after calling this method, check what the topic of a message in the RDD is?

You can use the overload of createDirectStream which takes a message handler of type MessageAndMetadata[K, V] => R:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

val topicsToPartitions: Map[TopicAndPartition, Long] = ???

val stream: DStream[(String, String)] =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
        ssc,
        kafkaParams,
        topicsToPartitions,
        (mam: MessageAndMetadata[String, String]) => (mam.topic, mam.message()))
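
With the topic carried as the first element of each tuple, the original loop can then be written along these lines (just a sketch; the body only prints for illustration):

stream.foreachRDD { rdd =>
  rdd.foreach { case (topic, json) =>
    // topic is the name of the Kafka topic this message was read from
    println(s"[$topic] $json")
  }
}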
