Read from multiple topics and write to single topic - Spark Streaming

How to read from multiple topics with different schemas using Spark readStream(), and writeStream() to a single topic using Spark Structured Streaming?

Note: each input topic has a different schema.


I am giving a general idea and some pointers here; they may suit your case.

I assume you are using Avro messages. There are two topics, one for the message and another for the schema, which I will refer to as the message topic and the schema topic.
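For reference, subscribing to several topics in one stream might look like the sketch below (bootstrap servers and topic names are placeholders; spark is your SparkSession):

// Minimal sketch: one readStream subscribed to several Kafka topics.
// Each row carries the source topic name next to the raw key/value bytes.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "message_topic,schema_topic") // comma-separated topic list
  .option("startingOffsets", "latest")
  .load()
// Columns available: key, value (binary), topic, partition, offset, timestamp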

Now, prepare a generic row wrapper schema, say avro_yourrow_wrapper.avsc, which can hold messages of different schemas (since you said each message has a different schema).

For example, modify this sample as per your requirements:

{
  "type" : "record",
  "name" : "generic_schema",
  "namespace" : "yournamespace",
  "fields" : [ {
    "name" : "messagenameOrTableNames",
    "type" : "string",
    "doc" : "identifies which record-level schema the payload was written with"
  }, {
    "name" : "schema",
    "type" : "long",
    "doc" : "schema id/version, resolvable via the schema topic"
  }, {
    "name" : "payload",
    "type" : "bytes",
    "doc" : "the Avro-encoded message bytes"
  } ]
}

Save it to a file called avro_yourrow_wrapper.avsc, since it is static.

// Read the wrapper schema in your consumer.
val inputStream = getClass.getResourceAsStream("avro_yourrow_wrapper.avsc")
val source = scala.io.Source.fromInputStream(inputStream)
val wrapperSchema = try source.mkString finally source.close()
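With the wrapper schema string in hand, you can parse it and build a Bijection codec for the wrapper record itself (a sketch based on the wrapperSchema value read above):

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import com.twitter.bijection.Injection
import com.twitter.bijection.avro.GenericAvroCodecs

// Parse the wrapper .avsc text and build a binary codec for it, so each
// Kafka value can first be decoded into the generic wrapper record.
val parsedWrapperSchema: Schema = new Schema.Parser().parse(wrapperSchema)
val wrapperInjection: Injection[GenericRecord, Array[Byte]] =
  GenericAvroCodecs.toBinary(parsedWrapperSchema)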

From the Spark structured stream you will get a DataFrame. First decode each value with the wrapper schema; then, based on the message type, look up the record-specific schema (by reading the schema topic) and read the actual Avro message from the payload.
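A sketch of that unwrapping step (df and wrapperInjection come from the earlier sketches; field names match the wrapper schema above):

import spark.implicits._

// Pull the raw bytes out of the Kafka source rows and unwrap each message.
// In real code, build the Injection inside the task (e.g. in mapPartitions)
// rather than closing over it, to avoid serialization issues.
val unwrapped = df
  .selectExpr("CAST(topic AS STRING) AS topic", "value")
  .as[(String, Array[Byte])]
  .map { case (_, bytes) =>
    val wrapper = wrapperInjection.invert(bytes).get
    val name    = wrapper.get("messagenameOrTableNames").toString
    val payload = wrapper.get("payload").asInstanceOf[java.nio.ByteBuffer]
    (name, payload.array()) // route on name to pick the record-level schema
  }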

Now, using the Twitter Bijection API (with GenericRecord), you can decode the message into a readable format.

Sample pseudo-code snippet:

import com.twitter.bijection.Injection
import com.twitter.bijection.avro.GenericAvroCodecs
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord

// localschema: your map of record-level schema strings (e.g. built from the
// schema topic); recordlevelschema: the message/table name from the wrapper.
val schema = new Schema.Parser().parse(localschema.get(recordlevelschema).get)
val recordInjection: Injection[GenericRecord, Array[Byte]] = GenericAvroCodecs.toBinary(schema)
// bytes: the wrapper's payload bytes for this record.
val record: GenericRecord = recordInjection.invert(bytes).get
log.info("record.getSchema " + record.getSchema)
record.getSchema.getFields.toArray().foreach(x => log.info(x.toString))
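As a quick usage follow-up, individual values can then be read off the decoded record by name (the field name below is a hypothetical placeholder):

// Hypothetical field name; use the field names printed by the loop above.
val fieldValue = record.get("your_field_name")
log.info("your_field_name = " + fieldValue)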

And then you can write out to the single target topic as you wish.
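For that final write, a minimal sketch of the Kafka sink (topic name, bootstrap servers, and checkpoint path are placeholders; outputDf stands for your stream of rows with a string or binary value column):

// Minimal sketch: write the unified stream to one output topic.
outputDf
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("topic", "single_output_topic") // every row goes to this topic
  .option("checkpointLocation", "/tmp/checkpoints/single-topic-sink")
  .start()
  .awaitTermination()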
