Read from multiple topics and write to single topic - Spark Streaming
How can I read from multiple topics, each with a different schema, using Spark's readStream(), and write the result to a single topic with writeStream() using Spark Structured Streaming?
Note: Each input topic has a different schema.
I am giving a general idea and some pointers here; they may suit your case.
I assume you are using Avro messages and that there are two topics: one for the messages and another for the schemas. I refer to them as the message topic and the schema topic.
Now, prepare a generic row wrapper schema, say avro_yourrow_wrapper.avsc, which can hold messages of different schemas (since you said each message has a different schema).
For example (modify this sample as per your requirements):
{
"type" : "record",
"name" : "generic_schema",
"namespace" : "yournamespace",
"fields" : [ {
"name" : "messagenameOrTableNames",
"type" : "string"
}, {
"name" : "schema",
"type" : "long"
}, {
"name" : "payload",
"type" : "bytes"
} ]
}
Save it to a file called avro_yourrow_wrapper.avsc, since it is static.
// Read the wrapper schema in your consumer.
val inputStream = getClass.getResourceAsStream("avro_yourrow_wrapper.avsc")
val source = scala.io.Source.fromInputStream(inputStream)
val wrapperSchema = try source.mkString finally source.close()
From Spark Structured Streaming you will get a DataFrame. Decode each row with the wrapper schema first. Then, based on the type of message, look up the record-specific schema (read from the schema topic) and apply it to the Avro message read from the message topic.
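As a minimal sketch of the reading side, assuming the topics are Kafka topics and the spark-sql-kafka connector is on the classpath: the Kafka source accepts a comma-separated list in its `subscribe` option, and every row carries a `topic` column naming the topic it came from. The object name `MultiTopicReader`, the helper names, and whatever broker address and topic names you pass in are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiTopicReader {
  // Build the Kafka source options; topic names and broker address
  // are supplied by the caller (hypothetical values in the usage note).
  def sourceOptions(bootstrapServers: String, topics: Seq[String]): Map[String, String] =
    Map(
      "kafka.bootstrap.servers" -> bootstrapServers,
      // "subscribe" takes a comma-separated list of topic names
      "subscribe" -> topics.mkString(",")
    )

  // Returns one streaming DataFrame covering all topics.
  // "topic" identifies the source topic; "value" is the raw message bytes.
  def readTopics(spark: SparkSession, bootstrapServers: String, topics: Seq[String]): DataFrame =
    spark.readStream
      .format("kafka")
      .options(sourceOptions(bootstrapServers, topics))
      .load()
      .select("topic", "value")
}
```

Because `value` is still binary at this point, the per-topic schema is applied afterwards, for example by branching on the `topic` column or on the message name stored in the wrapper record.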
Now, using the Twitter Bijection API (with GenericRecord), you can decode the message into a readable format.
Sample pseudo-code snippet:
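import com.twitter.bijection.Injection
import com.twitter.bijection.avro.GenericAvroCodecs
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
// localschema, recordlevelschema, bytes and log are assumed to be defined elsewhere.
// Parse the record-specific schema looked up for this message.
val schema = new Schema.Parser().parse(localschema.get(recordlevelschema).get)
// Build a codec that converts between GenericRecord and Avro binary.
val recordInjection: Injection[GenericRecord, Array[Byte]] = GenericAvroCodecs.toBinary(schema)
// Decode the payload bytes back into a GenericRecord.
val record: GenericRecord = recordInjection.invert(bytes).get
log.info("record.getSchema " + record.getSchema)
record.getSchema.getFields.toArray().foreach(x => log.info(x.toString))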
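The two-step decoding described above (wrapper first, then payload) can be sketched as follows. This sketch swaps in the plain Avro `GenericDatumReader` API rather than Bijection, only so that it depends on the Avro library alone. The field names come from the wrapper schema shown earlier; `WrapperDecoder` and `schemaById` (a lookup from schema id to parsed schema, which you would populate from the schema topic) are hypothetical.

```scala
import java.nio.ByteBuffer
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

object WrapperDecoder {
  // Decode Avro binary bytes into a GenericRecord using the given schema.
  def decode(schema: Schema, bytes: Array[Byte]): GenericRecord = {
    val reader = new GenericDatumReader[GenericRecord](schema)
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    reader.read(null, decoder)
  }

  // Unwrap one message: returns the message name and the decoded inner record.
  // schemaById is a hypothetical lookup built from the schema topic.
  def unwrap(wrapperSchema: Schema,
             schemaById: Map[Long, Schema],
             wrapped: Array[Byte]): (String, GenericRecord) = {
    val wrapper = decode(wrapperSchema, wrapped)
    val name = wrapper.get("messagenameOrTableNames").toString
    val schemaId = wrapper.get("schema").asInstanceOf[Long]
    // Avro hands back "bytes" fields as a ByteBuffer; copy it into an array.
    val buf = wrapper.get("payload").asInstanceOf[ByteBuffer].duplicate()
    val payload = new Array[Byte](buf.remaining())
    buf.get(payload)
    (name, decode(schemaById(schemaId), payload))
  }
}
```

In a streaming job, `unwrap` would typically be called inside a UDF or a `map` over the binary `value` column, with the schema lookup made available to the executors.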
And then you can write to a separate topic as you wish.
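A minimal sketch of the writing side, again assuming Kafka and the spark-sql-kafka connector: the Kafka sink expects a `value` column (string or binary) and takes the target topic from its `topic` option, and a streaming query needs a `checkpointLocation`. The object name `SingleTopicWriter` and its helper names are hypothetical, as are the broker address, topic name and checkpoint path you pass in.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

object SingleTopicWriter {
  // Build the Kafka sink options for a single output topic.
  def sinkOptions(bootstrapServers: String,
                  outputTopic: String,
                  checkpointDir: String): Map[String, String] =
    Map(
      "kafka.bootstrap.servers" -> bootstrapServers,
      "topic" -> outputTopic,
      "checkpointLocation" -> checkpointDir
    )

  // df must contain a "value" column (string or binary) holding the
  // already-encoded output message; all rows go to outputTopic.
  def writeToTopic(df: DataFrame,
                   bootstrapServers: String,
                   outputTopic: String,
                   checkpointDir: String): StreamingQuery =
    df.select("value")
      .writeStream
      .format("kafka")
      .options(sinkOptions(bootstrapServers, outputTopic, checkpointDir))
      .start()
}
```

Since the inputs have different schemas, the `value` written here would usually be the records re-encoded into one common format (for instance the wrapper schema above), so that consumers of the single topic can tell the message types apart.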