
How to use Scala Case Class to map Kafka source in Spark Structured Streaming

I am trying to use Structured Streaming in Spark because it fits my use case well. However, I can't seem to find a way to map the incoming data from Kafka to a case class. According to the official documentation, this is as far as you can go:

import org.apache.spark.sql.DataFrame
import sparkSession.sqlContext.implicits._

val kafkaDF: DataFrame = sparkSession
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers_CML)
  .option("subscribe", topics_ME)
  .option("startingOffsets", "latest")
  .load()
  .selectExpr("cast (value as string) as json") // Kafka sends data in a fixed schema (key, value, topic, offset, timestamp, etc.)

import org.apache.spark.sql.types._

val schema_ME = StructType(Seq(
  StructField("Parm1", StringType, true),
  StructField("Parm2", StringType, true),
  StructField("Parm3", TimestampType, true)))

import org.apache.spark.sql.functions.from_json

val mobEventDF: DataFrame = kafkaDF
  .select(from_json($"json", schema_ME).as("mobEvent")) // Using a StructType to convert to an application-specific schema. Can't seem to use a case class for the schema directly yet. Perhaps with a later API?
  .na.drop()

mobEventDF has a schema like this:

root
 |-- mobEvent: struct (nullable = true)
 |    |-- Parm1: string (nullable = true)
 |    |-- Parm2: string (nullable = true)
 |    |-- Parm3: timestamp (nullable = true)

Is there a better way to do this? How can I map it directly to a Scala case class like the one below?

import java.sql.Timestamp

case class ME(name: String,
              factory: String,
              delay: Timestamp)

Select and rename all of the fields, then call the as method:

mobEventDF.select($"mobEvent.*").toDF("name", "factory", "delay").as[ME]
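
If you want to avoid maintaining the StructType by hand, the schema can also be derived from the case class itself with Encoders.product, and the parsed struct read back as a typed Dataset[ME]. A minimal sketch, reusing kafkaDF and sparkSession from the question and assuming the JSON keys in the Kafka payload already match the case-class field names (if they are still Parm1/Parm2/Parm3, keep the toDF rename shown above):

import java.sql.Timestamp

import org.apache.spark.sql.{Dataset, Encoders}
import org.apache.spark.sql.functions.from_json

case class ME(name: String, factory: String, delay: Timestamp)

// Derive the StructType from the case class instead of declaring it by hand.
val meSchema = Encoders.product[ME].schema

import sparkSession.implicits._

// Parse the JSON payload, flatten the struct, and map to a typed Dataset.
// Assumes the JSON keys are "name", "factory", and "delay".
val mobEvents: Dataset[ME] = kafkaDF
  .select(from_json($"json", meSchema).as("mobEvent"))
  .select($"mobEvent.*")
  .na.drop()
  .as[ME]

Deriving the schema this way keeps the field names and types in a single place, but it only works when the JSON keys line up with the case-class fields; otherwise the rename step in the answer above is still needed before as[ME] can resolve the columns.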
