How to use Scala Case Class to map Kafka source in Spark Structured Streaming
I am trying to use Structured Streaming in Spark as it fits my use case well. However, I can't seem to find a way to map the incoming data from Kafka into a case class. This is as far as I could get based on the official documentation.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}
import sparkSession.sqlContext.implicits._
val kafkaDF:DataFrame = sparkSession
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", bootstrapServers_CML)
.option("subscribe", topics_ME)
.option("startingOffsets", "latest")
.load()
.selectExpr("cast(value as string) as json") // Kafka delivers records with a fixed schema (key, value, topic, partition, offset, timestamp, etc.)
val schema_ME = StructType(Seq(
StructField("Parm1", StringType, true),
StructField("Parm2", StringType, true),
StructField("Parm3", TimestampType, true)))
val mobEventDF:DataFrame = kafkaDF
.select(from_json($"json", schema_ME).as("mobEvent")) // Convert to the application-specific schema via a StructType. A case class can't be passed as the schema directly here. Perhaps in a later API?
.na.drop()
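The StructType above doesn't have to be written out by hand: Spark can derive it from a case class through its Encoder. A minimal sketch, assuming a hypothetical mirror class whose field names match the JSON keys (from_json matches on names, so Parm1/Parm2/Parm3 are reused here rather than name/factory/delay):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical mirror of the JSON payload; from_json matches on field
// names, so this uses Parm1/Parm2/Parm3 instead of the target ME fields.
case class MERaw(Parm1: String, Parm2: String, Parm3: Timestamp)

// Encoders.product derives an Encoder for any case class; its .schema
// is the same StructType that was built by hand above.
val schema_ME: StructType = Encoders.product[MERaw].schema
```

This keeps the schema and the class definition from drifting apart, at the cost of naming the case class fields after the wire format.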
mobEventDF has a schema like this:
root
 |-- mobEvent: struct (nullable = true)
 |    |-- Parm1: string (nullable = true)
 |    |-- Parm2: string (nullable = true)
 |    |-- Parm3: timestamp (nullable = true)
Is there a better way to do this? How can I map this directly into a Scala case class like the one below?
case class ME(name: String,
factory: String,
delay: Timestamp)
Select and rename all the fields, then call the as method:
kafkaDF.select($"mobEvent.*").toDF("name", "factory", "delay").as[ME]
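In context, the answer's one-liner expands to the sketch below. It assumes `sparkSession` and the `mobEventDF` from the question are in scope, and that Parm3 was already parsed as a timestamp by the TimestampType entry in the schema:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Dataset

case class ME(name: String, factory: String, delay: Timestamp)

// Assumes `sparkSession` and `mobEventDF` from the question are in scope.
import sparkSession.implicits._

val mobEventDS: Dataset[ME] = mobEventDF
  .select($"mobEvent.*")              // flatten the struct into top-level columns
  .toDF("name", "factory", "delay")   // rename positionally to the case class fields
  .as[ME]                             // typed Dataset; names and types must now line up
```

The rename is positional, so the column order produced by `mobEvent.*` (Parm1, Parm2, Parm3) must match the order of `toDF`'s arguments; `.as[ME]` then checks names and types against the case class.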