
scala how to parameterize a case class, and pass the case class as a variable to [T <: Product: TypeTag]

// class definition of RsGoods schema
case class RsGoods(add_time: Int)

// my operation
originRDD.toDF[Schemas.RsGoods]()

// and the function definition
def toDF[T <: Product: TypeTag](): DataFrame = mongoSpark.toDF[T]()

Now I have defined too many schemas (RsGoods1, RsGoods2, RsGoods3), and more will be added in the future.

So the question is: how can I pass a case class as a variable to structure the code?

Attached sbt dependencies:

  "org.apache.spark" % "spark-core_2.11" % "2.3.0",
  "org.apache.spark" %% "spark-sql" % "2.3.0",
  "org.mongodb.spark" %% "mongo-spark-connector" % "2.3.1",

Attached is the key code snippet:

  var originRDD = MongoSpark.load(sc, readConfig)
  val df = table match {
    case "rs_goods_multi" => originRDD.toDF[Schemas.RsGoodsMulti]()
    case "rs_goods" => originRDD.toDF[Schemas.RsGoods]()
    case "ma_item_price" => originRDD.toDF[Schemas.MaItemPrice]()
    case "ma_siteuid" => originRDD.toDF[Schemas.MaSiteuid]()
    case "pi_attribute" => originRDD.toDF[Schemas.PiAttribute]()
    case "pi_attribute_name" => originRDD.toDF[Schemas.PiAttributeName]()
    case "pi_attribute_value" => originRDD.toDF[Schemas.PiAttributeValue]()
    case "pi_attribute_value_name" => originRDD.toDF[Schemas.PiAttributeValueName]()

From what I have understood about your requirement, I think the following should be a decent starting point.

def readDataset[A: Encoder](
  spark: SparkSession,
  mongoUrl: String,
  collectionName: String,
  clazz: Class[A]
): Dataset[A] = {
  val config = ReadConfig(
    Map("uri" -> s"$mongoUrl.$collectionName")
  )

  val df = MongoSpark.load(spark, config)

  val fieldNames = clazz.getDeclaredFields.map(f => f.getName).dropRight(1).toList

  val dfWithMatchingFieldNames = df.toDF(fieldNames: _*)

  dfWithMatchingFieldNames.as[A]
}

You can use it like this,

case class RsGoods(add_time: Int)

val spark: SparkSession = ...

import spark.implicits._

val rsGoodsDS = readDataset[RsGoods](
  spark,
  "mongodb://example.com/database",
  "rs_goods",
  classOf[RsGoods]
)

Also, the following two lines,

val fieldNames = clazz.getDeclaredFields.map(f => f.getName).dropRight(1).toList

val dfWithMatchingFieldNames = df.toDF(fieldNames: _*)

are only required because normally Spark reads DataFrames with column names like value1, value2, ... . So we want to change the column names to match what we have in our case class.
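For illustration, here is a minimal sketch of what that positional rename does (the column names value1/value2 and the field goods_id are made up for the example):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// suppose the loaded DataFrame came back with generic column names
val raw = Seq((1546300800, 99)).toDF("value1", "value2")

// toDF(names: _*) renames all columns positionally
val renamed = raw.toDF("add_time", "goods_id")

renamed.printSchema()
// root
//  |-- add_time: integer (nullable = false)
//  |-- goods_id: integer (nullable = false)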

I am not sure what these "default" column names will be, because MongoSpark is involved.

You should first check the column names in the df created as follows,

val config = ReadConfig(
  Map("uri" -> s"$mongoUrl.$collectionName")
)

val df = MongoSpark.load(spark, config)
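
One straightforward way to inspect them (a minimal sketch using the standard Spark API, on the df from the snippet above):

// print the inferred schema, including column names and types
df.printSchema()

// or just list the column names
df.columns.foreach(println)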

If MongoSpark fixes the problem of these "default" column names and picks the column names from your collection, then those two lines will not be required and your method will become just this,

def readDataset[A: Encoder](
  spark: SparkSession,
  mongoUrl: String,
  collectionName: String
): Dataset[A] = {
  val config = ReadConfig(
    Map("uri" -> s"$mongoUrl.$collectionName")
  )

  val df = MongoSpark.load(spark, config)

  df.as[A]
}

And,

val rsGoodsDS = readDataset[RsGoods](
  spark,
  "mongodb://example.com/database",
  "rs_goods"
)
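
With that in place, the big table match from your question collapses to one line per collection (a hypothetical sketch, reusing the simplified three-argument readDataset and the table names from your snippet); adding a new schema then only needs a new case class plus one new case:

import spark.implicits._ // provides the Encoder for each case class

val mongoUrl = "mongodb://example.com/database" // placeholder from above

val df = table match {
  case "rs_goods"      => readDataset[Schemas.RsGoods](spark, mongoUrl, "rs_goods").toDF()
  case "ma_item_price" => readDataset[Schemas.MaItemPrice](spark, mongoUrl, "ma_item_price").toDF()
  // ... one case per remaining collection
}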
