Spark Scala Dataset Type Hierarchy

I am trying to require that classes extending W have a method get that returns a Dataset of a subclass of WR.

abstract class WR

case class TGWR(
          a: String,
          b: String
        ) extends WR

abstract class W {

  def get[T <: WR](): Dataset[T]

}


class TGW(sparkSession: SparkSession) extends W {

  override def get[TGWR](): Dataset[TGWR] = {
    import sparkSession.implicits._

    Seq(TGWR("dd","dd")).toDF().as[TGWR]
  }

}

Compilation error:

Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.

If I change the get function to the following:

  def get(): Dataset[TGWR]

and

  override def get(): Dataset[TGWR] = {...

it compiles, so I suspect the problem is related to inheritance or the type hierarchy.

Forget my earlier comment; I re-read your question and noticed a simple problem.

In override def get[TGWR] you are not saying that this class produces instances of TGWR; you are declaring a new type parameter named TGWR, which shadows your real type. Inside the method, TGWR then refers to that unconstrained type parameter rather than the case class, so the compiler cannot find an implicit Encoder for it, hence the error.
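The shadowing can be reproduced without Spark. The following is only a minimal sketch: `Enc` is a hypothetical stand-in for Spark's `Encoder` type class, and `ShadowDemo`, `concrete`, and `shadowed` are illustrative names that do not appear in the question.

```scala
// Hypothetical stand-in for Spark's Encoder type class, so the sketch runs without Spark.
trait Enc[A] { def label: String }

final case class TGWR(a: String, b: String)

object TGWR {
  // An instance exists only for the real case class TGWR.
  implicit val enc: Enc[TGWR] = new Enc[TGWR] { val label = "TGWR" }
}

object ShadowDemo {
  // Here `TGWR` would be a fresh type parameter, not the case class, so no
  // Enc instance can be found for it. Uncommenting this line fails to compile:
  // def shadowed[TGWR](): Enc[TGWR] = implicitly[Enc[TGWR]]

  // It means exactly the same as this version with an unrelated name:
  // def shadowed[X](): Enc[X] = implicitly[Enc[X]]

  // Without the type parameter, TGWR is the case class and the instance resolves.
  def concrete(): Enc[TGWR] = implicitly[Enc[TGWR]]

  def main(args: Array[String]): Unit =
    println(concrete().label) // prints "TGWR"
}
```

Renaming the type parameter to something like `X` makes it obvious that the method promises a result for any type the caller picks, which is why no instance can be supplied for it.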
I fixed the original code as follows:

import org.apache.spark.sql.{SparkSession, Dataset}

abstract class WR extends Product with Serializable

final case class TGWR(a: String, b: String) extends WR

abstract class W[T <: WR] {
  def get(): Dataset[T]
}

final class TGW(spark: SparkSession) extends W[TGWR] {
  override def get(): Dataset[TGWR] = {
    import spark.implicits._
    Seq(TGWR("dd","dd")).toDF().as[TGWR]
  }
}

You can use it like this:

val spark = SparkSession.builder.master("local[*]").getOrCreate()
(new TGW(spark)).get()
// res1: org.apache.spark.sql.Dataset[TGWR] = [a: string, b: string]
res1.show()
// +---+---+
// |  a|  b|
// +---+---+
// | dd| dd|
// +---+---+

Hope this is what you are looking for.
Feel free to ask for clarification.
