
How to implement a trait with a generic case class that creates a dataset in Scala

I want to create a Scala trait that should be implemented with a case class T. The trait is simply to load data and transform it into a Spark Dataset of type T. I got the error that no encoder can be stored, which I think is because Scala does not know that T should be a case class. 我想创建一个应该用案例类T实现的Scala特征。该特征只是加载数据并将其转换为类型T的Spark数据集。我得到一个错误,即无法存储任何编码器,我认为这是因为Scala不知道T应该是案例类。 How can I tell the compiler that? 我怎样才能告诉编译器呢? I've seen somewhere that I should mention Product, but there is no such class defined.. Feel free to suggest other ways to do this! 我见过某个地方应该提到Product,但是还没有定义此类。.随意建议其他方法!

I have the following code, but it does not compile. The error is: 42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ [INFO] .as[T]

I'm using Spark 1.6.1

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}

/**
  * A trait that moves data on Hadoop with Spark based on the location and the granularity of the data.
  */
trait Agent[T] {
  /**
    * Load a Dataframe from the location and convert into a Dataset
    * @return Dataset[T]
    */
  protected def load(): Dataset[T] = {
    // Read in the data
    SparkContextKeeper.sqlContext.read
      .format("com.databricks.spark.csv")
      .load("/myfolder/" + location + "/2016/10/01/")
      .as[T]
  }
}

Your code is missing 3 things:

  • Indeed, you must let the compiler know that T is a subtype of Product (the trait that all Scala case classes and tuples extend)
  • The compiler also requires the TypeTag and ClassTag of the actual case class; Spark uses these implicitly to overcome type erasure
  • An import of sqlContext.implicits._

Unfortunately, you can't add context bounds to the type parameters of a trait, so the simplest workaround is to use an abstract class instead:

import scala.reflect.runtime.universe.TypeTag
import scala.reflect.ClassTag
import org.apache.spark.sql.{Dataset, SQLContext}

abstract class Agent[T <: Product : ClassTag : TypeTag] {
  protected def load(): Dataset[T] = {
    val sqlContext: SQLContext = SparkContextKeeper.sqlContext
    import sqlContext.implicits._
    sqlContext.read // same...
  }
}
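
To make the abstract-class route concrete, here is a minimal usage sketch. The Report case class, the ReportAgent subclass and the "reports" location are made up for illustration, and it assumes Agent also declares an abstract location member (the question's snippet uses location without defining it):

import org.apache.spark.sql.Dataset

// Hypothetical row type matching the CSV columns (all strings, since the CSV is read without a schema).
case class Report(id: String, value: String)

// Concrete agent; assumes Agent declares `protected def location: String`.
class ReportAgent extends Agent[Report] {
  protected val location: String = "reports"

  // load() is protected, so expose it through a public method.
  def reports(): Dataset[Report] = load()
}

// Usage in a driver or the REPL:
// val ds: Dataset[Report] = new ReportAgent().reports()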

Obviously, this isn't equivalent to using a trait, and might suggest that this design isn't the best fit for the job. Another alternative is placing load in an object and moving the type parameter to the method:

object Agent {
  protected def load[T <: Product : ClassTag : TypeTag](): Dataset[T] = {
    // same...
  }
}

Which one is preferable is mostly up to where and how you're going to call load and what you're planning to do with the result.
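
For illustration, a call site for the object variant could look like the sketch below. Report is the same hypothetical case class as above, and the sketch assumes load is made callable from outside the object (i.e. not protected):

import org.apache.spark.sql.Dataset

// Hypothetical row type, as in the earlier sketch.
case class Report(id: String, value: String)

// The type parameter is supplied per call rather than per Agent instance.
val reports: Dataset[Report] = Agent.load[Report]()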

You need to take two actions:

  1. Add import sparkSession.implicits._ to your imports
  2. Make your trait trait Agent[T <: Product] (see the sketch after this list)
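
Note that a plain trait Agent[T <: Product] still has no TypeTag for T, so the implicits import alone won't provide an encoder inside the trait body. One hedged way to keep the trait is to take the encoder as an implicit method parameter, supplied by the caller's sparkSession.implicits._ (or sqlContext.implicits._ on Spark 1.6); the sketch below reuses the question's SparkContextKeeper and location:

import org.apache.spark.sql.{Dataset, Encoder}

trait Agent[T <: Product] {
  protected def location: String

  // The Encoder[T] is provided by the caller's implicits import.
  protected def load()(implicit enc: Encoder[T]): Dataset[T] = {
    SparkContextKeeper.sqlContext.read
      .format("com.databricks.spark.csv")
      .load("/myfolder/" + location + "/2016/10/01/")
      .as[T]
  }
}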
