
What is TypeTag in Encoders.product?

I use Spark 2.1.1.

I started off with the following:

import org.apache.spark.sql.types._
val mySchema = StructType(
  StructField("id", IntegerType, true),
  StructField("code", StringType, false),
  StructField("value", DecimalType, false))
val myDS = Seq((1,"000010", 1.0), (2, "000020", 2.0)).as[mySchema]

The compiler told me that mySchema is not a type, and after looking at Encoders.scala I could see that I needed to pass a subtype of Product, via

def product[T <: Product : TypeTag]: Encoder[T] = ExpressionEncoder()

After reading What are Scala context and view bounds?, I understand that the colon is just syntactic sugar for an implicit parameter, so there must be an implicit TypeTag[T] available. What I don't understand, from looking at SQLImplicits.scala, is where that implicit TypeTag[T] comes from.

  /**
   * @since 1.6.1
   * @deprecated use [[newSequenceEncoder]]
   */
  def newProductSeqEncoder[A <: Product : TypeTag]: Encoder[Seq[A]] = ExpressionEncoder()

Even though it's deprecated, when I look at

  /** @since 2.2.0 */
  implicit def newSequenceEncoder[T <: Seq[_] : TypeTag]: Encoder[T] = ExpressionEncoder()

I still wonder: where is a TypeTag[T] implicitly declared?

TypeTag behaves like a type class whose instances are never declared by hand: the Scala compiler materializes an implicit TypeTag[T] for any concrete type at the point where one is required. This is independent of Spark and SQLImplicits; for example, you can try this:

import scala.reflect.runtime.universe.TypeTag
def getMyTypeTag[T : TypeTag]: TypeTag[T] = implicitly[TypeTag[T]]
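As a minimal sketch of that claim (plain Scala 2 with scala-reflect on the classpath, no Spark involved), the compiler supplies a TypeTag for each concrete type at the call site. The value names below are illustrative only.

```scala
import scala.reflect.runtime.universe._

// The context bound [T: TypeTag] desugars to an implicit parameter of type TypeTag[T].
def getMyTypeTag[T: TypeTag]: TypeTag[T] = implicitly[TypeTag[T]]

// Nothing declares these instances anywhere: the compiler materializes
// each TypeTag at the call site because the type argument is concrete.
val intTag: TypeTag[Int] = getMyTypeTag[Int]
val listTag: TypeTag[List[String]] = getMyTypeTag[List[String]]
val tupleTag: TypeTag[(Int, String)] = getMyTypeTag[(Int, String)]

// A tag carries the full static type, including type arguments.
println(intTag.tpe)
println(listTag.tpe)
println(tupleTag.tpe)
```

Note that the tuple type is a subtype of Product, which is the bound that Encoders.product places on T.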

A Spark SQL Encoder, on the other hand, is built by Spark once you import the implicits defined in SQLImplicits. If you look at LowPrioritySQLImplicits, you can see that creating the Encoder for a Product (case classes and tuples) requires a TypeTag, which is why a TypeTag must be available in the implicit scope:

trait LowPrioritySQLImplicits {
  /** @since 1.6.0 */
  implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]

}
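To see why the TypeTag has to be in implicit scope, it can help to write the context bound out by hand. The sketch below uses a hypothetical SketchEncoder trait as a stand-in for Spark's Encoder (it is not a Spark API), so it runs with only scala-reflect on the classpath.

```scala
import scala.reflect.runtime.universe._

// Hypothetical stand-in for Spark's Encoder; it only records the type it was built for.
trait SketchEncoder[T] { def tpe: Type }

// Desugared form: the context bound becomes an explicit implicit parameter.
def productExplicit[T <: Product](implicit tag: TypeTag[T]): SketchEncoder[T] =
  new SketchEncoder[T] { val tpe: Type = tag.tpe }

// Sugared form, same shape as newProductEncoder: the bound puts a TypeTag[T]
// in scope, which is then passed on implicitly to productExplicit.
def productSugar[T <: Product : TypeTag]: SketchEncoder[T] = productExplicit[T]

// Concrete type at the call site, so the compiler supplies the TypeTag.
val enc = productSugar[(Int, String)]
println(enc.tpe)

// Without the bound there is no TypeTag[T] to forward, so this would not compile:
// def broken[T <: Product]: SketchEncoder[T] = productExplicit[T]
```

The two definitions are equivalent from the caller's point of view; the context bound simply hides the implicit parameter list.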

The compiler can supply the TypeTag only if the code that summons the Encoder refers to a concrete type, or if a TypeTag for the type parameter is already in implicit scope. For example:

def loadEncoder(): Encoder[MyType] = {
  import spark.implicits._
  implicitly[Encoder[MyType]] // MyType is concrete here, so its TypeTag is materialized and this works
}

On the other hand, this does not compile:

def loadEncoder[T](): Encoder[T] = {
  import spark.implicits._
  implicitly[Encoder[T]] // does not compile: T is abstract here and no TypeTag[T] is in scope
}
loadEncoder[MyType]()

whereas this works:

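def loadEncoder[T <: Product : TypeTag](): Encoder[T] = {
  import spark.implicits._
  implicitly[Encoder[T]] // T is still abstract, but the caller passes in its TypeTag, so this works
}
loadEncoder[MyType]() // MyType must be a Product (e.g. a case class) to satisfy newProductEncoder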
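Applied to the code in the question, one possible fix is to describe the rows with a case class instead of a StructType value, since .as[...] expects a type that has an Encoder. This is a sketch assuming Spark is on the classpath and a local SparkSession is acceptable; the case class Item, the app name, the column names passed to toDF, and the use of Double for the value column are illustrative choices, not taken from the original post.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Illustrative case class; define it at the top level (not inside a method)
// so that Spark can build an encoder for it.
case class Item(id: Int, code: String, value: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typetag-sketch")
  .getOrCreate()
import spark.implicits._

// Item is concrete here, so the compiler materializes TypeTag[Item] and
// newProductEncoder can supply the Encoder[Item] that .as[Item] needs.
val myDS: Dataset[Item] =
  Seq((1, "000010", 1.0), (2, "000020", 2.0)).toDF("id", "code", "value").as[Item]

// The same encoder can also be requested explicitly.
val itemEncoder: Encoder[Item] = Encoders.product[Item]
println(itemEncoder.schema.fieldNames.mkString(", "))
```

In spark-shell the session and the implicits import already exist, so only the case class and the Dataset lines are needed there.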

OK, I thought it was a Spark thing, but there is an import statement at the top of the source file:

import scala.reflect.runtime.universe.TypeTag

Looking at the API page http://www.scala-lang.org/api/2.11.6/scala-reflect/index.html#scala.reflect.api.TypeTags I can see that TypeTag comes from the Scala reflection library rather than from Spark.
