Why is the spark.implicits._ import not helping with encoder derivation inside a method?
Importing an implicit member from a created instance works as expected:
object Test extends App {
  class Bag {
    implicit val ssss: String = "omg"
  }
  def call(): Unit = {
    val bag = new Bag
    import bag._
    val s = implicitly[String]
    println(s)
  }
  call()
}
But if I try doing the same with spark.implicits._:
object Test extends App {
  val spark: SparkSession = ...
  def call(): Unit = {
    import spark.implicits._
    case class Person(id: Long, name: String)
    // I can summon an existing encoder
    // val enc = implicitly[Encoder[Long]]
    // but encoder derivation is failing for some reason
    // val encP = implicitly[Encoder[Person]]
    val df: Dataset[Person] =
      spark.range(10).map(i => Person(i, i.toString))
    df.show()
  }
}
it fails to derive the Encoder[Person]:
Unable to find encoder for type Person. An implicit Encoder[Person] is needed to store Person instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
.map(i => Person(i, i.toString)
But it works if I create the dataframe outside the method:
object Test extends App {
  val spark: SparkSession = ...
  import spark.implicits._
  case class Person(id: Long, name: String)
  val df: Dataset[Person] =
    spark.range(10).map(i => Person(i, i.toString))
  df.show()
}
Tested with Scala versions 2.13.10 and 2.12.17 and Spark version 3.3.1.
The local case class is the reason for the observed behaviour. A local class has a so-called free type (more details were linked in the original answer). You could try adding a TypeTag for Person in the local scope to see whether it helps.
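If Person does not actually need to be local, a common way to sidestep the problem is to declare it as a member of an object and keep only the import inside the method. The following is a minimal sketch under that assumption; the names TopLevelPerson and call are made up for illustration, and it assumes spark-sql is on the classpath.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object TopLevelPerson {
  // Declared as a member of an object, not inside a method,
  // so the compiler can materialize a TypeTag[Person].
  case class Person(id: Long, name: String)

  def call(spark: SparkSession): Dataset[Person] = {
    // The import can stay local; only the class has to be non-local.
    import spark.implicits._
    spark.range(10).map(i => Person(i, i.toString))
  }
}
```

Calling TopLevelPerson.call(spark).show() with a local SparkSession should then behave like the version of the question that works outside the method.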
As you already found out yourself, the local Person doesn't have a TypeTag. But it has a WeakTypeTag (and a ClassTag). Let's try to define an Encoder for such a class.
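To see the difference between the tags without involving Spark, here is a small sketch that only assumes scala-reflect is on the classpath. The names LocalTagsDemo, inspect and localPerson are hypothetical helpers, not part of any library.

```scala
import scala.reflect.{ClassTag, classTag}
import scala.reflect.runtime.universe._

object LocalTagsDemo {
  // Returns (is the symbol behind the WeakTypeTag a free type?,
  //          name of the runtime class behind the ClassTag)
  def inspect[T: WeakTypeTag: ClassTag]: (Boolean, String) =
    (internal.isFreeType(weakTypeOf[T].typeSymbol), classTag[T].runtimeClass.getName)

  def localPerson(): (Boolean, String) = {
    case class Person(id: Long, name: String)
    // typeTag[Person] would not compile here ("No TypeTag available for Person"),
    // but both context bounds of inspect can be satisfied for the local class.
    inspect[Person]
  }
}
```

The first component reports whether the compiler fell back to a free type, and the second is the JVM class that the ClassTag still knows about, which is what the later attempts build on.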
The naive approach of constructing a TypeTag doesn't work:
How to create a TypeTag manually?
In scala 2.12, why none of the TypeTag created in runtime is serializable?
Scala Spark Encoders.product[X] (where X is a case class) keeps giving me "No TypeTag available for X" error
Spark: DF.as[Type] fails to compile
implicit def ttag[A: WeakTypeTag]: TypeTag[A] = {
  val ttag = null // hiding implicit by name
  val wttagImpl = weakTypeTag[A].asInstanceOf[WeakTypeTag[A] {val mirror: Mirror; val tpec: TypeCreator}]
  TypeTag[A](wttagImpl.mirror, wttagImpl.tpec)
}
java.lang.NoClassDefFoundError: no Java class corresponding to Person found
https://gist.github.com/DmytroMitin/41b7439d2e504e37f29b02e3500d24b1
The result is similar for:
def typeToTypeTag[T](
  tpe: Type,
  mirror: api.Mirror[universe.type]
): TypeTag[T] = {
  TypeTag(mirror, new TypeCreator {
    def apply[U <: api.Universe with Singleton](m: api.Mirror[U]) = {
      assert(m eq mirror, s"TypeTag[$tpe] defined in $mirror cannot be migrated to $m.")
      tpe.asInstanceOf[U#Type]
    }
  })
}
implicit def ttag[T: WeakTypeTag]: TypeTag[T] = {
  val ttag = null // hiding implicit by name
  typeToTypeTag(weakTypeOf[T], mirror)
}
java.lang.NoClassDefFoundError: no Java class corresponding to Person found
https://gist.github.com/DmytroMitin/c7a24abf1ff1011a1c87aa9d161d6395
implicit val personTtag: TypeTag[Person] = {
  val personTtag = null // hiding implicit by name
  tb.eval(q"org.apache.spark.sql.catalyst.ScalaReflection.universe.typeTag[${weakTypeOf[Person]}]")
    .asInstanceOf[TypeTag[Person]]
}
scala.tools.reflect.ToolBoxError: reflective toolbox failed due to unresolved free type variables
https://gist.github.com/DmytroMitin/6e35c0332f845fcd227d35ec49d4122f
This is how Encoder[T] is defined for T having a TypeTag:
implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]
object Encoders {
  def product[T <: Product : TypeTag]: Encoder[T] = ExpressionEncoder()
}
object ExpressionEncoder {
  def apply[T : TypeTag](): ExpressionEncoder[T] = {
    val mirror = ScalaReflection.mirror
    val tpe = typeTag[T].in(mirror).tpe
    val cls = mirror.runtimeClass(tpe)
    val serializer = ScalaReflection.serializerForType(tpe)
    val deserializer = ScalaReflection.deserializerForType(tpe)
    new ExpressionEncoder[T](
      serializer,
      deserializer,
      ClassTag[T](cls)
    )
  }
}
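For comparison, this is roughly what the TypeTag-based path looks like from user code when the class is not local. It is only a sketch: EncoderDemo is a hypothetical name, and spark-sql is assumed to be on the classpath. No SparkSession is needed to build the encoder itself.

```scala
import org.apache.spark.sql.{Encoder, Encoders}

object EncoderDemo {
  // A member of an object, so TypeTag[Person] is available to Encoders.product.
  case class Person(id: Long, name: String)

  val enc: Encoder[Person] = Encoders.product[Person]

  // The schema is derived from the constructor parameters found via reflection.
  val fieldNames: Seq[String] = enc.schema.fieldNames.toSeq
}
```

Moving the same case class into a method body makes the Encoders.product[Person] call fail to compile, because the TypeTag context bound can no longer be satisfied.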
Let's try to modify it for T having a WeakTypeTag and a ClassTag:
implicit def apply[T: WeakTypeTag /*: ClassTag*/]: Encoder[T] = {
  val tpe = weakTypeTag[T].in(mirror).tpe
  val cls = mirror.runtimeClass(tpe)
  val serializer = ScalaReflection.serializerForType(tpe)
  val deserializer = ScalaReflection.deserializerForType(tpe)
  new ExpressionEncoder[T](
    serializer,
    deserializer,
    ClassTag[T](cls)
  )
}
java.lang.NoClassDefFoundError: no Java class corresponding to Person found
https://gist.github.com/DmytroMitin/b58848fa6575b6fab0e9b8285095cc60
// (*)
implicit def apply[T/*: WeakTypeTag*/ : ClassTag]: Encoder[T] = {
  val tpe = mirror.classSymbol(classTag[T].runtimeClass).toType
  val serializer = ScalaReflection.serializerForType(tpe)
  val deserializer = ScalaReflection.deserializerForType(tpe)
  new ExpressionEncoder[T](
    serializer,
    deserializer,
    classTag[T]
  )
}
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: Main
https://gist.github.com/DmytroMitin/0c86933f96e136d44fff555295ce01dd
So finally let's make Main extend Serializable:
+---+----+
| id|name|
+---+----+
|  0|   0|
|  1|   1|
|  2|   2|
|  3|   3|
|  4|   4|
|  5|   5|
|  6|   6|
|  7|   7|
|  8|   8|
|  9|   9|
+---+----+
https://gist.github.com/DmytroMitin/0e9b0bd2ed6237a4a1e1c40d620a9d88
So (*) is a correct Encoder.
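The step that makes (*) work is going from the ClassTag to a reflection Type through the runtime class, which avoids the free type altogether. Here is a Spark-free sketch of just that step, assuming scala-reflect is on the classpath; ClassTagToType, typeOfClass and localPersonIsClass are hypothetical names, and the mirror is an ordinary runtime mirror, not Spark's ScalaReflection.mirror.

```scala
import scala.reflect.{ClassTag, classTag}
import scala.reflect.runtime.universe._

object ClassTagToType {
  private val mirror = runtimeMirror(getClass.getClassLoader)

  // Same conversion as in (*): look the runtime class up in the mirror.
  def typeOfClass[T: ClassTag]: Type =
    mirror.classSymbol(classTag[T].runtimeClass).toType

  def localPersonIsClass(): Boolean = {
    case class Person(id: Long, name: String)
    val tpe = typeOfClass[Person]
    // The resulting symbol is a real class symbol, not a free type.
    tpe.typeSymbol.isClass && !internal.isFreeType(tpe.typeSymbol)
  }
}
```

Because the type is rebuilt from the erased runtime class, any type arguments are lost along the way.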
This doesn't seem to work with a generic local Person:
case class Person[T](id: Long, name: String, t: T)
java.lang.UnsupportedOperationException: No Encoder found for Person$1
https://gist.github.com/DmytroMitin/69496ce257fc9a3a7a5fbd004c52dcc0
scala.ScalaReflectionException: free type Person is not a class
https://gist.github.com/DmytroMitin/07bfe954dca677f0a39c06779b94280e
For a generic local class the encoder should be as follows (using both WeakTypeTag and ClassTag):
implicit def apply[T: WeakTypeTag : ClassTag]: Encoder[T] = {
  val tpe0 = weakTypeTag[T].in(mirror).tpe
  val typeArgs = tpe0/*.dealias*/.typeArgs
  val tpe = mirror.classSymbol(classTag[T].runtimeClass).toType
  val tpe1 = appliedType(tpe.typeConstructor, typeArgs)
  val serializer = ScalaReflection.serializerForType(tpe1)
  val deserializer = ScalaReflection.deserializerForType(tpe1)
  new ExpressionEncoder[T](
    serializer,
    deserializer,
    classTag[T]
  )
}
https://gist.github.com/DmytroMitin/08c8f21ffb1427bfa15dd21fbdfb77fa
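The encoder above combines a type constructor taken from one Type with type arguments taken from another, using appliedType. A small sketch of that recombination with ordinary, non-local types (so it runs with scala-reflect alone) could look like this; AppliedTypeDemo and rebuild are hypothetical names.

```scala
import scala.reflect.runtime.universe._

object AppliedTypeDemo {
  // Take the type constructor from one type and the type arguments from another.
  def rebuild(constructorSource: Type, argsSource: Type): Type =
    appliedType(constructorSource.typeConstructor, argsSource.typeArgs)

  // Option's constructor combined with the argument list of Some[Int].
  val rebuilt: Type = rebuild(typeOf[Option[Any]], typeOf[Some[Int]])

  val matches: Boolean = rebuilt =:= typeOf[Option[Int]]
}
```

In the encoder, the constructor comes from the runtime class and the arguments from the WeakTypeTag, so it only helps when the arguments themselves are not free types.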
Well, now this doesn't work for a generic local class whose type parameter is itself a generic local class:
val df: Dataset[Person[Person[Int]]] =
  spark.range(10).map(i => Person(i, i.toString, Person(i, i.toString, i.toInt)))
scala.ScalaReflectionException: free type Person is not a class
https://gist.github.com/DmytroMitin/5bceb2b81f2391c5c312a045edb827a8
An improved version of the encoder:
case class Application(tycon: ClassTag[_], targs: List[Application])
class DeepClassTag[T](val classTags: Application)
object DeepClassTag {
  def apply[T: DeepClassTag]: DeepClassTag[T] = implicitly[DeepClassTag[T]]
  implicit def deepClassTag0[A: ClassTag]: DeepClassTag[A] =
    new DeepClassTag(Application(classTag[A], List()))
  implicit def deepClassTag11[A[_], B1](implicit tycon: ClassTag[A[_]], dct1: DeepClassTag[B1]): DeepClassTag[A[B1]] =
    new DeepClassTag(Application(tycon, List(dct1.classTags)))
  // dct2 must describe the second type argument B2
  implicit def deepClassTag12[A[_,_], B1, B2](implicit tycon: ClassTag[A[_,_]], dct1: DeepClassTag[B1], dct2: DeepClassTag[B2]): DeepClassTag[A[B1, B2]] =
    new DeepClassTag(Application(tycon, List(dct1.classTags, dct2.classTags)))
  // ...
  implicit def deepClassTag2[A[_[_]], B1[_]](implicit tycon: ClassTag[A[B1]], dct1: DeepClassTag[B1[_]]): DeepClassTag[A[B1]] =
    new DeepClassTag(Application(tycon, List(dct1.classTags)))
  // ...
}
def improveStaticType[T: WeakTypeTag : DeepClassTag]: Type =
  improveDynamicType(weakTypeOf[T], DeepClassTag[T].classTags)
def improveDynamicType(tpe: Type, classTags: Application): Type = {
  val newTycon = improveFreeType(tpe, classTags.tycon.runtimeClass)
  val targs = tpe.dealias.typeArgs
  assert(targs.length == classTags.targs.length, s"( $targs ).length == ( ${classTags.targs} ).length")
  val newArgs = targs.zip(classTags.targs).map((improveDynamicType _).tupled)
  appliedType(newTycon, newArgs)
}
def improveFreeType(tpe: Type, cls: Class[_]): Type =
  if (internal.isFreeType(tpe.typeSymbol)) {
    val typeArgs = tpe.dealias.typeArgs
    val typeConstructor = mirror.classSymbol(cls).toType.typeConstructor
    appliedType(typeConstructor, typeArgs)
  } else tpe
implicit def enc[T: WeakTypeTag : ClassTag : DeepClassTag]: Encoder[T] = {
  val tpe = improveStaticType[T]
  val serializer = ScalaReflection.serializerForType(tpe)
  val deserializer = ScalaReflection.deserializerForType(tpe)
  new ExpressionEncoder[T](
    serializer,
    deserializer,
    classTag[T]
  )
}
https://gist.github.com/DmytroMitin/56044515e031fcf1e977ab213013861d
DeepClassTag doesn't seem to work with higher-kinded classes:
https://gist.github.com/DmytroMitin/6388a437507e8389f30230e08382d9ff
An improved version that still doesn't always work (there are too many shapes of type constructors):
https://gist.github.com/DmytroMitin/2625ee20695404c6fc118ab8680808f2
Instead of manually defining type-class instances for the different shapes of type constructors, the type class DeepClassTag can be defined with macros as follows:
import scala.language.experimental.macros
import scala.reflect.ClassTag
import scala.reflect.macros.whitebox
case class Application(tycon: ClassTag[_], targs: List[Application])
class DeepClassTag[T](val classTags: Application)
object DeepClassTag {
  def apply[T: DeepClassTag]: DeepClassTag[T] = implicitly[DeepClassTag[T]]
  implicit def mkDeepClassTag[T]/*(implicit tCtag: ClassTag[T])*/: DeepClassTag[T] =
    macro DeepClassTagMacros.mkDeepClassTagImpl[T]
}
class DeepClassTagMacros(val c: whitebox.Context) {
  import c.universe._
  def findInstance[TC[_]](tpe: Type)(implicit wttag: WeakTypeTag[TC[_]]): Tree =
    c.inferImplicitValue(
      appliedType(weakTypeOf[TC[_]].typeConstructor, tpe),
      silent = false
    )
  def mkDeepClassTagImpl[T: WeakTypeTag]/*(tCtag: c.Tree)*/ : Tree = {
    val T = weakTypeOf[T]
    val tCtag = findInstance[ClassTag](T)
    val targCtags = T.dealias.typeArgs.map(arg => {
      val argInst = findInstance[DeepClassTag](arg)
      q"$argInst.classTags"
    })
    val targClassTags = q"_root_.scala.List.apply[Application](..$targCtags)"
    q"new DeepClassTag[$T](Application($tCtag, $targClassTags))"
  }
}
(Is it working?)
My PR to Spark to support local classes: https://github.com/apache/spark/pull/38740