Spark Java Encoders.bean fails to convert to a Scala-defined class
I have Java code that converts a JavaRDD to a Dataset and saves it to HDFS:
Dataset<User> userDataset = sqlContext.createDataset(userRdd.rdd(), Encoders.bean(User.class));
userDataset.write().json("some_path");
The User class is defined in Scala:
case class User(val name: Name, val address: Seq[Address]) extends Serializable
case class Name(firstName: String, lastName: Option[String])
case class Address(address: String)
The code compiles and runs successfully, and the file is saved to HDFS, but the User records in the output file have an empty schema:
val users = spark.read.json("some_path")
users.count // 100,000, the same count as "userRdd"
users.printSchema // users: org.apache.spark.sql.DataFrame = []
Why is Encoders.bean not working in this case?
Encoders.bean does not support Scala case classes; Encoders.product does. However, Encoders.product takes a TypeTag as a parameter, and a TypeTag cannot be initialized from Java code. I created a Scala object to provide the TypeTag:
import scala.reflect.runtime.universe._
object MyTypeTags {
  val UserTypeTag: TypeTag[User] = typeTag[User]
}
Then in the Java code:
Dataset<User> userDataset = sqlContext.createDataset(userRdd.rdd(), Encoders.product(MyTypeTags.UserTypeTag()));
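An alternative is to let the Scala helper return the Encoder itself, so the Java side never touches TypeTag at all. A minimal sketch, assuming the same User case class is on the classpath (MyEncoders is a hypothetical helper name):

import org.apache.spark.sql.{Encoder, Encoders}

object MyEncoders {
  // The Scala compiler supplies the TypeTag implicitly here,
  // so Java code never has to construct one.
  val userEncoder: Encoder[User] = Encoders.product[User]
}

The Java call then becomes sqlContext.createDataset(userRdd.rdd(), MyEncoders.userEncoder());, which keeps all the reflection details on the Scala side.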