
Spark Java Encoders.bean fails to convert to a Scala-defined class

I have Java code that converts a JavaRDD to a Dataset and saves it to HDFS:

Dataset<User> userDataset = sqlContext.createDataset(userRdd.rdd(), Encoders.bean(User.class));
userDataset.write().json("some_path");

The User class is defined in Scala:

case class User(val name: Name, val address: Seq[Address]) extends Serializable

case class Name(firstName: String, lastName: Option[String])

case class Address(address: String)

The code compiles and runs successfully, and the file is saved to HDFS, but the User records in the output file have an empty schema:

val users = spark.read.json("some_path")
users.count // 100,000 which is same as "userRdd"
users.printSchema // users: org.apache.spark.sql.DataFrame = []

Why is Encoders.bean not working in this case?

Encoders.bean does not support Scala case classes; Encoders.product does. However, Encoders.product takes a TypeTag as a parameter, and a TypeTag cannot be instantiated from Java. I created a Scala object to provide the TypeTag:

import scala.reflect.runtime.universe._

object MyTypeTags {
  val UserTypeTag: TypeTag[User] = typeTag[User]
}

Then in the Java code:

Dataset<User> userDataset = sqlContext.createDataset(userRdd.rdd(), Encoders.product(MyTypeTags.UserTypeTag()));
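To see why Encoders.bean silently produces an empty schema, note that it discovers fields through JavaBean introspection, i.e. matching getX()/setX() pairs. A compiled Scala case class exposes accessors named after the field (e.g. name()) rather than getters, so introspection finds no properties. The following pure-Java sketch illustrates this with java.beans.Introspector; UserBean and UserCaseLike are hypothetical stand-ins (UserCaseLike mimics the shape of a compiled case class), not classes from the question.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;

public class BeanIntrospectionDemo {
    // A proper JavaBean: getter/setter pair, as Encoders.bean expects.
    public static class UserBean {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Roughly what a compiled Scala case class looks like from Java:
    // the accessor is name(), not getName(), so it is not a bean property.
    public static class UserCaseLike {
        private final String name;
        public UserCaseLike(String name) { this.name = name; }
        public String name() { return name; }
    }

    // Counts the bean properties introspection can find, excluding
    // those inherited from Object (such as getClass()).
    public static int propertyCount(Class<?> cls) throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
        return info.getPropertyDescriptors().length;
    }

    public static void main(String[] args) throws IntrospectionException {
        System.out.println(propertyCount(UserBean.class));     // 1 ("name")
        System.out.println(propertyCount(UserCaseLike.class)); // 0 — empty schema
    }
}
```

The zero-property result for the case-class-like shape matches the empty schema observed in the output file, which is why switching to Encoders.product (driven by a TypeTag rather than bean introspection) fixes the problem.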
