When importing data from a MS SQL database, there is the potential for null values. In Spark, DataFrames are able to handle the null values. But when I try to convert the DataFrame to a strongly typed Dataset, I receive encoder errors.
Here's a simple example:
case class optionTest(var a: Option[Int], var b: Option[Int])
object testObject {
def main(args: Array[String]): Unit = {
import spark.implicits._
val df = spark.sparkContext.parallelize(Seq(input)).toDF()
val df2 = Seq((1, 3), (3, Option(null)))
.toDF("a", "b")
.as[optionTest]
df2.show()
}
}
Here is the error for this case:
No Encoder found for Any
- field (class: "java.lang.Object", name: "_2")
- root class: "scala.Tuple2"
java.lang.UnsupportedOperationException: No Encoder found for Any
- field (class: "java.lang.Object", name: "_2")
- root class: "scala.Tuple2"
What is the recommended approach to handle nullable values when creating a Dataset from a DataFrame?
The problem is that your Dataframe doesn't match your case class.
Your first pair is an (Int, Int)
, and your second is an (Int, Option[Null])
.
The easy thing to note is that if you want to represent an Option[Int]
, the value will be either Some(3)
, for example, or None
for an absent value.
The tricky thing to note is that in Scala Int
is a subclass of AnyVal
while nullable references, which should be almost nonexistent in the Scala code you write, are on the AnyRef
side of the Scala object hierarchy.
Because you have a bunch of objects that are all over the Scala object model, Spark has to treat your data as Any
, the superclass of everything. There is no encoder that can handle that.
So with all that said, your data would have to look like this:
val df2 = Seq((Some(1), Some(3)), (Some(3), None))
As a side note, your case class should look like this:
case class OptionTest(a: Option[Int], b: Option[Int])
If you wan to use Option
you have to use it for all records. You should also use None
instead of Option(null)
:
Seq((1, Some(3)), (3, None)).toDF("a", "b").as[optionTest]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.