When defining a UDT in Spark SQL, I create it like this:
class trajUDT extends UserDefinedType[traj] {
  override def sqlType: DataType = StructType(Seq(
    StructField("id", DataTypes.StringType),
    StructField("loc", ArrayType(StructType(Seq(
      StructField("x", DataTypes.DoubleType),
      StructField("y", DataTypes.DoubleType)
    ))))
  ))
  ...
}
where traj is a plain class:
class traj(val id: UTF8String, val loc: Array[Tuple2[Double, Double]])
I want to write a serialize function like this:
override def serialize(p: traj): GenericInternalRow = {
  new GenericInternalRow(Array[Any](p.id, p.loc.map(x => Array(x._1, x._2))))
}
But it fails, telling me that the value cannot be converted to ArrayData.
I also wrote a deserialize function:
override def deserialize(datum: Any): traj = {
  val arr = datum.asInstanceOf[InternalRow]
  val id = arr.getUTF8String(0)
  val xytype = StructType(Seq(
    StructField("x", DataTypes.DoubleType),
    StructField("y", DataTypes.DoubleType)
  ))
  val xy = arr.getArray(1)
  val xye = xy.toArray[Tuple2[Double, Double]](xytype)
  new traj(id, xye)
}
and I suspect it will not work either.
Can someone show me how to do these two conversions?
I faced a similar problem while working with InternalRow: constructing an InternalRow with an Array or Seq leads to a java.lang.ClassCastException.
import org.apache.spark.sql.catalyst.InternalRow
val row = InternalRow(Array(1, 2, 3), 1L)
println(s"Row first element: ${row.getArray(0).toIntArray.toVector}")
println(s"Row second element: ${row.getLong(1)}")
Running this throws:
java.lang.ClassCastException: [I cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getArray(rows.scala:48)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:195)
I solved this by passing an ArrayData field instead of an Array or Seq, using the ArrayData.toArrayData method as follows:
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.ArrayData
val row = InternalRow(ArrayData.toArrayData(Array(1, 2, 3)), 1L)
println(s"Row first element: ${row.getArray(0).toIntArray.toVector}")
println(s"Row second element: ${row.getLong(1)}")
This prints:
Row first element: Vector(1, 2, 3)
Row second element: 1
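Applying the same idea to the trajUDT from the question: with that sqlType, Catalyst expects the id as a UTF8String, the loc column as ArrayData, and each (x, y) element as an InternalRow (because the element type is a struct), not as a Scala Array or Tuple2. The sketch below assumes Spark 2.x/3.x internals where ArrayData.toArrayData and ArrayData.toArray[T](elementType) are available; these are internal Catalyst APIs that can change between versions. The object TrajConversions and the methods serializeTraj and deserializeTraj are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
import org.apache.spark.unsafe.types.UTF8String

// The user class from the question.
class traj(val id: UTF8String, val loc: Array[Tuple2[Double, Double]])

// Hypothetical helper object: trajUDT.serialize / deserialize can delegate to it.
object TrajConversions {
  // Element type of the "loc" array, matching trajUDT.sqlType.
  val xyType: StructType = StructType(Seq(
    StructField("x", DataTypes.DoubleType),
    StructField("y", DataTypes.DoubleType)
  ))

  // traj -> Catalyst value for struct<id: string, loc: array<struct<x: double, y: double>>>
  def serializeTraj(p: traj): InternalRow = {
    // Each struct element must be an InternalRow, not a Scala Array or Tuple2.
    val points = p.loc.map { case (x, y) => InternalRow(x, y) }
    // The array column itself must be ArrayData, not a Scala Array or Seq.
    InternalRow(p.id, ArrayData.toArrayData(points))
  }

  // Catalyst value -> traj
  def deserializeTraj(datum: Any): traj = {
    val row = datum.asInstanceOf[InternalRow]
    val id = row.getUTF8String(0)
    // Struct elements come back as InternalRow, so request that type, then convert to tuples.
    val loc = row.getArray(1)
      .toArray[InternalRow](xyType)
      .map(pt => (pt.getDouble(0), pt.getDouble(1)))
    new traj(id, loc)
  }
}
```

Inside trajUDT, serialize(p) can then return TrajConversions.serializeTraj(p) and deserialize(datum) can return TrajConversions.deserializeTraj(datum). The key difference from the question's attempt is that toArray is asked for InternalRow elements, which is what a struct element actually is internally, and the conversion to Tuple2 happens afterwards.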