
Apache Spark Java - how to get a TypedColumn when the type is an array of objects?

I'm trying to add a new column to my DataFrame based on an existing column whose data is an array of custom objects. Suppose the object type is MyObject; I'm trying something like this:

Column col = df.col("old_col");
// This call fails: Encoders.bean expects a bean class, not an array type
Encoder<MyObject[]> encoder = Encoders.bean(MyObject[].class);
TypedColumn<Object, MyObject[]> typedColumn = col.as(encoder);
df = df.withColumn("new_col", functions.callUDF("my_udf", typedColumn));

I receive the following exception:

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:156) ~[scala-library-2.11.8.jar:?]
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:87) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]
    at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142) ~[spark-catalyst_2.11-2.2.0.jar:2.2.0]

This happens because Encoders.bean requires the encoded type's schema to be a StructType (i.e., a bean class), but MyObject[] maps to an ArrayType, which trips the assertion in ExpressionEncoder.javaBean.

How can I get the typed object to work with in my UDF?

The solution I eventually used was functions.to_json to pass the column values to the UDF as JSON strings, and then deserializing the objects inside the UDF.

Calling the UDF looks like:

df = df.withColumn("new_col", functions.callUDF("my_udf", functions.to_json(df.col("old_col"))));

Inside the UDF definition, the deserialization looks like:

MyObject[] objArray = new Gson().fromJson(jsonStr, MyObject[].class);
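
Putting the two snippets together, a minimal sketch of the full registration could look like the following. This assumes Gson is on the classpath, a simple MyObject bean with hypothetical fields (name, value), and example UDF logic (summing the values); the UDF name "my_udf" matches the snippets above.

import com.google.gson.Gson;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class MyUdfExample {

    // Hypothetical bean; replace with your actual MyObject class
    public static class MyObject {
        public String name;
        public int value;
    }

    public static void register(SparkSession spark) {
        spark.udf().register("my_udf", (UDF1<String, Integer>) jsonStr -> {
            // Deserialize the JSON string produced by functions.to_json
            MyObject[] objArray = new Gson().fromJson(jsonStr, MyObject[].class);
            // Example logic only: sum the values of all objects in the array
            int sum = 0;
            for (MyObject obj : objArray) {
                sum += obj.value;
            }
            return sum;
        }, DataTypes.IntegerType);
    }
}

The UDF receives a plain String, so no encoder for MyObject[] is needed; the only cost is the serialize/deserialize round trip per row.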
