Scala UDF with collect_list: Task not serializable
I am trying to use collect_list on a column derived from a UDF. My code is below. Without the UDF-derived column the code works fine, but with it I get the following error:
Task not serializable: java.io.NotSerializableException: scala.runtime.LazyRef
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{callUDF, col, collect_list, struct}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

class SparkEntry extends Serializable {
  def process(): Unit = {
    // Function registered as a UDF: appends "_" to the input string
    def modifyword = (file_path: String) => { file_path + "_" }
    val spark = SparkSession.builder().appName("spp").master("local").getOrCreate()
    spark.udf.register("customudf", modifyword)
    val someData = Seq(
      Row(8, "bat"),
      Row(9, "bat"),
      Row(64, "mouse"),
      Row(9, "mouse"),
      Row(-27, "horse"),
      Row(9, "horse")
    )
    val someSchema = List(
      StructField("number", IntegerType, true),
      StructField("word", StringType, true)
    )
    val someDF = spark.createDataFrame(
      spark.sparkContext.parallelize(someData),
      StructType(someSchema)
    )
    // Derive a column with the UDF, then collect (new_column, number) pairs per word
    val new_df = someDF.withColumn("new_column", callUDF("customudf", cols = col("word")))
    new_df.show()
    val grouped_df = new_df.groupBy("word").agg(collect_list(struct(col("new_column"), col("number")))).toDF("word", "combined")
    grouped_df.show()
    spark.close()
  }
}
This works fine for me:
def modifyword = (file_path: String) => { file_path + "_" }
val spark = SparkSession.builder().appName("spp").master("local").getOrCreate()
spark.udf.register("customudf", modifyword)
val someData = Seq(
  Row(8, "bat"),
  Row(9, "bat"),
  Row(64, "mouse"),
  Row(9, "mouse"),
  Row(-27, "horse"),
  Row(9, "horse")
)
val someSchema = List(
  StructField("number", IntegerType, true),
  StructField("word", StringType, true)
)
val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(someSchema)
)
val new_df = someDF.withColumn("new_column", callUDF("customudf", cols = col("word")))
new_df.show()
val grouped_df = new_df.groupBy("word").agg(collect_list(struct(col("new_column"), col("number")))).toDF("word", "combined")
grouped_df.show()
/**
 * +------+-----+----------+
 * |number| word|new_column|
 * +------+-----+----------+
 * |     8|  bat|      bat_|
 * |     9|  bat|      bat_|
 * |    64|mouse|    mouse_|
 * |     9|mouse|    mouse_|
 * |   -27|horse|    horse_|
 * |     9|horse|    horse_|
 * +------+-----+----------+
 *
 * +-----+--------------------+
 * | word|            combined|
 * +-----+--------------------+
 * |  bat|[[bat_, 8], [bat_...|
 * |horse|[[horse_, -27], [...|
 * |mouse|[[mouse_, 64], [m...|
 * +-----+--------------------+
 */
Try upgrading your Scala version to 2.12.4. LazyRef is Serializable there.
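To see whether the advice above applies to your setup, it may help to check which Scala library is actually loaded at runtime and whether its LazyRef class is serializable. The following is only a rough sketch in plain Scala with no Spark dependency. SerializationProbe and its methods are hypothetical names made up for this illustration, and the check assumes that Spark serializes closures with standard Java serialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.util.Try

// Hypothetical diagnostic helper; not part of Spark or of the original post.
object SerializationProbe {

  // Version of the scala-library that is actually loaded at runtime.
  def scalaVersion: String = scala.util.Properties.versionNumberString

  // Some(true) or Some(false) if scala.runtime.LazyRef can be loaded,
  // None if the class is not on the classpath at all.
  def lazyRefIsSerializable: Option[Boolean] =
    Try(Class.forName("scala.runtime.LazyRef")).toOption
      .map(cls => classOf[java.io.Serializable].isAssignableFrom(cls))

  // Tries to write the value with plain Java serialization.
  // Returns false only on NotSerializableException; other I/O errors propagate.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"Scala runtime version: $scalaVersion")
    println(s"LazyRef serializable: $lazyRefIsSerializable")
    // Same function as in the question, defined here only for the check
    val modifyword = (file_path: String) => file_path + "_"
    println(s"modifyword serializable: ${isJavaSerializable(modifyword)}")
  }
}
```

Run it with the same classpath as the Spark job. If the reported Scala version differs from the one in your build definition, a different scala-library is probably being picked up at runtime.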