
Scala UDF with collect_list: Task not serializable

I am trying to use collect_list on a column derived from a UDF. My code is below. Without the UDF-derived column the code works fine, but with it I get the following error:

Task not serializable: java.io.NotSerializableException: scala.runtime.LazyRef


import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{callUDF, col, collect_list, struct}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

class SparkEntry extends Serializable {
  def process(): Unit = {
    // UDF body: append an underscore to the input string
    def modifyword = (file_path: String) => { file_path + "_" }
    val spark = SparkSession.builder().appName("spp").master("local").getOrCreate()
    spark.udf.register("customudf", modifyword)
    val someData = Seq(
      Row(8, "bat"),
      Row(9, "bat"),
      Row(64, "mouse"),
      Row(9, "mouse"),
      Row(-27, "horse"),
      Row(9, "horse")
    )
    val someSchema = List(
      StructField("number", IntegerType, true),
      StructField("word", StringType, true)
    )

    val someDF = spark.createDataFrame(
      spark.sparkContext.parallelize(someData),
      StructType(someSchema)
    )
    // Derive a new column through the registered UDF
    val new_df = someDF.withColumn("new_column", callUDF("customudf", cols = col("word")))
    new_df.show()
    // Collect (new_column, number) pairs per word; the error appears at this step
    val grouped_df = new_df.groupBy("word").agg(collect_list(struct(col("new_column"), col("number")))).toDF("word", "combined")
    grouped_df.show()
    spark.close()
  }
}
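
Spark ships task closures to executors using Java serialization, so one way to narrow down a "Task not serializable" error is to try serializing the suspect value by hand, outside Spark. The following is only a diagnostic sketch: the `SerializationCheck` object and its `trySerialize` helper are hypothetical names introduced here, and a passing check on the function value alone does not rule out other objects captured by the task.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper, not part of Spark or of the original post.
object SerializationCheck {
  // Attempt to Java-serialize a value.
  // Right(byteCount) on success, Left(exception message) on failure.
  def trySerialize(value: AnyRef): Either[String, Int] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try {
      out.writeObject(value)
      out.flush()
      Right(buffer.size())
    } catch {
      // The message of NotSerializableException is the offending class name
      case e: NotSerializableException => Left(e.getMessage)
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Same function body as the UDF in the question
    val modifyword = (file_path: String) => file_path + "_"
    println(trySerialize(modifyword))
    // A plain java.lang.Object is never serializable, shown for contrast
    println(trySerialize(new Object))
  }
}
```

If the function value serializes but the Spark job still fails, the class named in the exception (here `scala.runtime.LazyRef`) is being captured somewhere else in the closure chain.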

This works fine for me:

    def modifyword = (file_path:String) => {file_path+"_"}
    val spark = SparkSession.builder().appName("spp").master("local").getOrCreate()
    spark.udf.register("customudf",modifyword)
    val someData = Seq(
      Row(8, "bat"),
      Row(9, "bat"),
      Row(64, "mouse"),
      Row(9, "mouse"),
      Row(-27, "horse"),
      Row(9, "horse")
    )
    val someSchema = List(
      StructField("number", IntegerType, true),
      StructField("word", StringType, true)
    )

    val someDF = spark.createDataFrame(
      spark.sparkContext.parallelize(someData),
      StructType(someSchema)
    )
    val new_df = someDF.withColumn("new_column",callUDF("customudf",cols = col("word")))
    new_df.show()
    val grouped_df = new_df.groupBy("word").agg(collect_list(struct(col("new_column"),col("number")))).toDF("word","combined")
    grouped_df.show()

    /**
      * +------+-----+----------+
      * |number| word|new_column|
      * +------+-----+----------+
      * |     8|  bat|      bat_|
      * |     9|  bat|      bat_|
      * |    64|mouse|    mouse_|
      * |     9|mouse|    mouse_|
      * |   -27|horse|    horse_|
      * |     9|horse|    horse_|
      * +------+-----+----------+
      *
      * +-----+--------------------+
      * | word|            combined|
      * +-----+--------------------+
      * |  bat|[[bat_, 8], [bat_...|
      * |horse|[[horse_, -27], [...|
      * |mouse|[[mouse_, 64], [m...|
      * +-----+--------------------+
      */

Try upgrading your Scala version to 2.12.4; LazyRef is Serializable there.
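
To see whether this applies to your setup, you can check which Scala library is actually on the runtime classpath and whether its `LazyRef` class is serializable. This is a minimal sketch assuming Scala 2.12 or later (the class does not exist in 2.11); `LazyRefCheck` and `isSerializable` are hypothetical names chosen for illustration.

```scala
// Hypothetical helper, not part of Spark or of the original post.
object LazyRefCheck {
  // True when the given class implements java.io.Serializable.
  def isSerializable(cls: Class[_]): Boolean =
    classOf[java.io.Serializable].isAssignableFrom(cls)

  def main(args: Array[String]): Unit = {
    // Version of the Scala library loaded at run time, which may differ
    // from the version the project was compiled against.
    println(s"Scala library version: ${scala.util.Properties.versionNumberString}")
    println(s"LazyRef serializable: ${isSerializable(classOf[scala.runtime.LazyRef[_]])}")
  }
}
```

Run it with the same classpath as the failing job (for example by pasting the object into the shell you submit from), since a mismatch between the compile-time and runtime Scala versions would not show up otherwise.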
