
Calling UDF function and get Task not serializable Exception

The code worked fine until I made some changes: I needed to implement rotatekey as a UDF function, but I must have missed something, because now I get this error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
  ...
  ...
at playground.RotatingKeys.run(RotatingKeys.scala:25)
at playground.Main$.main(RotatingKeys.scala:37)
at playground.Main.main(RotatingKeys.scala)
Caused by: java.io.NotSerializableException: playground.RotatingKeys
Serialization stack:
- object not serializable (class: playground.RotatingKeys, value: playground.RotatingKeys@e07b4db)

The code is as follows:

import org.apache.logging.log4j.{LogManager, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction

class RotatingKeys(spark: SparkSession, nRotations: Integer) {
  import spark.implicits._

  val logger: Logger = LogManager.getLogger(getClass)

  logger.info("Initializing KeyRotatorJob")

  def rotateKeyUdf: UserDefinedFunction = {
    udf{ key: String => rotatekey(key, nRotations) }
  }

  def rotatekey(key: String, nRotations: Integer): String =
    key.substring(nRotations) + key.substring(0, nRotations)

  def run(): Unit =
    spark
      .sql("SELECT '0123456' as key")
      .withColumn("rotated_key", rotateKeyUdf($"key"))
      .show()
}

object Main {
  val spark = SparkSession.builder()
    .appName("Run Trials")
    .config("spark.master", "local")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val rkRun = new RotatingKeys(spark,4)
    rkRun.run()
  }
}

It was working fine, producing:

+-------+-----------+
|    key|rotated_key|
+-------+-----------+
|0123456|    4560123|
+-------+-----------+

Some help would be greatly appreciated.

Do not use class members (variables/methods) directly inside a udf closure. (If you want to use them directly, the class must be serializable.) Instead, pass the value in separately as a column:

import org.apache.log4j.LogManager
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.UserDefinedFunction

// SO=63064483
class RotatingKeys(spark: SparkSession, nRotations: Integer) {
  import spark.implicits._

  val logger = LogManager.getLogger(getClass)

  logger.info("Initializing KeyRotatorJob")

  def rotateKeyUdf: UserDefinedFunction = {
    udf{ (key: String, nRotations: Integer) => key.substring(nRotations) + key.substring(0, nRotations) }
  }

  def run(): Unit =
    spark
      .sql("SELECT '0123456' as key")
      .withColumn("rotated_key", rotateKeyUdf($"key", lit(nRotations)))
      .show()
}

object Main {
  val spark = SparkSession.builder()
    .appName("Run Trials")
    .config("spark.master", "local")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val rkRun = new RotatingKeys(spark,4)
    rkRun.run()
  }
}
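The failure above can be reproduced without Spark at all: serializing a closure that calls an instance method fails the same way, because the lambda captures `this` and Java serialization then tries to write the whole enclosing object. A minimal sketch in plain Scala, where Holder is a hypothetical stand-in for RotatingKeys:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical minimal stand-in for RotatingKeys: deliberately NOT Serializable.
class Holder(val n: Int) {
  // Calling the instance method inside the lambda captures `this`.
  val fn: String => String = key => rotate(key)
  def rotate(key: String): String = key.substring(n) + key.substring(0, n)
}

object ClosureDemo {
  // Returns true iff `obj` survives plain Java serialization,
  // which is essentially the check Spark performs on task closures.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val h = new Holder(4)
    // Capturing the instance drags the whole non-serializable Holder along.
    println(serializes(h.fn))
    // A lambda that takes everything as parameters captures nothing.
    val standalone: (String, Int) => String =
      (key, k) => key.substring(k) + key.substring(0, k)
    println(serializes(standalone))
  }
}
```

This is the same reason the corrected udf above works: it receives nRotations as a column (lit(nRotations)) instead of reaching back into the enclosing class.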

If you want to keep using the method (rotatekey), make it a utility and move it to an object, like this:

import org.apache.log4j.LogManager
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.UserDefinedFunction

// SO=63064483
class RotatingKeys(spark: SparkSession, nRotations: Integer) {
  import spark.implicits._

  val logger = LogManager.getLogger(getClass)

  logger.info("Initializing KeyRotatorJob")

  def run(): Unit =
    spark
      .sql("SELECT '0123456' as key")
      .withColumn("rotated_key", Main.rotateKeyUdf($"key", lit(nRotations)))
      .show()
}

object Main {
  val spark = SparkSession.builder()
    .appName("Run Trials")
    .config("spark.master", "local")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val rkRun = new RotatingKeys(spark,4)
    rkRun.run()
  }

  def rotateKeyUdf: UserDefinedFunction = {
    udf{ (key: String, nRotations: Integer) => rotatekey(key, nRotations) }
  }

  def rotatekey(key: String, nRotations: Integer): String =
    key.substring(nRotations) + key.substring(0, nRotations)
}
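As the answer notes in passing, the other way out is to make the class itself serializable, so that capturing it in the udf closure is harmless. A minimal sketch of that idea in plain Scala, where Rotator is a hypothetical stand-in for RotatingKeys without the SparkSession field:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for RotatingKeys, minus the SparkSession field:
// extending Serializable lets closures that capture the instance serialize.
class Rotator(val nRotations: Int) extends Serializable {
  val fn: String => String =
    key => key.substring(nRotations) + key.substring(0, nRotations)
}

object SerializableDemo {
  def main(args: Array[String]): Unit = {
    val r = new Rotator(4)
    // This now succeeds: the captured Rotator is itself Serializable.
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(r.fn)
    println(r.fn("0123456")) // same rotation as the question: 4560123
  }
}
```

Pulling the logic into an object, as shown above, is usually still the cleaner fix, since it avoids shipping instance state to the executors at all.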


Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.
