Calling a UDF and getting a Task not serializable exception

The code worked fine until I made some changes: I needed to implement rotatekey as a UDF. I must be missing something, because now I get this error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
...
...
at playground.RotatingKeys.run(RotatingKeys.scala:25)
at playground.Main$.main(RotatingKeys.scala:37)
at playground.Main.main(RotatingKeys.scala)
Caused by: java.io.NotSerializableException: playground.RotatingKeys
Serialization stack:
- object not serializable (class: playground.RotatingKeys, value: playground.RotatingKeys@e07b4db)
Here is the code:
import org.apache.logging.log4j.{LogManager, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.expressions.UserDefinedFunction
class RotatingKeys(spark: SparkSession, nRotations: Integer) {
  import spark.implicits._

  val logger: Logger = LogManager.getLogger(getClass)
  logger.info("Initializing KeyRotatorJob")

  def rotateKeyUdf: UserDefinedFunction = {
    udf { key: String => rotatekey(key, nRotations) }
  }

  def rotatekey(key: String, nRotations: Integer): String =
    key.substring(nRotations) + key.substring(0, nRotations)

  def run(): Unit =
    spark
      .sql("SELECT '0123456' as key")
      .withColumn("rotated_key", rotateKeyUdf($"key"))
      .show()
}

object Main {
  val spark = SparkSession.builder()
    .appName("Run Trials")
    .config("spark.master", "local")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val rkRun = new RotatingKeys(spark, 4)
    rkRun.run()
  }
}
Before the change it worked fine and produced:
+-------+-----------+
| key|rotated_key|
+-------+-----------+
|0123456| 4560123|
+-------+-----------+
Any help would be greatly appreciated.
Don't use class members (variables/methods) directly inside a udf closure. (If you want to use them directly, the class must be serializable.) Pass the value in separately as a column instead:
import org.apache.log4j.LogManager
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.UserDefinedFunction
// SO=63064483
class RotatingKeys(spark: SparkSession, nRotations: Integer) {
  import spark.implicits._

  val logger = LogManager.getLogger(getClass)
  logger.info("Initializing KeyRotatorJob")

  // The udf closure no longer touches any class member:
  // nRotations is passed in as a literal column instead.
  def rotateKeyUdf: UserDefinedFunction = {
    udf { (key: String, nRotations: Integer) => key.substring(nRotations) + key.substring(0, nRotations) }
  }

  def run(): Unit =
    spark
      .sql("SELECT '0123456' as key")
      .withColumn("rotated_key", rotateKeyUdf($"key", lit(nRotations)))
      .show()
}

object Main {
  val spark = SparkSession.builder()
    .appName("Run Trials")
    .config("spark.master", "local")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val rkRun = new RotatingKeys(spark, 4)
    rkRun.run()
  }
}
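The parenthetical above ("the class must be serializable") points at the root cause: a closure that calls an instance method captures `this`, so Spark tries to serialize the whole enclosing object. A third option is to copy the needed field into a local `val` before building the closure, so only that value is captured. The effect can be demonstrated with plain JVM serialization, no Spark required (the `Holder`/`SerializationCheck` names below are made up for illustration):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Minimal sketch of the capture problem, outside Spark.
// Holder is deliberately NOT Serializable, like the RotatingKeys class.
class Holder(n: Int) {
  def rotate(s: String): String = s.substring(n) + s.substring(0, n)

  // BAD: calling the instance method makes the closure capture `this`,
  // so serializing badFn drags the whole non-serializable Holder along.
  val badFn: String => String = s => rotate(s)

  // GOOD: copy the field into a local val first; the closure then
  // captures only an Int, which serializes fine.
  val goodFn: String => String = {
    val local = n
    s => s.substring(local) + s.substring(0, local)
  }
}

object SerializationCheck {
  // Attempt plain Java serialization, as Spark does when shipping a task.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val h = new Holder(4)
    println(canSerialize(h.badFn))   // false: the captured Holder is not Serializable
    println(canSerialize(h.goodFn))  // true: only the Int was captured
    println(h.goodFn("0123456"))     // 4560123
  }
}
```

The same local-copy trick works inside `rotateKeyUdf`: bind `nRotations` to a local `val` before the `udf { ... }` literal and reference only that.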
If you want to keep using the method (rotatekey), turn it into a utility and move it into an object, like this:
import org.apache.log4j.LogManager
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.UserDefinedFunction
// SO=63064483
class RotatingKeys(spark: SparkSession, nRotations: Integer) {
  import spark.implicits._

  val logger = LogManager.getLogger(getClass)
  logger.info("Initializing KeyRotatorJob")

  def run(): Unit =
    spark
      .sql("SELECT '0123456' as key")
      .withColumn("rotated_key", Main.rotateKeyUdf($"key", lit(nRotations)))
      .show()
}

object Main {
  val spark = SparkSession.builder()
    .appName("Run Trials")
    .config("spark.master", "local")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val rkRun = new RotatingKeys(spark, 4)
    rkRun.run()
  }

  // rotatekey now lives in an object, so the udf closure no longer
  // captures the non-serializable RotatingKeys instance.
  def rotateKeyUdf: UserDefinedFunction = {
    udf { (key: String, nRotations: Integer) => rotatekey(key, nRotations) }
  }

  def rotatekey(key: String, nRotations: Integer): String =
    key.substring(nRotations) + key.substring(0, nRotations)
}