Spark: Task not Serializable for UDF on DataFrame
I get

org.apache.spark.SparkException: Task not serializable

when I try to execute the following on Spark 1.4.1:
import java.sql.{Date, Timestamp}
import java.text.SimpleDateFormat

object ConversionUtils {
  val iso8601 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX")
  def tsUTC(s: String): Timestamp = new Timestamp(iso8601.parse(s).getTime)
  val castTS = udf[Timestamp, String](tsUTC _)
}

val df = frame.withColumn("ts", ConversionUtils.castTS(frame("ts_str")))
df.first
Here, frame is a DataFrame that lives within a HiveContext. That data frame does not have any issues.
I have similar UDFs for integers and they work without any problem. However, the one with timestamps seems to cause problems. According to the documentation, java.sql.Timestamp implements Serializable, so that's not the problem. The same is true for SimpleDateFormat, as can be seen here.

This causes me to believe it's the UDF that's causing problems. However, I'm not sure what the cause is or how to fix it.
The relevant section of the trace:
Caused by: java.io.NotSerializableException: ...
Serialization stack:
- object not serializable (class: ..., value: ...$ConversionUtils$@63ed11dd)
- field (class: ...$ConversionUtils$$anonfun$3, name: $outer, type: class ...$ConversionUtils$)
- object (class ...$ConversionUtils$$anonfun$3, <function1>)
- field (class: org.apache.spark.sql.catalyst.expressions.ScalaUdf$$anonfun$2, name: func$2, type: interface scala.Function1)
- object (class org.apache.spark.sql.catalyst.expressions.ScalaUdf$$anonfun$2, <function1>)
- field (class: org.apache.spark.sql.catalyst.expressions.ScalaUdf, name: f, type: interface scala.Function1)
- object (class org.apache.spark.sql.catalyst.expressions.ScalaUdf, scalaUDF(ts_str#2683))
- field (class: org.apache.spark.sql.catalyst.expressions.Alias, name: child, type: class org.apache.spark.sql.catalyst.expressions.Expression)
- object (class org.apache.spark.sql.catalyst.expressions.Alias, scalaUDF(ts_str#2683) AS ts#7146)
- element of array (index: 35)
- array (class [Ljava.lang.Object;, size 36)
- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.ArrayBuffer,
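The first entry of the serialization stack ("object not serializable ... ConversionUtils$") points at the object itself: the anonymous function created for the UDF holds a $outer reference back to ConversionUtils, and Java serialization rejects the entire object graph if any node in it is not Serializable. The mechanism can be reproduced outside Spark; the Plain and Marked classes below are hypothetical stand-ins for the unfixed and fixed object:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins: Plain mirrors ConversionUtils as posted (no
// Serializable), Marked mirrors the fixed version.
class Plain
class Marked extends Serializable

// Java serialization, which Spark uses to ship closures to executors,
// throws NotSerializableException for any non-Serializable instance.
def canSerialize(x: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(x)
    true
  } catch {
    case _: NotSerializableException => false
  }

println(canSerialize(new Plain))  // false
println(canSerialize(new Marked)) // true
```

This is why the stack trace blames ConversionUtils$ even though Timestamp and SimpleDateFormat are themselves fine: the captured outer object is the non-serializable node.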
Try:
object ConversionUtils extends Serializable {
...
}
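Applied to the question's code, the fix is that one-line change. A sketch of the fixed object follows; the udf(...) registration is left out here only so the sketch compiles and runs without a Spark dependency, and in the real code castTS stays exactly as it was:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.sql.Timestamp
import java.text.SimpleDateFormat

// The object from the question, now extending Serializable so the UDF
// closure's $outer reference can be shipped to executors.
object ConversionUtils extends Serializable {
  val iso8601 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX")
  def tsUTC(s: String): Timestamp = new Timestamp(iso8601.parse(s).getTime)
}

// The module now serializes without throwing:
new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(ConversionUtils)

// Parsing still behaves as before:
println(ConversionUtils.tsUTC("1970-01-01T00:00:00.000Z").getTime) // 0
```

One caveat worth noting: SimpleDateFormat is not thread-safe. Since each task deserializes its own copy of the closure this usually does not bite in a UDF, but a per-call formatter (or a thread-safe alternative) is the more defensive choice.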