![](/img/trans.png)
[英]Exception in thread “main” org.apache.spark.SparkException: Task not serializable"
[英]Scala error: Exception in thread "main" org.apache.spark.SparkException: Task not serializable
運行此代碼時出現不可序列化錯誤:
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.mutable.ArrayBuffer
object Task1 {
def findHighestRatingUsers(movieRating: String): (String) = {
val tokens = movieRating.split(",", -1)
val movieTitle = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val maxRating = ratings.max
var userIds = ArrayBuffer[Int]()
for(i <- 0 until ratings.length){
if (ratings(i) == maxRating) {
userIds += (i+1)
}
}
return movieTitle + "," + userIds.mkString(",")
return movieTitle
}
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Task 1")
val sc = new SparkContext(conf)
val Lines = sc.textFile(args(0))
val TitleAndMaxUserIds = Lines.map(findHighestRatingUsers)
.saveAsTextFile(args(1))
}
}
錯誤發生在行:
val TitleAndMaxUserIds =Lines.map(findHighestRatingUsers)
.saveAsTextFile(args(1))
我相信這是由於函數“findHighestRatingUsers”中的某些東西。 有人可以解釋為什么以及如何解決它嗎?
異常中的更多信息如下:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2362)
at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
at org.apache.spark.rdd.RDD.map(RDD.scala:395)
at Task1$.main(Task1.scala:63)
at Task1.main(Task1.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: Task1$
Serialization stack:
- object not serializable (class: Task1$, value: Task1$@3c770db4)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class Task1$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic Task1$.$anonfun$main$1:(LTask1$;Ljava/lang/String;)Ljava/lang/String;, instantiatedMethodType=(Ljava/lang/String;)Ljava/lang/String;, numCaptured=1])
- writeReplace data (class: java.lang.invoke.SerializedLambda)
- object (class Task1$$$Lambda$1023/20408451, Task1$$$Lambda$1023/20408451@4f59a516)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
... 22 more
我檢查了這篇文章Scala中對象和類之間的區別,並嘗試使用對象來封裝函數:
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.mutable.ArrayBuffer
object Function{
def _findHighestRatingUsers(movieRating: String): (String) = {
val tokens = movieRating.split(",", -1)
val movieTitle = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val maxRating = ratings.max
var userIds = ArrayBuffer[Int]()
for(i <- 0 until ratings.length){
if (ratings(i) == maxRating) {
userIds += (i+1)
}
}
return movieTitle + "," + userIds.mkString(",")
}
}
object Task1 {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Task 1")
val sc = new SparkContext(conf)
val textFile = sc.textFile(args(0))
val output = textFile.map(Function._findHighestRatingUsers)
.saveAsTextFile(args(1))
}
}
但仍然有大量錯誤的異常......
這次我嘗試將對象 Function 放在對象 task1 中,如下所示:
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.mutable.ArrayBuffer
object Task1 {
object Function{
def _findHighestRatingUsers(movieRating: String): (String) = {
val tokens = movieRating.split(",", -1)
val movieTitle = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val maxRating = ratings.max
var userIds = ArrayBuffer[Int]()
for(i <- 0 until ratings.length){
if (ratings(i) == maxRating) {
userIds += (i+1)
}
}
return movieTitle + "," + userIds.mkString(",")
}
}
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Task 1")
val sc = new SparkContext(conf)
val textFile = sc.textFile(args(0))
val output = textFile.map(Function._findHighestRatingUsers)
.saveAsTextFile(args(1))
}
}
問題解決了。 但是我仍然不知道為什么嵌套對象解決了這個問題。 有人可以解釋一下嗎? 此外,我有幾點不確定:
首先,我建議您通過閱讀有關 Scala 和 Spark 的文檔來熟悉它,因為您的問題表明您剛剛開始使用它。
我會給你一些關於“任務不可序列化”的原始問題的見解(但沒有准確地回答它),並讓你為你稍后在帖子中添加的問題打開其他問題,否則這個答案將是一團糟。
您可能知道,Spark 允許分布式計算。 為此,Spark 所做的一件事就是獲取您編寫的代碼,將其序列化並將其發送給某個地方的某些執行程序以實際運行它。 這里的關鍵部分是您的代碼必須是可序列化的。
您得到的錯誤是告訴您 Spark 無法序列化您的代碼。
現在,如何使它可序列化? 這就是它可能變得具有挑戰性的地方,即使 Spark 試圖通過提供“序列化堆棧”來幫助您,但有時它提供的信息並沒有那么有用。
在您的情況下(代碼的第一個示例), findHighestRatingUsers
必須序列化,但要這樣做,它必須序列化不可序列化的整個object Task1
。
為什么Task1
不可序列化? 我承認我不太確定,但我會打賭main
方法,盡管我希望你的第二個例子是可序列化的。
您可以在網絡上的各種文檔或博客文章中閱讀有關此內容的更多信息。 例如:https ://medium.com/swlh/spark-serialization-errors-e0eebcf0f6e6
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.