
scala spark rdd error : java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda

I am a beginner to Scala and Spark.

Scala version: 2.12.10

Spark version: 3.0.1

I'm trying a very simple Spark RDD function in Scala, but I get an error.

(1) build.sbt


scalaVersion := "2.12.10"


name := "hello-world"
organization := "ch.epfl.scala"
version := "1.0"


libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.2"
libraryDependencies +=        "org.apache.spark" %% "spark-sql" % "3.0.1" 
libraryDependencies +=        "org.apache.spark" %% "spark-core" % "3.0.1" 

(2) Main.scala

import org.apache.spark.sql.SparkSession
object Main extends App {
  println("Hello, World!")

  implicit val spark = SparkSession.builder()
        .master("spark://centos-master:7077")
        // .master("local[*]")
        .appName("spark-api")
        .getOrCreate()


  val inputrdd = spark.sparkContext.parallelize(Seq(("arth", 10), ("arth", 20), ("samuel", 60), ("jack", 65)))
  println("inputrdd : ", inputrdd)
  val mapped = inputrdd.mapValues(x => (x, 1))
  println("mapped : ", mapped)
  mapped.collect.foreach(println)
}

(3) When the error occurred

It seems that an error occurs in the mapped.collect.foreach(println) part.

(4) Error content

21/04/17 20:54:19 INFO DAGScheduler: Job 0 failed: collect at Main.scala:16, took 6.083947 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 192.168.0.220, executor 0):

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda 
to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in 
instance of org.apache.spark.rdd.MapPartitionsRDD

[error]         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
[error]         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
[error]         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410)
[error]         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error]         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error]         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error]         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
[error]         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error]         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error]         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
[error]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
[error]         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
[error]         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
[error]         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
[error]         at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error]         at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[error]         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error]         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]         at java.lang.Thread.run(Thread.java:748)
[error] 
[error] Driver stacktrace:
21/04/17 20:54:19 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6) on 192.168.0.220, executor 0: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 7]
21/04/17 20:54:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
[error] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 192.168.0.220, executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
[error]         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
[error]         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
[error]         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410)
[error]         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error]         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error]         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error]         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
[error]         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error]         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error]         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
[error]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
[error]         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
[error]         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
[error]         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
[error]         at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error]         at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[error]         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error]         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]         at java.lang.Thread.run(Thread.java:748)
[error] 
[error] Driver stacktrace:
[error]         at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
[error]         at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
[error]         at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
[error]         at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[error]         at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[error]         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[error]         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
[error]         at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
[error]         at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
[error]         at scala.Option.foreach(Option.scala:407)
[error]         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
[error]         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
[error]         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
[error]         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
[error]         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[error]         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
[error]         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
[error]         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
[error]         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
[error]         at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164)
[error]         at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
[error]         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error]         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[error]         at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
[error]         at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
[error]         at Main$.delayedEndpoint$Main$1(Main.scala:16)
[error]         at Main$delayedInit$body.apply(Main.scala:2)
[error]         at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error]         at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error]         at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error]         at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error]         at scala.collection.immutable.List.foreach(List.scala:392)
[error]         at scala.App.main(App.scala:80)
[error]         at scala.App.main$(App.scala:78)
[error]         at Main$.main(Main.scala:2)
[error]         at Main.main(Main.scala)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]         at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
[error]         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
[error]         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
[error]         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410)
[error]         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error]         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error]         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error]         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
[error]         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error]         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error]         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
[error]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
[error]         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
[error]         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
[error]         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
[error]         at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error]         at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[error]         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error]         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]         at java.lang.Thread.run(Thread.java:748)

Do I need more libraries, or is the code wrong? (It works fine in spark-shell.)

How can I solve this?

You need to submit your jars to Spark so that your code can run there. spark-shell hides all of this from you behind the scenes.

This answer provides more detail on the background: https://stackoverflow.com/a/28367602/1810962

As a workaround, you can use bin/spark-submit and provide your local classpath using --class, --jars, and --driver-class-path.
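
For illustration, a minimal sketch of that workflow. The entry-point name and jar path below are assumptions derived from the build.sbt and Main.scala in the question (sbt's default artifact layout), not something stated in the answer:

# package the application jar, then submit it so the executors can deserialize your lambdas
sbt package

./bin/spark-submit \
  --class Main \
  --master spark://centos-master:7077 \
  target/scala-2.12/hello-world_2.12-1.0.jar

If you go the spark-submit route, it is also common (again an assumption, not part of the original answer) to mark the spark-core and spark-sql dependencies in build.sbt as % "provided", since the cluster already ships them. Alternatively, pointing the spark.jars configuration of the SparkSession at the packaged jar lets the executors fetch your classes while you keep launching the job from sbt.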
