
ERROR Executor: Exception in task 0.0 in stage 6.0 spark scala?

I have a JSON file like the one below.

{"name":"method2","name1":"test","parameter1":"C:/Users/test/Desktop/Online.csv","parameter2": 1.0}

I am loading my JSON file:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // enables the $"col" syntax (imported automatically in spark-shell)

val df = sqlContext.read.json("C:/Users/test/Desktop/data.json")
val df1 = df.select($"name", $"parameter1", $"parameter2").toDF()
df1.show()
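
With the single record above, df1.show() should print something like the following (Spark truncates cell values to 20 characters by default, so the path is abbreviated):

+-------+--------------------+----------+
|   name|          parameter1|parameter2|
+-------+--------------------+----------+
|method2|C:/Users/test/Des...|       1.0|
+-------+--------------------+----------+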

I have 3 functions like the ones below:

import org.apache.spark.sql.functions.{lit, sum} // needed for lit() and sum()

def method1(P1: String, P2: Double) {
  val data = spark.read.option("header", true).csv(P1).toDF()
  val rs = data.select("CID", "Sc").dropDuplicates("CID", "Sc").withColumn("Rat", lit(P2))
  val outPutPath = "C:/Users/test/Desktop/output"
  rs.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save(outPutPath)
}

def method2(P1: String, P2: Double) {
  val data = spark.read.option("header", true).csv(P1).toDF()
  val rs = data.select("CID", "Sc").withColumn("r", lit(P2))
  val rs1 = rs.filter($"CID" =!= "").groupBy("CID", "Sc").agg(sum(rs("r")).alias("R"))
  val outPutPath = "C:/Users/test/Desktop/output"
  rs1.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save(outPutPath)
}

def methodn(P1: String, P2: Double) {
  println("methodn printing")
  println(P2)
}

I am trying to call the above functions using the code below:

df1.map(row => (row.getString(0), row.getString(1), row.getDouble(2))).foreach { x =>
  x._1.trim.toLowerCase match {
    case "method1" => method1(x._2, x._3)
    case "method2" => method2(x._2, x._3)
    case _         => methodn(x._2, x._3)
  }
}

Based on my JSON object it should call method2, but when I try to execute the above code I get the error below.

17/11/22 16:15:44 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 6)
java.lang.NullPointerException
        at $line36.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.method2(<console>:24)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:40)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:37)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/11/22 16:15:44 WARN TaskSetManager: Lost task 0.0 in stage 6.0 (TID 6, localhost, executor driver): java.lang.NullPointerException
        at $line36.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.method2(<console>:24)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:40)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:37)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/11/22 16:15:44 ERROR TaskSetManager: Task 0 in stage 6.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 6, localhost, executor driver): java.lang.NullPointerException
        at method2(<console>:24)
        at $anonfun$2.apply(<console>:40)
        at $anonfun$2.apply(<console>:37)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:918)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:916)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.foreach(RDD.scala:916)
  at org.apache.spark.sql.Dataset$$anonfun$foreach$1.apply$mcV$sp(Dataset.scala:2325)
  at org.apache.spark.sql.Dataset$$anonfun$foreach$1.apply(Dataset.scala:2325)
  at org.apache.spark.sql.Dataset$$anonfun$foreach$1.apply(Dataset.scala:2325)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2823)
  at org.apache.spark.sql.Dataset.foreach(Dataset.scala:2324)
  ... 54 elided
Caused by: java.lang.NullPointerException
  at method2(<console>:24)
  at $anonfun$2.apply(<console>:40)
  at $anonfun$2.apply(<console>:37)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:108)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

Please help me resolve this issue.

You are getting the NullPointerException because you are trying to access the SparkSession (spark) inside the functions (method1, method2). That on its own is not the problem, though. The main issue is that you are calling those functions from inside the dataframe's map/foreach, so they run on the executors instead of on the driver.

You cannot use driver-side objects such as the SparkSession from within transformations. The closure you pass to map/foreach is serialized and executed on the executors, and there the spark variable has no usable definition (it is effectively null). Since method1 and method2 both dereference spark, that is where the NullPointerException comes from.
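
A minimal sketch of the failing pattern (hypothetical, purely for illustration, assuming a spark-shell session where spark exists on the driver):

// Hypothetical reproduction: the closure below is shipped to an executor,
// where the deserialized `spark` reference is not usable, so the
// spark.read call fails with a NullPointerException at runtime.
import spark.implicits._

Seq("a", "b").toDF("x").foreach { _ =>
  spark.read.csv("C:/Users/test/Desktop/Online.csv") // NPE on the executor
}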

The solution is to call the functions from somewhere the spark variable can be accessed, i.e. on the driver, and not from within a transformation. So collecting the rows first and then looping over them on the driver does the trick:

val process = df1.map( row => (row.getString(0), row.getString(1), row.getDouble(2) ) ).collect

process.foreach { x =>
  x._1.trim.toLowerCase match {
    case "method1" => method1(x._2, x._3)
    case "method2" => method2(x._2, x._3)
    case _ => methodn(x._2, x._3)
  }
}
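
Since df1 only holds the small set of dispatch records from the JSON file (not the large CSV data), collecting it to the driver is cheap. As a variant sketch, if you prefer named fields over tuple positions, you can map the collected rows into a small case class first (the Dispatch name below is made up for illustration):

// Hypothetical variant: the same driver-side dispatch, with named fields.
case class Dispatch(name: String, parameter1: String, parameter2: Double)

val dispatches = df1.collect().map { r =>
  Dispatch(r.getString(0), r.getString(1), r.getDouble(2))
}

dispatches.foreach { d =>
  d.name.trim.toLowerCase match {
    case "method1" => method1(d.parameter1, d.parameter2)
    case "method2" => method2(d.parameter1, d.parameter2)
    case _         => methodn(d.parameter1, d.parameter2)
  }
}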

I hope the answer is helpful.

