
Sorting an RDD in Apache Spark using mapPartitions and reduce

I am trying to sort an RDD in Spark. I know that I can use the sortBy transformation to obtain a sorted RDD, but I want to measure how sortBy performs compared to sorting each partition individually with mapPartitions and then merging the sorted partitions with a reduce function. When I use this approach I run into a java.lang.reflect.InvocationTargetException.
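
For reference, a minimal sketch of the sortBy baseline being compared against might look like this (assuming Spark 1.3's Java API, the same rdd as in the implementation below, and the income value in column 24):

JavaRDD<String> sortedRdd = rdd.sortBy(new Function<String, Double>() {
  @Override
  public Double call(String s) throws Exception {
    // Extract the income field (column 24) as the sort key.
    return Double.parseDouble(s.split(",")[24]);
  }
}, true, 4);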

Here is my implementation:

import java.util.*;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;


class Inc{
  String line;
  Double income;
}

public class SimpleApp {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> rdd = sc.textFile("data.txt",4);
    long start = System.currentTimeMillis();


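    // Phase 1: sort each partition locally. Parse every line, extract the
    // income field (column 24), and sort the partition's records by income.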
    JavaRDD<LinkedList<Inc>> rdd3 = rdd.mapPartitions(new FlatMapFunction<Iterator<String>, LinkedList<Inc>>(){

    @Override
    public Iterable<LinkedList<Inc>> call(Iterator<String> t)
        throws Exception {
      LinkedList<Inc> lines = new LinkedList<Inc>();
      while(t.hasNext()){
        Inc i = new Inc();
        String s = t.next();
        i.line = s;
        String arr1[] = s.split(",");
        i.income = Double.parseDouble(arr1[24]);
        lines.add(i);
      }
      Collections.sort(lines, new IncomeComparator());
      LinkedList<LinkedList<Inc>> list = new LinkedList<LinkedList<Inc>>();
      list.add(lines);
      return list;
    }

    });

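    // Phase 2: merge the per-partition sorted lists pairwise (the merge step
    // of merge sort). reduce is an action, so it triggers the computation.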
    rdd3.reduce(new Function2<LinkedList<Inc>, LinkedList<Inc>, LinkedList<Inc>>(){

    @Override
    public LinkedList<Inc> call(LinkedList<Inc> a,
        LinkedList<Inc> b) throws Exception {
      LinkedList<Inc> result = new LinkedList<Inc>();
      while (a.size() > 0 && b.size() > 0) {

        if (a.getFirst().income.compareTo(b.getFirst().income) <= 0)
          result.add(a.poll());
        else
          result.add(b.poll());
      }

      while (a.size() > 0)
        result.add(a.poll());

      while (b.size() > 0)
        result.add(b.poll());

      return result;

    }

    });

    long end = System.currentTimeMillis();
    System.out.println(end - start);

  }

  public static class IncomeComparator implements Comparator<Inc> {

    @Override
    public int compare(Inc a, Inc b) {
      return a.income.compareTo(b.income);
    }
  }
}

The error I get is:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/01 15:01:50 INFO SparkContext: Running Spark version 1.3.0
15/06/01 15:01:50 INFO SecurityManager: Changing view acls to: rshankar
15/06/01 15:01:50 INFO SecurityManager: Changing modify acls to: rshankar
15/06/01 15:01:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rshankar); users with modify permissions: Set(rshankar)
15/06/01 15:01:50 INFO Slf4jLogger: Slf4jLogger started
15/06/01 15:01:51 INFO Remoting: Starting remoting
15/06/01 15:01:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@sslab01.cs.purdue.edu:40654]
15/06/01 15:01:51 INFO Utils: Successfully started service 'sparkDriver' on port 40654.
15/06/01 15:01:51 INFO SparkEnv: Registering MapOutputTracker
15/06/01 15:01:51 INFO SparkEnv: Registering BlockManagerMaster
15/06/01 15:01:51 INFO DiskBlockManager: Created local directory at /tmp/spark-4cafc660-84b5-4e51-9553-7ded22f179a9/blockmgr-fa2d7355-ba5a-4eec-9a4c-bebbf6b41b95
15/06/01 15:01:51 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/06/01 15:01:51 INFO HttpFileServer: HTTP File server directory is /tmp/spark-6f487208-ecdb-4e82-ab1e-b55fc2d910b9/httpd-584c4b2b-47d7-4d6d-80e9-965b7721c8ae
15/06/01 15:01:51 INFO HttpServer: Starting HTTP Server
15/06/01 15:01:51 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/01 15:01:51 INFO AbstractConnector: Started SocketConnector@0.0.0.0:37466
15/06/01 15:01:51 INFO Utils: Successfully started service 'HTTP file server' on port 37466.
15/06/01 15:01:51 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/01 15:01:51 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/01 15:01:51 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/06/01 15:01:51 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/06/01 15:01:51 INFO SparkUI: Started SparkUI at http://sslab01.cs.purdue.edu:4040
15/06/01 15:01:51 INFO SparkContext: Added JAR file:/homes/rshankar/spark-java/target/simple-project-1.0.jar at http://128.10.25.101:37466/jars/simple-project-1.0.jar with timestamp 1433170911502
15/06/01 15:01:51 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@sslab01.cs.purdue.edu:7077/user/Master...
15/06/01 15:01:51 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150601150151-0003
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor added: app-20150601150151-0003/0 on worker-20150601150057-sslab05.cs.purdue.edu-48984 (sslab05.cs.purdue.edu:48984) with 4 cores
15/06/01 15:01:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150601150151-0003/0 on hostPort sslab05.cs.purdue.edu:48984 with 4 cores, 512.0 MB RAM
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor added: app-20150601150151-0003/1 on worker-20150601150013-sslab02.cs.purdue.edu-42836 (sslab02.cs.purdue.edu:42836) with 4 cores
15/06/01 15:01:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150601150151-0003/1 on hostPort sslab02.cs.purdue.edu:42836 with 4 cores, 512.0 MB RAM
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor added: app-20150601150151-0003/2 on worker-20150601150046-sslab04.cs.purdue.edu-57866 (sslab04.cs.purdue.edu:57866) with 4 cores
15/06/01 15:01:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150601150151-0003/2 on hostPort sslab04.cs.purdue.edu:57866 with 4 cores, 512.0 MB RAM
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor added: app-20150601150151-0003/3 on worker-20150601150032-sslab03.cs.purdue.edu-43239 (sslab03.cs.purdue.edu:43239) with 4 cores
15/06/01 15:01:51 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150601150151-0003/3 on hostPort sslab03.cs.purdue.edu:43239 with 4 cores, 512.0 MB RAM
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/0 is now RUNNING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/1 is now RUNNING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/2 is now RUNNING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/3 is now RUNNING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/0 is now LOADING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/2 is now LOADING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/3 is now LOADING
15/06/01 15:01:51 INFO AppClient$ClientActor: Executor updated: app-20150601150151-0003/1 is now LOADING
15/06/01 15:01:51 INFO NettyBlockTransferService: Server created on 35703
15/06/01 15:01:51 INFO BlockManagerMaster: Trying to register BlockManager
15/06/01 15:01:51 INFO BlockManagerMasterActor: Registering block manager sslab01.cs.purdue.edu:35703 with 265.1 MB RAM, BlockManagerId(<driver>, sslab01.cs.purdue.edu, 35703)
15/06/01 15:01:51 INFO BlockManagerMaster: Registered BlockManager
15/06/01 15:01:52 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/06/01 15:01:52 INFO MemoryStore: ensureFreeSpace(32728) called with curMem=0, maxMem=278019440
15/06/01 15:01:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 265.1 MB)
15/06/01 15:01:52 INFO MemoryStore: ensureFreeSpace(4959) called with curMem=32728, maxMem=278019440
15/06/01 15:01:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.8 KB, free 265.1 MB)
15/06/01 15:01:52 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on sslab01.cs.purdue.edu:35703 (size: 4.8 KB, free: 265.1 MB)
15/06/01 15:01:52 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/06/01 15:01:52 INFO SparkContext: Created broadcast 0 from textFile at SimpleApp.java:24
15/06/01 15:01:52 WARN LoadSnappy: Snappy native library is available
15/06/01 15:01:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/01 15:01:52 WARN LoadSnappy: Snappy native library not loaded
15/06/01 15:01:52 INFO FileInputFormat: Total input paths to process : 1
15/06/01 15:01:52 INFO SparkContext: Starting job: reduce at SimpleApp.java:51
15/06/01 15:01:52 INFO DAGScheduler: Got job 0 (reduce at SimpleApp.java:51) with 4 output partitions (allowLocal=false)
15/06/01 15:01:52 INFO DAGScheduler: Final stage: Stage 0(reduce at SimpleApp.java:51)
15/06/01 15:01:52 INFO DAGScheduler: Parents of final stage: List()
15/06/01 15:01:52 INFO DAGScheduler: Missing parents: List()
15/06/01 15:01:52 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at mapPartitions at SimpleApp.java:28), which has no missing parents
15/06/01 15:01:52 INFO MemoryStore: ensureFreeSpace(3432) called with curMem=37687, maxMem=278019440
15/06/01 15:01:52 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.4 KB, free 265.1 MB)
15/06/01 15:01:52 INFO MemoryStore: ensureFreeSpace(2530) called with curMem=41119, maxMem=278019440
15/06/01 15:01:52 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 265.1 MB)
15/06/01 15:01:52 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on sslab01.cs.purdue.edu:35703 (size: 2.5 KB, free: 265.1 MB)
15/06/01 15:01:52 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/06/01 15:01:52 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/06/01 15:01:52 INFO DAGScheduler: Submitting 4 missing tasks from Stage 0 (MapPartitionsRDD[2] at mapPartitions at SimpleApp.java:28)
15/06/01 15:01:52 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
15/06/01 15:01:54 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sslab04.cs.purdue.edu:55037/user/Executor#212129285] with ID 2
15/06/01 15:01:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:54 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:54 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:54 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:54 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sslab05.cs.purdue.edu:36783/user/Executor#-1944847176] with ID 0
15/06/01 15:01:54 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sslab02.cs.purdue.edu:37539/user/Executor#-1786204780] with ID 1
15/06/01 15:01:54 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@sslab03.cs.purdue.edu:48810/user/Executor#614047045] with ID 3
15/06/01 15:01:54 INFO BlockManagerMasterActor: Registering block manager sslab03.cs.purdue.edu:43948 with 265.1 MB RAM, BlockManagerId(3, sslab03.cs.purdue.edu, 43948)
15/06/01 15:01:54 INFO BlockManagerMasterActor: Registering block manager sslab05.cs.purdue.edu:57248 with 265.1 MB RAM, BlockManagerId(0, sslab05.cs.purdue.edu, 57248)
15/06/01 15:01:54 INFO BlockManagerMasterActor: Registering block manager sslab04.cs.purdue.edu:43152 with 265.1 MB RAM, BlockManagerId(2, sslab04.cs.purdue.edu, 43152)
15/06/01 15:01:54 INFO BlockManagerMasterActor: Registering block manager sslab02.cs.purdue.edu:55873 with 265.1 MB RAM, BlockManagerId(1, sslab02.cs.purdue.edu, 55873)
15/06/01 15:01:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on sslab04.cs.purdue.edu:43152 (size: 2.5 KB, free: 265.1 MB)
15/06/01 15:01:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on sslab04.cs.purdue.edu:43152 (size: 4.8 KB, free: 265.1 MB)
15/06/01 15:01:56 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, sslab04.cs.purdue.edu): java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.serializer.SerializationDebugger$ObjectStreamClassMethods$.getObjFieldValues$extension(SerializationDebugger.scala:240)
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visitSerializable(SerializationDebugger.scala:150)
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visit(SerializationDebugger.scala:99)
    at org.apache.spark.serializer.SerializationDebugger$.find(SerializationDebugger.scala:58)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:39)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at java.io.ObjectStreamClass$FieldReflector.getObjFieldValues(ObjectStreamClass.java:2050)
    at java.io.ObjectStreamClass.getObjFieldValues(ObjectStreamClass.java:1252)
    ... 15 more

15/06/01 15:01:56 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 1]
15/06/01 15:01:56 INFO TaskSetManager: Starting task 3.1 in stage 0.0 (TID 4, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:56 INFO TaskSetManager: Starting task 2.1 in stage 0.0 (TID 5, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:56 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 2]
15/06/01 15:01:56 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 3]
15/06/01 15:01:56 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 6, sslab05.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:56 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 7, sslab02.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on sslab02.cs.purdue.edu:55873 (size: 2.5 KB, free: 265.1 MB)
15/06/01 15:01:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on sslab05.cs.purdue.edu:57248 (size: 2.5 KB, free: 265.1 MB)
15/06/01 15:01:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on sslab02.cs.purdue.edu:55873 (size: 4.8 KB, free: 265.1 MB)
15/06/01 15:01:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on sslab05.cs.purdue.edu:57248 (size: 4.8 KB, free: 265.1 MB)
15/06/01 15:01:57 INFO TaskSetManager: Lost task 3.1 in stage 0.0 (TID 4) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 4]
15/06/01 15:01:57 INFO TaskSetManager: Starting task 3.2 in stage 0.0 (TID 8, sslab05.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:57 INFO TaskSetManager: Lost task 2.1 in stage 0.0 (TID 5) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 5]
15/06/01 15:01:57 INFO TaskSetManager: Starting task 2.2 in stage 0.0 (TID 9, sslab03.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on sslab03.cs.purdue.edu:43948 (size: 2.5 KB, free: 265.1 MB)
15/06/01 15:01:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on sslab03.cs.purdue.edu:43948 (size: 4.8 KB, free: 265.1 MB)
15/06/01 15:01:57 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 7) on executor sslab02.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 6]
15/06/01 15:01:57 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 10, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:57 INFO TaskSetManager: Lost task 3.2 in stage 0.0 (TID 8) on executor sslab05.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 7]
15/06/01 15:01:57 INFO TaskSetManager: Starting task 3.3 in stage 0.0 (TID 11, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:57 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 6) on executor sslab05.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 8]
15/06/01 15:01:57 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 12, sslab04.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:58 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 10) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 9]
15/06/01 15:01:58 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 13, sslab03.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:58 INFO TaskSetManager: Lost task 2.2 in stage 0.0 (TID 9) on executor sslab03.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 10]
15/06/01 15:01:58 INFO TaskSetManager: Starting task 2.3 in stage 0.0 (TID 14, sslab03.cs.purdue.edu, PROCESS_LOCAL, 1358 bytes)
15/06/01 15:01:58 INFO TaskSetManager: Lost task 3.3 in stage 0.0 (TID 11) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 11]
15/06/01 15:01:58 ERROR TaskSetManager: Task 3 in stage 0.0 failed 4 times; aborting job
15/06/01 15:01:58 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 12) on executor sslab04.cs.purdue.edu: java.lang.reflect.InvocationTargetException (null) [duplicate 12]
15/06/01 15:01:58 INFO TaskSchedulerImpl: Cancelling stage 0
15/06/01 15:01:58 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/06/01 15:01:58 INFO DAGScheduler: Job 0 failed: reduce at SimpleApp.java:51, took 5.941191 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 11, sslab04.cs.purdue.edu): java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.serializer.SerializationDebugger$ObjectStreamClassMethods$.getObjFieldValues$extension(SerializationDebugger.scala:240)
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visitSerializable(SerializationDebugger.scala:150)
    at org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visit(SerializationDebugger.scala:99)
    at org.apache.spark.serializer.SerializationDebugger$.find(SerializationDebugger.scala:58)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:39)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at java.io.ObjectStreamClass$FieldReflector.getObjFieldValues(ObjectStreamClass.java:2050)
    at java.io.ObjectStreamClass.getObjFieldValues(ObjectStreamClass.java:1252)
    ... 15 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

I'm not sure what I'm doing wrong. I would appreciate any help. Thanks!

Your class Inc should be marked as Serializable. It seems the serialization debugger is trying to help and failing, and in the process it masks the real serialization error.
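
A minimal sketch of that fix, assuming Spark's default Java serialization (the comparator may also need to implement Serializable if it is ever captured in a closure):

import java.io.Serializable;
import java.util.Comparator;

// Inc instances are shipped between the executors and the driver (e.g. when
// reduce merges the per-partition lists), so the class must be serializable.
class Inc implements Serializable {
  String line;
  Double income;
}

public static class IncomeComparator implements Comparator<Inc>, Serializable {
  @Override
  public int compare(Inc a, Inc b) {
    return a.income.compareTo(b.income);
  }
}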
