简体   繁体   中英

create an RDD using Spark hadoop API to access Cassandra DB

I am running one node cassandra 2.0.3 and Apache Spark 2.0.3

I have created a scala program to create an RDD using Spark hadoop API to access Cassandra DB.

Also what environment variables should be set up for spaark in bashrc as I am using the below config in spark-env.sh

export SPARK_MASTER_IP="10.0.3.15"

export SPARK_MASTER_PORT="7077"

export SCALA_HOME="/home/Desktop/CD/scala-2.9.3"

export SPARK_WORKER_MEMORY=1g

export SPARK_WORKER_INSTANCES=1

export SPARK_WORKER_DIR="/home/Desktop/CD/spark-0.8.0-incubating/sparkdata"

My sample scala code is as below

            val job=new Job()
    job.setInputFormatClass(classOf[ColumnFamilyInputFormat])
    val host: String = "localhost"
            val port: String = "9160"
    ConfigHelper.setInputInitialAddress(job.getConfiguration(), host)
        ConfigHelper.setInputRpcPort(job.getConfiguration(), port)
    ConfigHelper.setInputColumnFamily(job.getConfiguration(), "demodb", "emp")
    ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner")
    CqlConfigHelper.setInputColumns(job.getConfiguration(), "empid,deptid,first_name,last_name")
            CqlConfigHelper.setInputWhereClauses(job.getConfiguration(),"empid=104")

    // Make a new Hadoop RDD
    val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
                            classOf[CqlPagingInputFormat],
                            classOf[Map[String, ByteBuffer]],
                            classOf[Map[String, ByteBuffer]])

    println(casRdd.count())

However when I run this job on Spark Master it does not complete the job and gives the below log.

14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException
java.lang.RuntimeException
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:661)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:297)
    at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:163)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:86)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:74)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
    at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 12 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 2 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 1]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 13 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 1 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 2]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 14 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 5 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 3]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 15 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 11 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 4]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 16 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 8 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 5]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 17 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 3 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 6]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 18 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 6 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 7]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 19 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 9 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 8]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 20 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 7 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 9]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 21 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 10 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 10]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 22 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 4 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 11]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 23 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 12 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 12]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 24 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 13 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 13]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 25 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 14 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 14]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 26 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 15 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 15]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 27 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 16 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 16]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 28 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 17 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 17]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 29 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 18 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 18]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 30 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 19 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 19]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 31 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 20 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 20]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 32 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 21 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 21]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 33 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 22 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 22]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 34 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 23 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 23]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 35 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 24 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 24]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 36 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 25 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 25]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 37 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 26 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 26]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 38 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 27 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 27]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 39 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 29 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 28]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 40 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 28 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 29]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 41 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 30 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 30]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 42 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 31 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 31]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 43 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 32 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 32]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 44 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 33 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 33]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 45 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 34 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 34]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 46 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 35 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 35]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 47 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 36 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 36]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:0 as TID 48 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:0 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 37 (task 0.0:2)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 37]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:2 as TID 49 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:2 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 38 (task 0.0:1)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 38]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:1 as TID 50 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:1 as 2144 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 39 (task 0.0:5)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 39]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:5 as TID 51 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:5 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 40 (task 0.0:8)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 40]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:8 as TID 52 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:8 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 41 (task 0.0:11)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 41]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:11 as TID 53 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:11 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 42 (task 0.0:3)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 42]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:3 as TID 54 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:3 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 43 (task 0.0:6)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 43]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:6 as TID 55 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:6 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 44 (task 0.0:9)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 44]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:9 as TID 56 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:9 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 45 (task 0.0:7)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 45]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:7 as TID 57 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:7 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 46 (task 0.0:10)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 46]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:10 as TID 58 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:10 as 2146 bytes in 1 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 47 (task 0.0:4)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 47]
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Starting task 0.0:4 as TID 59 on executor 0: 10.0.0.100 (PROCESS_LOCAL)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:4 as 2146 bytes in 0 ms
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Lost TID 48 (task 0.0:0)
14/01/03 16:34:16 INFO cluster.ClusterTaskSetManager: Loss was due to java.lang.RuntimeException [duplicate 48]
14/01/03 16:34:16 ERROR cluster.ClusterTaskSetManager: Task 0.0:0 failed more than 4 times; aborting job
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Remove TaskSet 0.0 from pool 
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 51 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 49 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 50 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 52 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 53 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 54 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 52 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 51 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 55 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 53 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 56 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 54 because its task set is gone
14/01/03 16:34:16 INFO scheduler.DAGScheduler: Failed to run count at myown.scala:75
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 57 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 55 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 56 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 58 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 59 because its task set is gone
[error] (run-main) org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times
org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 59 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 57 because its task set is gone
14/01/03 16:34:16 INFO cluster.ClusterScheduler: Ignoring update from TID 58 because its task set is gone

So basically I am confused and struggling to get past this issue as I do not understand if its a problem with my scala code or with the spark master-slave communication or with spark environment configuration.

Request to guide me on this.

This answer might help you with your implementation: Spark with Cassandra input/output

Datastax announced the official Cassandra driver for Spark. With this solution you will not need to implement the Hadoop interface since this is a direct bridge between Cassandra and Spark.

As you can see here this is caused because a generic unhandled exception is being thrown in the Cassandra's CQL RecordReader.

Try to run Spark with Debug logging enabled and you should get the error that is causing this issue.

My guess is the problem will be that you are trying to use setting ColumnFamilyInputFormat in jobConfiguration and passing classOf[CqlPagingInputFormat] to the newHadoopApiRdd method. But htat is just a guess.

Though we have currently only released Calliope built only against Cassandra 1.2.12, our client have used it with Cassandra 2.x successfully without any change. Calliope gives you a simpler and higher level API to work with Cassandra from Spark. I will suggest you give it a try.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM