
Spark different behavior between spark-submit and spark-shell


I am using standalone Spark 1.3.1 (on Ubuntu 14.04) with sbt 0.13.10, and I am trying to run the following script:

package co.some.sheker
import java.sql.Date
import org.apache.spark.{SparkContext, SparkConf}
import SparkContext._
import org.apache.spark.sql.{Row, SQLContext}
import com.datastax.spark.connector._
import java.sql._
import org.apache.spark.sql._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import java.io.PushbackReader
import java.lang.{ StringBuilder => JavaStringBuilder }
import java.io.StringReader
import com.datastax.spark.connector.cql.CassandraConnector
import org.joda.time.{DateTimeConstants}

case class TableKey(key1: String, key2: String)

object myclass{
  def main(args: scala.Array[String]) {
    val conf = ...
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val csc = new CassandraSQLContext(sc)
    val data_x = csc.sql("select distinct key1, key2 from keyspace.table where key1 = 'sheker'").map(row => (row(0).toString, row(1).toString))
    println("Done cross mapping")
    val snapshotsFiltered = data_x.map(x => TableKey(x._1,x._2)).joinWithCassandraTable("keyspace", "table")
    println("Done join")
    val jsons = snapshotsFiltered.map(_._2.getString("json"))
 ...

    sc.stop()
    println("Done.")
  }
}

I submit it using:

/home/user/spark-1.3.1/bin/spark-submit --master spark://1.1.1.1:7077 --driver-class-path /home/user/spark-cassandra-connector-java-assembly-1.3.1-FAT.jar --properties-file prop.conf --class "myclass" "myjar.jar"

The prop.conf file is:

spark.cassandra.connection.host myhost
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled true
spark.eventLog.dir /var/tmp/eventLog
spark.executor.extraClassPath /home/ubuntu/spark-cassandra-connector-java-assembly-1.3.1-FAT.jar

I get this exception:

Done cross mapping
Exception in thread "main" java.lang.NoSuchMethodError: com.datastax.spark.connector.mapper.ColumnMapper$.defaultColumnMapper(Lscala/reflect/ClassTag;Lscala/reflect/api/TypeTags$TypeTag;)Lcom/datastax/spark/connector/mapper/ColumnMapper;
    at co.crowdx.aggregation.CassandraToElasticTransformater$.main(CassandraToElasticTransformater.scala:79)
    at co.crowdx.aggregation.CassandraToElasticTransformater.main(CassandraToElasticTransformater.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Done Sending Signal aggregation job to Spark

The strange part is that when I run the script's commands line by line in the shell, it works fine. I am using:

/home/user/spark-1.3.1/bin/spark-shell --master spark://1.1.1.1:7077  --driver-class-path /home/ubuntu/spark-cassandra-connector-java-assembly-1.3.1-FAT.jar --properties-file prop.conf

The Build.scala file is:

import sbt._
import Keys._
import sbtassembly.Plugin._
import AssemblyKeys._

object AggregationsBuild extends Build {
  lazy val buildSettings = Defaults.defaultSettings ++ Seq(
    version := "1.0.0",
    organization := "co.sheker",
    scalaVersion := "2.10.4"
  )

  lazy val app = Project(
    "geo-aggregations",
    file("."),
    settings = buildSettings ++ assemblySettings ++ Seq(
      parallelExecution in Test := false,
      libraryDependencies ++= Seq(
        "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.1",
        // spark will already be on classpath when using spark-submit.
        // marked as provided, so that it isn't included in assembly.
        "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
        "org.apache.spark" %% "spark-catalyst" % "1.2.1" % "provided",
        "org.apache.spark" %% "spark-sql" % "1.2.1" % "provided",
        "org.scalatest" %% "scalatest" % "2.1.5" % "test",
        "org.postgresql" % "postgresql" % "9.4-1201-jdbc41",
        "com.github.nscala-time" %% "nscala-time" % "2.4.0",
        "org.elasticsearch" % "elasticsearch-hadoop" % "2.2.0" % "provided"
      ),
      resolvers += "conjars.org" at "http://conjars.org/repo",
      resolvers += "clojars" at "https://clojars.org/repo"
    )
  )
}

What is wrong? Why does it fail with spark-submit but work in the shell?

You said that you are using Spark 1.3, but your build contains Spark 1.2.1 dependencies.
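
A minimal sketch of the fix, assuming the cluster and the FAT connector assembly really are 1.3.1 (the exact connector version below is an assumption and must match the jar you actually deploy), is to align the versions in Build.scala so the application is compiled against the same Spark and connector it meets at runtime:

libraryDependencies ++= Seq(
  // Match the connector to the 1.3.x assembly passed via --driver-class-path
  // (the version here is an assumption; use the one your FAT jar was built from).
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.3.1",
  // Spark is supplied by the cluster at runtime, so keep it "provided",
  // but compile against the cluster version (1.3.1) instead of 1.2.1.
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-catalyst" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided"
)

After changing the versions, rebuild the assembly with sbt assembly so the jar passed to spark-submit no longer carries 1.2.1 classes.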

Like I said in the comment, I believe that the Spark version on your driver is different from the one your application was built with, and that mismatch is what causes the error you are getting.
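
As a hedged debugging sketch (not part of the original answer), you can print which jar the connector class is actually loaded from on the driver, and which Spark version the driver sees; if the location is not the 1.3.1 FAT jar, the spark-submit classpath is picking up an older connector build:

// Run in spark-shell, or near the top of main() in the application.
// Shows which jar the connector class was loaded from on the driver.
val connectorJar = classOf[com.datastax.spark.connector.cql.CassandraConnector]
  .getProtectionDomain.getCodeSource.getLocation
println(s"CassandraConnector loaded from: $connectorJar")
// Spark version seen at runtime on the driver.
println("Spark version on the driver: " + sc.version)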
