Difference in running a Spark application with sbt run or with the spark-submit script
Different behaviour running Scala in Spark on YARN mode between SBT and spark-submit
I am currently trying to execute some Scala code with Apache Spark in yarn(-client) mode against a Cloudera cluster, but the sbt run execution aborts with the following Java exception:
[error] (run-main-0) org.apache.spark.SparkException: YARN mode not available ?
org.apache.spark.SparkException: YARN mode not available ?
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:199)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
at SimpleApp$.main(SimpleApp.scala:7)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1261)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:199)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
at SimpleApp$.main(SimpleApp.scala:7)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
[trace] Stack trace suppressed: run last compile:run for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
15/11/24 17:18:03 INFO network.ConnectionManager: Selector thread was interrupted!
[error] Total time: 38 s, completed 24-nov-2015 17:18:04
I assume the prebuilt Apache Spark distribution is built with YARN support, because if I execute the same job through spark-submit in yarn-client mode there is no Java exception any more; but YARN does not seem to allocate any resources, and I get the same status message once per second: INFO Client: Application report for application_1448366262851_0022 (state: ACCEPTED). I suppose this is due to a configuration problem.
I googled that last message, but I cannot figure out which YARN configuration I have to modify (or where) to get Spark running on YARN.
Context:
Scala test program:
Update
Well, the SBT job was failing because hadoop-client.jar and spark-yarn.jar were not on the classpath when SBT packaged and executed the application.
Now, sbt run works with my build.sbt configured as follows, but it requires the environment variables SPARK_YARN_APP_JAR and SPARK_JAR to be set:
name := "File Searcher"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "0.9.1" % "runtime"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "runtime"
libraryDependencies += "org.apache.hadoop" % "hadoop-yarn-client" % "2.6.0" % "runtime"
resolvers += "Maven Central" at "https://repo1.maven.org/maven2"
Is there any way to configure these variables "automatically"? I mean, I can set SPARK_JAR, since that jar ships with the Spark installation, but what about SPARK_YARN_APP_JAR? I also noticed that when I set these variables manually, Spark does not take my custom configuration into account, even with the YARN_CONF_DIR variable set. Is there a way to tell SBT to work with my local Spark configuration?
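One possible way to avoid exporting the variables by hand is to let sbt set them itself when it forks the run, using the `fork` and `envVars` settings in build.sbt. This is only a sketch: the jar and configuration paths below are assumptions and must be adapted to your actual Spark installation and to the artifact produced by `sbt package`:

```scala
// build.sbt fragment (sketch): fork the JVM for `run` and inject the env vars
fork in run := true

envVars in run := Map(
  // Assumed path to the Spark assembly shipped with your installation
  "SPARK_JAR" -> "C:/spark/lib/spark-assembly-1.3.0-hadoop2.4.0.jar",
  // Assumed path to the jar produced by `sbt package` for this project
  "SPARK_YARN_APP_JAR" -> "target/scala-2.10/file-searcher_2.10-1.0.jar",
  // Assumed directory holding yarn-site.xml etc., so Spark finds the cluster config
  "YARN_CONF_DIR" -> "C:/hadoop/etc/hadoop"
)
```

Note that `envVars` only takes effect for a forked JVM, which is why `fork in run := true` is required here.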
In case it helps, here is the current (ugly) code I am executing:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "src/data/sample.txt"
    val sc = new SparkContext("yarn-client", "Simple App", "C:/spark/lib/spark-assembly-1.3.0-hadoop2.4.0.jar",
      List("target/scala-2.10/file-searcher_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))
  }
}
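As a side note, the filter/count logic itself can be sanity-checked locally without a cluster. This is just a plain-Scala sketch of what the RDD pipeline computes (the sample lines and the `LocalFilterCheck` name are made up for illustration):

```scala
object LocalFilterCheck {
  // Mirrors logData.filter(line => line.contains("the")).count(), minus Spark
  def countLinesWith(lines: Seq[String], word: String): Long =
    lines.count(_.contains(word)).toLong

  def main(args: Array[String]): Unit = {
    val sample = Seq("the quick brown fox", "jumps over the lazy dog", "no match here")
    println("Lines with the: %s".format(countLinesWith(sample, "the")))
  }
}
```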
Thanks!
Thanks, Cheloute
Well, I finally found my problem.
That's it. Everything else should work.