
How can I run a Spark job programmatically

I want to run a Spark job programmatically - submit the SparkPi calculation to a remote cluster directly from IDEA (my laptop):

import org.apache.spark.{SparkConf, SparkContext}
import scala.math.random

object SparkPi {

  def main(args: Array[String]) {
    // Point the driver at the remote standalone master
    val conf = new SparkConf().setAppName("Spark Pi")
      .setMaster("spark://host-name:7077")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Monte Carlo estimate of Pi: count random points that fall inside the unit circle
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }

}

However, when I run it, I observe the following error:

14/12/08 11:31:20 ERROR security.UserGroupInformation: PriviledgedActionException as:remeniuk (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1421)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    ... 4 more

When I run the same script with spark-submit from my laptop, I see the same error.

Only when I upload the jar to the remote cluster (the machine where the master is running) does the job complete successfully:

./bin/spark-submit --master spark://host-name:7077 --class com.viaden.crm.spark.experiments.SparkPi ../spark-experiments_2.10-0.1-SNAPSHOT.jar

Judging from the exception stack, it should be a firewall issue on your local machine.

Please refer to this similar case: Intermittent Timeout Exception using Spark.
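
If a firewall between the laptop and the cluster is indeed the cause, one common workaround is to pin the driver's address and port so the executors can connect back to it (by default, executors must open connections back to the driver on a randomly chosen port). Below is a minimal sketch: spark.driver.host, spark.driver.port, and setJars are real Spark configuration options, but the host name, port number, and jar path are placeholders you would replace with your own values.

import org.apache.spark.{SparkConf, SparkContext}

object SparkPiRemote {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
      .setMaster("spark://host-name:7077")
      // Address the workers should use to reach the driver (the laptop);
      // it must be routable from the cluster machines.
      .set("spark.driver.host", "laptop-hostname-or-ip")
      // Fix the driver port so a single firewall rule can allow it
      // (otherwise Spark picks a random port on each run).
      .set("spark.driver.port", "51000")
      // Ship the application jar to the executors; without it, executors
      // started on the cluster cannot load classes such as SparkPi.
      .setJars(Seq("target/scala-2.10/spark-experiments_2.10-0.1-SNAPSHOT.jar"))
    val spark = new SparkContext(conf)
    // ... the rest of the job is unchanged ...
    spark.stop()
  }

}

This would also explain why the spark-submit run on the cluster succeeds: when the driver runs on the master machine itself, no firewall sits between the driver and the executors.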

