

Running Scala Spark Jobs on Existing EMR

I have a Spark job jar, aggregationfinal_2.11-0.1.jar, which I am running on my machine. Its composition is as follows:

    package deploy

    import org.apache.spark.sql.SparkSession

    object FinalJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession
          .builder()
          .appName(s"${this.getClass.getSimpleName}")
          .config("spark.sql.shuffle.partitions", "4")
          .getOrCreate()

        // continued code
      }
    }

When I run this code in local mode, it runs fine, but when I deploy it on the EMR cluster by putting its jar on the master node, it fails with the following error:

ClassNotFoundException: deploy.FinalJob

What am I missing here?

The best option is to build an uber jar (you can use the sbt assembly plugin to build it), deploy it to S3, and add a Spark step to the EMR cluster. Please check: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html
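
For reference, a minimal sbt-assembly setup might look like the sketch below; the plugin and library versions, the project name, and the S3 bucket mentioned afterwards are illustrative placeholders, not taken from the question:

    // project/plugins.sbt -- pulls in the sbt-assembly plugin (version is an example)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

    // build.sbt -- mark Spark as "provided" so the uber jar bundles only your own code
    // and EMR's pre-installed Spark is used at runtime
    name := "aggregationfinal"
    version := "0.1"
    scalaVersion := "2.11.12"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4" % "provided"

Running sbt assembly then produces an uber jar under target/scala-2.11/ (named something like aggregationfinal-assembly-0.1.jar). Upload it to S3 and add a Spark step whose arguments are essentially a spark-submit invocation, e.g. --class deploy.FinalJob s3://your-bucket/aggregationfinal-assembly-0.1.jar, as described in the AWS documentation linked above.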

Try unpacking the jar into a folder with the command jar -xvf myapp.jar and look for the compiled classes. If the extracted classes do not contain the class you are executing, there is an issue with the way you build your jar. I would recommend adding the maven assembly plugin to your pom for packaging.
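
As a quicker check than unpacking the whole jar, a small sketch like the one below (assuming Scala 2.11 and the class name deploy.FinalJob from the question) lists the jar entries and reports whether the expected class file is actually inside:

    // JarCheck.scala -- verify that deploy/FinalJob.class made it into the jar
    import java.util.jar.JarFile
    import scala.collection.JavaConverters._

    object JarCheck {
      def main(args: Array[String]): Unit = {
        val jar = new JarFile(args(0)) // path to the jar, e.g. aggregationfinal_2.11-0.1.jar
        val found = jar.entries().asScala.exists(_.getName == "deploy/FinalJob.class")
        jar.close()
        println(if (found) "deploy/FinalJob.class is present" else "deploy/FinalJob.class is MISSING")
      }
    }

If the class is missing, the jar was built without your compiled sources, which points to the packaging setup rather than to EMR itself.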
