
Running Spark sbt project without sbt?

I have a Spark project which I can run from the sbt console. However, when I try to run it from the command line, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This is expected, because the Spark libs are listed as provided in build.sbt.

How do I configure things so that I can run the JAR from the command line, without having to use the sbt console?

To run Spark stand-alone you need to build a Spark assembly. Run sbt/sbt assembly in the Spark root dir. This will create: assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar

Then you build your job jar with dependencies (either with sbt assembly or the maven-shade-plugin).
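A minimal sketch of the sbt side of that (the plugin coordinates, versions, and project name below are assumptions; adjust them to your own sbt version and project):

    // project/plugins.sbt -- pull in the sbt-assembly plugin (version is an assumption)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

    // build.sbt -- Spark stays "provided": at runtime it comes from the
    // Spark assembly jar that is placed on the classpath in the command below
    name := "job"
    scalaVersion := "2.10.4"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

Running sbt assembly then produces the fat job jar (referred to as job-jar-with-dependencies.jar in the command below).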

You can use the resulting binaries to run your Spark job from the command line:

ADD_JARS=job-jar-with-dependencies.jar SPARK_LOCAL_IP=<IP> java -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar:job-jar-with-dependencies.jar com.example.jobs.SparkJob

Note: If you need a different HDFS version, you need to follow additional steps before building the assembly. See About Hadoop Versions.

Using the sbt assembly plugin we can create a single jar. After doing that you can simply run it using the java -jar command.
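For the java -jar route, the Spark dependency has to actually be inside the fat jar (i.e. not marked provided), and the jar's manifest needs a main class. A minimal sketch of such an entry point, reusing the com.example.jobs.SparkJob name from the command above (the RDD logic is purely illustrative):

    package com.example.jobs

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkJob {
      def main(args: Array[String]): Unit = {
        // Local master only for illustration; normally the master comes from the environment
        val conf = new SparkConf().setAppName("SparkJob").setMaster("local[*]")
        val sc = new SparkContext(conf)
        // Trivial job: count even numbers in 1..1000
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println("Even numbers: " + evens)
        sc.stop()
      }
    }

After sbt assembly you can then run java -jar on the resulting jar, assuming the assembly's mainClass setting points at com.example.jobs.SparkJob.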

For more details, refer to the sbt assembly plugin documentation.
