Running Spark sbt project without sbt?
I have a Spark project which I can run from the sbt console. However, when I try to run it from the command line, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This is expected, because the Spark libs are listed as provided in the build.sbt.
How do I configure things so that I can run the JAR from the command line, without having to use the sbt console?
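For context, a "provided" Spark dependency in build.sbt looks something like the sketch below (the version string is an assumption; substitute your own). Marking it provided keeps Spark off the runtime classpath of your packaged jar, which is exactly what causes the NoClassDefFoundError when the jar is launched on its own:

```scala
// build.sbt — "provided" means the dependency is available at compile time
// but is NOT packaged or put on the classpath at run time
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
```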
To run Spark stand-alone you need to build a Spark assembly. Run

sbt/sbt assembly

in the Spark root dir. This will create:

assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
Then you build your job jar with its dependencies (either with sbt assembly or maven-shade-plugin).
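With sbt, wiring in the sbt-assembly plugin might look like this (a sketch; the plugin version shown is an assumption matching sbt 0.13-era builds, and Spark stays "provided" so it is not bundled into the job jar):

```scala
// project/plugins.sbt — register the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
```

Running `sbt assembly` then produces a jar under target/ containing your classes plus all non-provided dependencies.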
You can use the resulting binaries to run your spark job from the command line:
ADD_JARS=job-jar-with-dependencies.jar SPARK_LOCAL_IP=<IP> java -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar:job-jar-with-dependencies.jar com.example.jobs.SparkJob
Note: If you need a different HDFS version, you need to follow additional steps before building the assembly. See About Hadoop Versions.
Using the sbt assembly plugin we can create a single jar. After doing that you can simply run it using the java -jar command.
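For java -jar to work, the assembled jar's manifest needs a Main-Class entry. With sbt-assembly you can set it in build.sbt, roughly like this (a sketch; com.example.jobs.SparkJob is just the class name used in the answer above — use your own main class):

```scala
// build.sbt — tell sbt-assembly which main class to record in the jar manifest
mainClass in assembly := Some("com.example.jobs.SparkJob")
```

After `sbt assembly`, `java -jar <your-assembly>.jar` will launch that class directly.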