
How to run .jar on Spark on Hortonworks VM?

I'm new to the Hortonworks VM and I'm confused. I'm trying to run a .jar file on Spark. Normally I test locally on Windows by running

spark-submit --driver-memory 4g --class en.name.ClassName %CODE%/target/program.jar

but since I need Hive, I thought I'd move to a Hortonworks VM to test locally. I've uploaded my .jar and the input files to HDFS (to the /tmp/my_code directory) via the HDFS Files GUI in Hortonworks' Ambari. What next? I also found the command line, but how do I access my .jar on HDFS from the VM's command line? I'm trying to run

spark-submit --driver-memory 4g --class en.name.ClassName /tmp/my_code/program.jar

from the sandbox console (the one running on http://127.0.0.1:4200/ by default, root@sandbox "Shell in a Box"), which is not working. It says the .jar does not exist. How can I point the VM to the .jar on HDFS? Thank you!

The JAR should be on the local file system, not in HDFS; only the input files should be in HDFS. A path without a scheme, such as /tmp/my_code/program.jar, is looked up on the local file system, and that is why you are seeing the error that the .jar does not exist.

If you run this command:

spark-submit --help

you will see:

 --jars JARS   Comma-separated list of local jars to include on the driver
               and executor classpaths.
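
For the setup in the question, one way to follow this advice is to copy the jar out of HDFS onto the sandbox's local disk and submit that copy. The sketch below assumes /root/program.jar as the local destination (a hypothetical path, not from the question) and uses a small `run` helper, also hypothetical, that only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
HDFS_JAR=/tmp/my_code/program.jar   # where the question uploaded the jar in HDFS
LOCAL_JAR=/root/program.jar         # assumed local destination on the sandbox

# Print each command; execute it as well only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

fetch_and_submit() {
  # Copy the jar from HDFS to the local file system, then submit the local copy.
  run hdfs dfs -get "$HDFS_JAR" "$LOCAL_JAR" &&
  run spark-submit --driver-memory 4g --class en.name.ClassName "$LOCAL_JAR"
}

fetch_and_submit
```

Saved as a script, it can be run with DRY_RUN=0 from the sandbox shell to actually execute both commands; the input files can stay in HDFS.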

Update: according to the documentation:

application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.

So,

if the jar is on HDFS:

spark-submit --driver-memory 4g --class en.name.ClassName hdfs:///target/program.jar

if the jar is on the local file system:

spark-submit --driver-memory 4g --class en.name.ClassName /target/program.jar

or

spark-submit --driver-memory 4g --class en.name.ClassName file:///target/program.jar
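
A note on the slashes: a URI has the shape scheme://authority/path, so with only two slashes the first path component is read as a host name. A rough illustration in plain shell (the `split_uri` helper is hypothetical and only handles this simple shape):

```shell
#!/bin/sh
# Split a URI of the form scheme://authority/path into authority and path.
split_uri() {
  rest=${1#*://}          # drop the "scheme://" prefix
  uri_host=${rest%%/*}    # text before the first remaining slash
  uri_path=/${rest#*/}    # everything after that slash
  echo "authority='$uri_host' path='$uri_path'"
}

split_uri hdfs://target/program.jar    # authority='target' path='/program.jar'
split_uri hdfs:///target/program.jar   # authority='' path='/target/program.jar'
```

This is why an absolute path needs three slashes. With hdfs:/// the authority is empty and the cluster's configured default file system should be used; naming the NameNode host and port explicitly is the alternative.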
