I'm new to the Hortonworks VM and I'm confused. I'm trying to run a .jar file on Spark. Normally I test locally on Windows by running
spark-submit --driver-memory 4g --class en.name.ClassName %CODE%/target/program.jar
but since I need Hive, I thought I'd move to a Hortonworks VM to test locally. I've uploaded my .jar and the input files to HDFS (to the /tmp/my_code directory) via the HDFS Files GUI in Hortonworks' Ambari. What next? I also found the command line, but how do I access my .jar on HDFS from the VM's command line? I'm trying to run
spark-submit --driver-memory 4g --class en.name.ClassName /tmp/my_code/program.jar
from the sandbox console (the "Shell in a Box" running on http://127.0.0.1:4200/ by default, as root@sandbox), which is not working. It says the .jar does not exist. How can I point the VM to use the .jar on HDFS? Thank you!
The JAR should be on the local file system, NOT in HDFS. Only the input files should be in HDFS. So /tmp/my_code/program.jar has to be a local path, and that is why you are seeing the error that the .jar does not exist.
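One way to follow this advice is to copy the jar out of HDFS onto the sandbox's local disk and submit it from there. The following is only a sketch: the local destination /root/program.jar is an assumption, and the small run helper is a hypothetical wrapper that just prints each command unless DRY_RUN=0 is set, so the script can be checked before running it on the sandbox.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the sandbox to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper, not part of Spark or Hadoop.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

HDFS_JAR=/tmp/my_code/program.jar   # HDFS location from the question
LOCAL_JAR=/root/program.jar         # assumed local destination on the VM

# Copy the jar from HDFS to the VM's local file system.
run hdfs dfs -get "$HDFS_JAR" "$LOCAL_JAR"

# Submit the application using the local path.
run spark-submit --driver-memory 4g --class en.name.ClassName "$LOCAL_JAR"
```

The input files can stay in /tmp/my_code on HDFS; only the application jar is copied.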
If you run this command:
spark-submit --help
you will see:
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
Update: According to the documentation:
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
So,
if the jar is on HDFS:
spark-submit --driver-memory 4g --class en.name.ClassName hdfs:///target/program.jar
if the jar is on the local file system:
spark-submit --driver-memory 4g --class en.name.ClassName /target/program.jar
or
spark-submit --driver-memory 4g --class en.name.ClassName file:///target/program.jar
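A note on the slashes in these URLs: in scheme://authority/path, whatever follows the first two slashes is read as the authority (a host name), so an absolute path needs a third slash. As far as I know, leaving the authority empty in an hdfs:/// URL makes Hadoop fall back to the cluster's default NameNode (fs.defaultFS), but treat that as an assumption and check it on your setup. The to_uri function below is a hypothetical helper, shown only to illustrate how the URLs are put together.

```shell
# to_uri SCHEME ABSOLUTE_PATH
# Prints SCHEME:// followed by the absolute path, which yields three
# slashes in a row because the path itself starts with "/".
to_uri() {
    case "$2" in
        /*) printf '%s://%s\n' "$1" "$2" ;;
        *)  echo "to_uri: path must be absolute: $2" >&2; return 1 ;;
    esac
}

to_uri hdfs /tmp/my_code/program.jar   # prints hdfs:///tmp/my_code/program.jar
to_uri file /target/program.jar        # prints file:///target/program.jar
```

With the jar from the question, the HDFS form would then be passed to spark-submit as hdfs:///tmp/my_code/program.jar.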