
spark-submit in client deploy mode not reading all the jars

I'm trying to submit an application to my Spark cluster (standalone mode) through the spark-submit command. I'm following the official Spark documentation, as well as relying on this other one. Now the problem is that I get strange behavior. My setup is the following:

  • I have a directory where all the dependency jars for my application are located, that is /home/myuser/jars
  • The jar of my application is in the same directory (/home/myuser/jars) and is called dat-test.jar
  • The entry point class in dat-test.jar is at the package path my.package.path.Test
  • Spark master is at spark://master:7077

Now, I submit the application directly on the master node, thus using the client deploy mode, running the command

./spark-submit --class my.package.path.Test --master spark://master:7077 --executor-memory 5G --total-executor-cores 10 /home/myuser/jars/*

and I get an error such as

java.lang.ClassNotFoundException: my.package.path.Test

If I activate verbose mode, what I see is that the primaryResource selected as the jar containing the entry point is the first jar, in alphabetical order, in /home/myuser/jars/ (which is not dat-test.jar), leading (I suppose) to the ClassNotFoundException. All the jars in the directory are nonetheless loaded, but only as arguments.
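To make the behaviour explicit (the lib-a.jar and lib-b.jar names below are made up; only dat-test.jar is real), the shell expands the glob before spark-submit ever sees it, so the first command effectively becomes

./spark-submit --class my.package.path.Test --master spark://master:7077 --executor-memory 5G --total-executor-cores 10 /home/myuser/jars/lib-a.jar /home/myuser/jars/lib-b.jar /home/myuser/jars/dat-test.jar

and spark-submit treats the first positional path (/home/myuser/jars/lib-a.jar here) as the primary resource and everything after it as arguments to the application.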

Of course if I run

./spark-submit --class my.package.path.Test --master spark://master:7077 --executor-memory 5G --total-executor-cores 10 /home/myuser/jars/dat-test.jar

it finds the Test class, but it doesn't find the classes contained in the other jars. Finally, if I use the --jars flag and run

./spark-submit --class my.package.path.Test --master spark://master:7077 --executor-memory 5G --total-executor-cores 10 --jars /home/myuser/jars/* /home/myuser/jars/dat-test.jar

I obtain the same result as with the first option: the first jar in /home/myuser/jars/ is loaded as the primaryResource, leading to a ClassNotFoundException for my.package.path.Test. The same happens if I use --jars /home/myuser/jars/*.jar instead.

Important points are:

  • I do not want to have a single jar with all the dependencies for development reasons
  • There are many jars in /home/myuser/jars/. I'd like to know if there's a way to include them all, instead of listing them with the comma-separated syntax
  • If I try to run the same commands with --deploy-mode cluster on the master node, I don't get the error, but the computation fails for some other reason (that's a separate problem).

What, then, is the correct way to run spark-submit in client mode? Thanks

There is no way to include all the jars using the --jars option; you will have to write a small script to enumerate them. This part is a bit sub-optimal.
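A minimal sketch of such a script, assuming Bash, that every dependency ends in .jar, and that the file names contain no spaces (the grep simply filters the application jar out of the dependency list):

# build a comma-separated list of all dependency jars, excluding the application jar itself
JARS=$(ls /home/myuser/jars/*.jar | grep -v 'dat-test.jar' | paste -sd, -)

# pass the dependencies via --jars and dat-test.jar as the primary resource
./spark-submit --class my.package.path.Test --master spark://master:7077 --executor-memory 5G --total-executor-cores 10 --jars "$JARS" /home/myuser/jars/dat-test.jar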
