Spark submit job fails for cluster mode but works in local for copyToLocal from HDFS in Java
I'm running Java code to copy files from HDFS to the local filesystem, submitted with spark-submit in cluster mode. The job runs fine with Spark in local mode but fails in cluster mode. It throws a java.io exception: Target /mypath/ is a directory.
I don't understand why it is failing in cluster mode, while in local mode I don't receive any exception.
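The copy itself uses the Hadoop FileSystem API, roughly along the lines of the sketch below (the namenode URI and the paths are placeholders, not the real values from my job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToLocalJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder namenode address
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            FileSystem fs = FileSystem.get(conf);
            // Copies an HDFS file into /mypath/ on the local filesystem of
            // whichever machine this code runs on
            fs.copyToLocalFile(new Path("/user/data/input.txt"), new Path("/mypath/"));
            fs.close();
        }
    }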
That behaviour occurs because in the first case (local) your driver runs on the same machine from which you submit the whole Spark job. In the second case (cluster), your driver program is shipped to one of your workers and the process is executed from there, so a local path like /mypath/ refers to that worker's filesystem, not to the machine you submitted from.
In general, when you want to run Spark jobs in cluster mode and you need to pre-process local files such as JSON or XML, among others, you need to ship them along with the executable using --files <myfile>. Then, in your driver program, you will be able to see that particular file. If you want to include multiple files, separate them with commas (,).
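As a minimal sketch of how the driver can locate such a shipped file (assuming a file named config.json was passed via --files; the class name is a placeholder):

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ReadShippedFile {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("ReadShippedFile");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // SparkFiles.get returns the local path of a file distributed
                // with --files (or SparkContext.addFile) on the node running this code
                String localPath = SparkFiles.get("config.json");
                System.out.println("Shipped file available at: " + localPath);
            }
        }
    }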
The approach is the same when you want to add some jar dependencies: you need to use --jars <myJars>.
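For example, a cluster-mode submission that ships both a data file and extra jars could look like the following (this assumes a YARN cluster; the application jar, class and file names are placeholders):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyApp \
      --files config.json \
      --jars extra-lib1.jar,extra-lib2.jar \
      my-application.jar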
For more details about this, check this thread.