
Spark submit job fails in cluster mode but works in local mode for copyToLocal from HDFS in Java

I'm running Java code that copies files from HDFS to the local filesystem, submitted with spark-submit. The job runs fine with Spark in local mode but fails in cluster mode. It throws java.io.IOException: Target /mypath/ is a directory.

I don't understand why it fails in cluster mode, since I don't receive any exceptions in local mode.
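For reference, a minimal sketch of the kind of copy the question describes, using the Hadoop FileSystem API; the class name and paths here are placeholders, not taken from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToLocal {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Resolves fs.defaultFS from the Hadoop configuration on the classpath.
        FileSystem hdfs = FileSystem.get(conf);
        // copyToLocalFile(src, dst): src is read from HDFS, dst is written to the
        // local filesystem of whichever machine runs this code -- the submitting
        // machine in local mode, but the worker node hosting the driver in cluster
        // mode, so /mypath/ must exist and be writable on that node.
        hdfs.copyToLocalFile(new Path("hdfs:///data/source/file.csv"),
                             new Path("/mypath/"));
    }
}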

That behaviour occurs because in the first case (local) your driver runs on the same machine from which you submit the whole Spark job. In the second case (cluster), your driver program is shipped to one of your workers and the process executes from there.
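As a concrete illustration, the deploy mode is what determines where the driver runs; the application class and jar names below are placeholders:

# driver stays on the machine you submit from, so local paths like /mypath/ refer to that machine
spark-submit --master yarn --deploy-mode client --class com.example.CopyToLocal my-app.jar

# driver is launched on a worker node inside the cluster, so local paths are resolved on that node
spark-submit --master yarn --deploy-mode cluster --class com.example.CopyToLocal my-app.jar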

In general, when you run Spark jobs in cluster mode and you need to pre-process local files such as JSON, XML, and so on, you need to ship them along with the executable using the --files <myfile> option. Your driver program will then be able to see that particular file. If you want to include multiple files, separate them with commas (,), as in the sketch below.
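A sketch of what that looks like, assuming a hypothetical config.json and schema.xml that the driver needs to read (the class, jar, and file names are placeholders):

spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyDriver \
  --files /local/path/config.json,/local/path/schema.xml \
  my-app.jar

On YARN the listed files are localized into the working directory of the driver and executor containers, so the driver can usually open them by bare file name (for example new File("config.json")); executors can also resolve the localized path with org.apache.spark.SparkFiles.get("config.json").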

The approach is the same when you want to add jar dependencies: use --jars <myJars>.
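For example, extra jar dependencies go on the same command line; the jar names here are placeholders:

spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyDriver \
  --jars /local/path/my-utils.jar,/local/path/json-parser.jar \
  my-app.jar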

For more details about this, check this thread.
