
Spark submit job fails in cluster mode but works locally for copyToLocal from HDFS in Java

I'm running Java code to copy files from HDFS to the local filesystem, submitted with spark-submit in cluster mode. The job runs fine with Spark in local mode but fails in cluster mode, throwing java.io.IOException: Target /mypath/ is a directory.
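A minimal sketch of the kind of copy involved, assuming the Hadoop FileSystem API is used; the class name and paths below are hypothetical, not taken from the original job:

```java
// Hedged sketch of an HDFS-to-local copy; class name and paths are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsToLocalCopy {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The destination path is resolved on whichever machine the
        // driver process runs on, which is the crux of the problem below.
        fs.copyToLocalFile(new Path("hdfs:///data/input.csv"),
                           new Path("/mypath/"));
        fs.close();
    }
}
```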

I don't understand why it fails in cluster mode, yet I don't receive any exception locally.

That behaviour occurs because in the first case (local) your driver runs on the same machine from which you submit the whole Spark job. In the second case (cluster), the driver program is shipped to one of your workers and the process executes from there.
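One quick way to see this difference, as a hedged sketch (class name hypothetical): print the driver's hostname. Submitted in local/client mode it prints your own machine; in cluster mode it prints whichever worker received the driver.

```java
import java.net.InetAddress;

public class WhereIsMyDriver {
    public static void main(String[] args) throws Exception {
        // In local/client mode this prints the submitting machine;
        // in cluster mode it prints the worker hosting the driver,
        // which is also where any "local" filesystem path is resolved.
        System.out.println("Driver running on: "
                + InetAddress.getLocalHost().getHostName());
    }
}
```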

In general, when you run Spark jobs in cluster mode and need to pre-process local files such as JSON or XML, you need to ship them along with the executable using --files <myfile>. Your driver program will then be able to see that particular file. To include multiple files, separate them with a comma (,). See the sketch below.
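A hedged sketch of shipping a local file and reading it from the driver; the file names, class name, and submit command below are hypothetical, and SparkFiles.get is one common way to resolve where Spark staged the shipped file:

```java
// Hypothetical submit command:
//   spark-submit --deploy-mode cluster --files conf.json,schema.xml \
//     --class com.example.ReadShippedFile app.jar
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadShippedFile {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("read-shipped-file"));
        // Resolve the local path where Spark staged the file passed via --files.
        String path = SparkFiles.get("conf.json");
        System.out.println(new String(Files.readAllBytes(Paths.get(path))));
        sc.stop();
    }
}
```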

The approach is the same when you want to add jar dependencies: use --jars <myJars>.
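The equivalent submit command for jar dependencies, again as a hedged sketch with hypothetical paths and class names:

```java
// Hypothetical submit command shipping extra jars; the listed jars are
// added to the classpath of both the driver and the executors.
//   spark-submit --deploy-mode cluster \
//     --jars /opt/libs/dep-a.jar,/opt/libs/dep-b.jar \
//     --class com.example.MyJob app.jar
```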

For more details about this, check this thread.
