
How to copy file from local to HDFS directory in Oozie spark scala job?

I am trying to copy some files from a local path to HDFS with Scala, and running the job with Oozie. The job fails because it cannot read the files from the local path. Is there a way to read local files in an Oozie job?

It is not possible for Spark to copy or read local files when it runs in cluster mode. The reason is that when Oozie submits a Spark job in cluster mode, YARN will not necessarily allocate the node that holds the files (the local node) as an executor. And even if an executor does happen to land on that node, the remaining executors still cannot access the file.
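To make the failure mode concrete, here is a minimal sketch of the kind of read that breaks in cluster mode (the file path is illustrative, borrowed from the local directory used later in this answer):

// Minimal sketch of the failing pattern (path is illustrative).
// In cluster mode the driver and executors are placed on arbitrary YARN
// nodes, so a "file://" URI only resolves if that exact path exists on
// every node that touches it.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("local-read-demo").getOrCreate()

// Throws FileNotFoundException on any node where the file is absent:
val df = spark.read.csv("file:///home/cloudera/test/test.csv")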

The only workable solution I see is to keep all the local files in a shared directory that every cluster node can access; after that, you can use the code below to perform the HDFS copy from Scala.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()

// Source on the shared local directory, destination in HDFS
val localpath = new Path("file:///home/cloudera/test/")
val hdfspath = new Path("hdfs:///user/nikhil/test.csv")

// Obtain a FileSystem handle for the HDFS destination
val fs = hdfspath.getFileSystem(conf)

// Copy from the shared local path into HDFS
fs.copyFromLocalFile(localpath, hdfspath)
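Note that copyFromLocalFile resolves the file:// source on whichever node the code runs; this is exactly why the directory has to be the shared mount described above, so that every node sees the same path.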

For reference, the link below may help with setting up the shared directory:

https://www.tecmint.com/how-to-setup-nfs-server-in-linux/
