Copy a directory with content from HDFS to local filesystem
I'm looking for the best way to copy a whole directory from HDFS, with all of its contents. Something like:
Path srcPath = new Path("hdfs://localhost:9000/user/britva/data");
Path dstPath = new Path("/home/britva/Work");
fs.copyToLocalFile(false, srcPath, dstPath);
Additionally, the "data" folder can contain folders which aren't present in the "Work" directory. So what is the best way of doing this?
Thanks for your answers!
I suppose one of the solutions is to use the FileUtil object, but I'm not sure how to use it, as I have initialized only one filesystem (HDFS). Then the question is how I should initialize my local FS. As I understand it, this util is used when you have many nodes. But what I want is to work with the local FS - to copy from HDFS to the project sources.
Also, as I'm using the Play! framework, it would be great to use its path, like
Play.application.path + "/public/stuff"
.
And when I try to use the code above, it says:
java.io.IOException: No FileSystem for scheme: file
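For context, the copy the question asks for can be expressed with FileUtil.copy, initializing both the HDFS and the local filesystem from the same Configuration. This is a sketch, assuming hadoop-client is on the classpath and the NameNode from the question is reachable; it has not been tested inside a Play! application:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()

// Source filesystem: HDFS, addressed explicitly by its URI
val srcFS = FileSystem.get(new URI("hdfs://localhost:9000"), conf)
// Target filesystem: the local FS, obtained from the same Configuration
val dstFS = FileSystem.getLocal(conf)

val srcPath = new Path("hdfs://localhost:9000/user/britva/data")
val dstPath = new Path("/home/britva/Work")

// Copies the directory recursively; 'false' means the source is kept
FileUtil.copy(srcFS, srcPath, dstFS, dstPath, false, conf)
```

FileSystem.getLocal answers the "how should I initialize my local FS" part: the local filesystem is just another FileSystem implementation selected by scheme.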
I use Scala, so here is a Scala example, which is similar to the Java one.
Step 1. Make sure your HDFS is active. For a local setup, just try to open 127.0.0.1:50070 in a browser.
Step 2. Here is the Scala code:
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfsConfig = new Configuration
val hdfsURI = "hdfs://127.0.0.1:9000"
val hdfs = FileSystem.get(new URI(hdfsURI), hdfsConfig)
// Remove the target directory in HDFS if it already exists
val targetPath = new Path(hdfsURI + "/hdfsData")
if (hdfs.exists(targetPath)) {
hdfs.delete(targetPath, true)
}
// Copy the local directory into HDFS
val oriPath = new Path(#your_local_file_path)
hdfs.copyFromLocalFile(oriPath, new Path(hdfsURI + "/"))
hdfs.close()
Step 3. For example: my local file path is /tmp/hdfsData, and I want to copy all files under this directory. After running Step 2's code, all the files will be in HDFS at "127.0.0.1:9000/hdfsData/".
Step 4. For copying from HDFS to local, just change "copyFromLocalFile" to "copyToLocalFile".
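Applied to the direction the question asks about (HDFS to local), Step 2's code becomes something like the following sketch; the source and target paths here are illustrative, and a running NameNode at 127.0.0.1:9000 is assumed:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfsURI = "hdfs://127.0.0.1:9000"
val hdfs = FileSystem.get(new URI(hdfsURI), new Configuration)

// Recursively copies hdfs://127.0.0.1:9000/hdfsData into /tmp on the local FS
hdfs.copyToLocalFile(new Path(hdfsURI + "/hdfsData"), new Path("/tmp"))
hdfs.close()
```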
If you build your project using Maven: regarding the 'No FileSystem for scheme' exception, I had an issue like this, and my case was the following.
Please check the content of the JAR you're trying to run, especially the
META-INF/services
directory and the file org.apache.hadoop.fs.FileSystem
in it. There should be a list of filesystem implementation classes. Check that the line
org.apache.hadoop.hdfs.DistributedFileSystem
is present in the list for the HDFS scheme, and org.apache.hadoop.fs.LocalFileSystem
for the local file scheme.
If one of them is missing, you have to make sure the referred resource is merged correctly during the build, because each Hadoop jar ships its own copy of that services file and a naive uber-jar build keeps only one of them.
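With the maven-shade-plugin, the usual fix is the ServicesResourceTransformer, which concatenates the META-INF/services entries from all dependencies instead of letting one jar's copy overwrite another's. A sketch of the relevant pom.xml fragment (the plugin version is illustrative):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merges META-INF/services files (including
               org.apache.hadoop.fs.FileSystem), so that both
               DistributedFileSystem and LocalFileSystem stay registered -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```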
The other possibility is that you simply don't have hadoop-hdfs.jar
in your classpath, but this has low probability. Usually, if you have the correct hadoop-client
dependency, that is not the cause.