
Copy a directory with content from HDFS to local filesystem

I'm looking for the best way to copy a whole directory from HDFS with all its contents. Something like:

Path srcPath = new Path("hdfs://localhost:9000/user/britva/data");
Path dstPath = new Path("/home/britva/Work");
fs.copyToLocalFile(false, srcPath, dstPath);

Additionally, the "data" folder can contain subfolders that aren't present in the "Work" directory. So what is the best way of doing this?

Thanks for your answers!

I suppose one of the solutions is to use the FileUtil object, but I'm not sure how to use it, as I have initialized only one file system (HDFS). Then the question is: how should I initialize my local FS? As I understand it, this util is used when you have many nodes. But what I want is to work with the local FS, to copy from HDFS to the project sources.

Also, as I'm using the Play! framework, it would be great to use its path, like Play.application.path + "/public/stuff".

And when I try to use the code above, it says:

java.io.IOException: No FileSystem for scheme: file

I use Scala, so here is a Scala example, which is similar to the Java version.

Step 1. Make sure your HDFS is active. For a local setup, just try to open 127.0.0.1:50070 in a browser.

Step 2. Here is the Scala code:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfsConfig = new Configuration
val hdfsURI = "hdfs://127.0.0.1:9000"
val hdfs = FileSystem.get(new URI(hdfsURI), hdfsConfig)

// Remove the target directory in HDFS if it already exists
val targetPath = new Path(hdfsURI + "/hdfsData")
if (hdfs.exists(targetPath)) {
  hdfs.delete(targetPath, true)
}

// Replace with your local file path, e.g. "/tmp/hdfsData" (see Step 3)
val oriPath = new Path("/your/local/file/path")
hdfs.copyFromLocalFile(oriPath, new Path(hdfsURI + "/"))
hdfs.close()

Step 3. For example, my local file path is /tmp/hdfsData.

I want to copy all files under this directory. After running Step 2's code, all the files will be in HDFS under "hdfs://127.0.0.1:9000/hdfsData/".

Step 4. To copy from HDFS to local, just change "copyFromLocalFile" to "copyToLocalFile".
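Put together, the HDFS-to-local direction from Step 4 can be sketched like this (assuming the same 127.0.0.1:9000 namenode as above; the "/tmp/out" destination is only illustrative):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfsURI = "hdfs://127.0.0.1:9000"
val hdfs = FileSystem.get(new URI(hdfsURI), new Configuration)

// Copies the whole directory, including nested folders, to the local filesystem.
// copyToLocalFile(delSrc, src, dst): delSrc = false keeps the source in HDFS.
hdfs.copyToLocalFile(false, new Path(hdfsURI + "/hdfsData"), new Path("/tmp/out"))
hdfs.close()
```

Because copyToLocalFile works recursively, subfolders of the source directory that don't exist locally are created under the destination.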

Regarding the 'No FileSystem for scheme' exception: if you build your project with Maven, I had an issue like this, and my case was the following:

Please check the content of the JAR you're trying to run, especially the META-INF/services directory and the file org.apache.hadoop.fs.FileSystem there. It should contain a list of file system implementation classes. Check that the line org.apache.hadoop.hdfs.DistributedFileSystem is present in the list for HDFS, and org.apache.hadoop.fs.LocalFileSystem for the local file scheme.

If a needed line is missing, you have to override the referred resource during the build.
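One common way to do that override is the Maven Shade Plugin's ServicesResourceTransformer, which merges the META-INF/services files from all dependencies instead of letting one JAR's copy overwrite another's. A sketch of the relevant plugin configuration (add your own plugin version and other shade settings):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries (e.g. org.apache.hadoop.fs.FileSystem)
               from all dependencies instead of keeping only the first one found -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```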

The other possibility is that you simply don't have hadoop-hdfs.jar in your classpath, but this has low probability: usually, if you have the correct hadoop-client dependency, that is not the cause.
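If you can't change the build, a runtime workaround that is sometimes used is to register the implementations explicitly on the Configuration, so the service-file lookup is bypassed (a sketch; the class names are the standard Hadoop ones):

```scala
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// Map each URI scheme to its implementation class directly,
// instead of relying on META-INF/services discovery.
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")
```

Pass this conf to FileSystem.get and both the hdfs:// and file:// schemes should resolve.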

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
