
How to move files within the Hadoop HDFS directory?

I need to move the files from one HDFS directory to another HDFS directory.

Is there an easier way to do this (some HDFS API), other than copying through InputStream/OutputStream?

I've heard of FileSystem.rename(srcDir, destDir), but I am unsure whether it will delete the original source directory.

I don't want to remove the original directory structure, only move the files from one folder to another directory.

For example:

input Dir - /testHDFS/input/*.txt
dest Dir - /testHDFS/destination

After moving the files, the directories should look like this:

input Dir - /testHDFS/input
dest Dir - /testHDFS/destination/*.txt

PS: I want to do this inside the mapper function, for each file.

Any help would be appreciated.

FileSystem.rename will move the file from source to destination directory. I believe you can use it for your requirement.
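As a minimal sketch of this approach (not a definitive implementation), the code below renames each matching file individually rather than renaming the directory, so /testHDFS/input itself stays in place. The class name HdfsMoveFiles and the helper moveMatching are hypothetical names made up for illustration; the paths are the ones from the question. The boolean returned by rename() is checked explicitly and turned into an exception.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, for illustration only.
public class HdfsMoveFiles {

    /** Moves every regular file matching srcGlob into destDir; returns how many were moved. */
    public static int moveMatching(FileSystem fs, Path srcGlob, Path destDir) throws IOException {
        fs.mkdirs(destDir); // make sure the destination directory exists

        FileStatus[] matches = fs.globStatus(srcGlob);
        if (matches == null) {
            return 0; // nothing matched
        }

        int moved = 0;
        for (FileStatus status : matches) {
            if (!status.isFile()) {
                continue; // skip directories, only files are moved
            }
            Path src = status.getPath();
            Path dst = new Path(destDir, src.getName());
            // rename() reports failure through its return value, so check it.
            if (!fs.rename(src, dst)) {
                throw new IOException("rename failed: " + src + " -> " + dst);
            }
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the cluster configuration (fs.defaultFS) is on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        int n = moveMatching(fs, new Path("/testHDFS/input/*.txt"), new Path("/testHDFS/destination"));
        System.out.println("Moved " + n + " file(s)");
    }
}
```

Inside a mapper, the same helper could be called with a FileSystem obtained from the job's Configuration; only the files are renamed, so the input directory is left behind, empty of the moved files.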

The best way to do this is with org.apache.hadoop.fs.FileUtil.copy(), setting the deleteSource parameter to true. People commonly use FileSystem.rename(), but that function can fail silently for problems that are not obvious (such as the source and destination Paths being on different volumes).
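A rough sketch of the FileUtil.copy() route, under the assumption that one file is moved at a time (for example, once per file from a mapper). The class name HdfsCopyThenDelete and the method moveFile are hypothetical; FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf) is the Hadoop overload being used. Because the source and destination file systems are resolved separately, the two paths do not have to live on the same file system.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, for illustration only.
public class HdfsCopyThenDelete {

    /**
     * Copies src into destDir under the same file name, then deletes src.
     * Returns the result reported by FileUtil.copy().
     */
    public static boolean moveFile(Configuration conf, Path src, Path destDir) throws IOException {
        // Resolve each path against its own file system.
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = destDir.getFileSystem(conf);

        Path dst = new Path(destDir, src.getName());

        // deleteSource = true removes the source after a successful copy.
        return FileUtil.copy(srcFs, src, dstFs, dst, true, conf);
    }
}
```

Note that this copies the data and then deletes the original, so for large files on the same file system it does more work than a rename; the source directory itself is not touched.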

You can also use DistCp programmatically for this.
