
HDFS: move multiple files using Java / Scala API

I need to move multiple files in HDFS that match a given pattern, using a Java / Scala program. For example, I have to move all files named *.xml from folder a to folder b.

From the shell I can use the following command:

bin/hdfs dfs -mv a/*.xml b/

I can move a single file with the Java API by calling the rename method of the FileSystem class, as in the following Scala code:

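import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Prepare initial configuration
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root")
val fs = FileSystem.get(conf)
// Move a single file; rename returns true on success
val ok = fs.rename(new Path("a/file.xml"), new Path("b/file.xml"))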

As far as I know, the Path class represents a URI, so I can't pass a wildcard to rename like this:

val ok = fs.rename(new Path("a/*.xml"), new Path("b/"));

Is there a way to move a set of files in HDFS via the Java / Scala API?

You can use fs.rename(new Path("a"), new Path("b")) to move the whole folder.

But if you want only the *.xml files, there are file filters such as GlobFilter, or you can expand a glob pattern with globStatus:

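// arg0[0] is the base path
// arg0[1] is a glob pattern (not a full regular expression), e.g. NYSE_201[2-3]
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(arg0[0]), conf);
Path path = new Path(arg0[0] + arg0[1]);

// Expand the glob; the result is empty when nothing matches
FileStatus[] status = fs.globStatus(path);
Path[] paths = FileUtil.stat2Paths(status);
for (Path p : paths) {
    // Loops over all the matched source paths.
    // The move itself still has to be implemented with fs.rename, for example
    // fs.rename(p, new Path(destDir, p.getName())) for a target directory destDir of your choice.
}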
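
To make the loop above concrete, here is a minimal sketch of the whole move, assuming the files should keep their names inside the target directory. The class name GlobMove and the helper moveMatching are hypothetical names chosen for illustration, not part of Hadoop; the calls used (globStatus, mkdirs, rename) are standard FileSystem methods. The main method assumes fs.defaultFS is already configured, for example through a core-site.xml on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobMove {

    /**
     * Hypothetical helper: moves every entry matching srcGlob into destDir,
     * keeping the original names. Returns the number of entries moved.
     */
    public static int moveMatching(FileSystem fs, Path srcGlob, Path destDir)
            throws IOException {
        // globStatus expands the glob pattern. It can return null when the
        // path contains no glob characters and does not exist.
        FileStatus[] matches = fs.globStatus(srcGlob);
        if (matches == null) {
            return 0;
        }
        // Make sure the target directory exists before renaming into it.
        fs.mkdirs(destDir);

        int moved = 0;
        for (FileStatus status : matches) {
            Path src = status.getPath();
            Path dst = new Path(destDir, src.getName());
            // rename signals many failures by returning false, so check it.
            if (fs.rename(src, dst)) {
                moved++;
            } else {
                System.err.println("Could not move " + src + " to " + dst);
            }
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default file system comes from the configuration
        // found on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        int moved = moveMatching(fs, new Path("a/*.xml"), new Path("b"));
        System.out.println("Moved " + moved + " file(s)");
    }
}
```

Note that the pattern is a glob, as in the shell command, not a regular expression. Each rename is a separate operation, so the move as a whole is not atomic: if one rename fails, the files already moved stay in the target directory.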
