HDFS: move multiple files using Java / Scala API
I need to move multiple files in HDFS that match a given regular expression, using a Java / Scala program. For example, I have to move all files named *.xml from folder a to folder b.
From the shell I can use the following command:
bin/hdfs dfs -mv a/*.xml b/
I can move a single file through the Java API with the following code (written in Scala), using the rename method of the FileSystem class:
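import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
// Prepare initial configuration
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root")
val fs = FileSystem.get(conf)
// Move a single file; rename returns true on success
val ok = fs.rename(new Path("a/file.xml"), new Path("b/file.xml"))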
As far as I know, the Path class represents a URI, so I can't use it in the following way:
val ok = fs.rename(new Path("a/*.xml"), new Path("b/"));
Is there a way to move a set of files in HDFS via the Java / Scala API?
You can use fs.rename(new Path("a"), new Path("b")) to move the whole folder.
But if you only want the *.xml files, there are file filters such as GlobFilter, and a glob pattern can also be expanded with globStatus:
// arg0[0] is the base path
// arg0[1] is a glob pattern, e.g. NYSE_201[2-3]
FileSystem fs = FileSystem.get(URI.create(arg0[0]), conf);
Path path = new Path(arg0[0] + arg0[1]);
// Expand the glob into the files that match it
FileStatus[] status = fs.globStatus(path);
Path[] paths = FileUtil.stat2Paths(status);
for (Path p : paths) {
    // p is one matching source path
    // <need to implement logic to rename the paths using fs.rename>
}
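One possible way to fill in the loop, sketched in Scala since the question uses it. This is only a sketch under a few assumptions: the hadoop-client library is on the classpath, the fs.defaultFS value is the one from the question, and the names MoveByGlob and moveMatching are made up for illustration. It expands the glob with globStatus and renames each match into the target folder, keeping the file name.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object; the name is not part of the Hadoop API.
object MoveByGlob {

  /** Moves every path matching `srcGlob` into the directory `dstDir`.
    * Returns the source paths for which rename reported failure.
    */
  def moveMatching(fs: FileSystem, srcGlob: Path, dstDir: Path): Seq[Path] = {
    // globStatus can return null when the pattern contains no glob
    // and the path does not exist, so wrap the result in Option.
    val matches = Option(fs.globStatus(srcGlob)).map(_.toSeq).getOrElse(Seq.empty)

    // Make sure the destination directory exists before renaming into it.
    fs.mkdirs(dstDir)

    // rename returns false on failure; keep only the paths that failed.
    matches.map(_.getPath).filterNot { src =>
      fs.rename(src, new Path(dstDir, src.getName))
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Same setting as in the question; adjust it to your cluster.
    conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root")
    val fs = FileSystem.get(conf)

    val failed = moveMatching(fs, new Path("a/*.xml"), new Path("b"))
    failed.foreach(p => println(s"Could not move $p"))
  }
}
```

Each file is renamed on its own, so the move as a whole is not atomic: if one rename fails, the files that were already moved stay in the target folder. That is why the helper returns the list of failures instead of a single boolean.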