Hadoop: How to move HDFS files in one directory to another directory?
I have a source directory and a destination archive directory in HDFS. At the beginning of every run of my job, I need to move (or copy, then delete) all the part files present in my source directory to my archive directory.
SparkSession spark = SparkSession.builder().getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
String hdfsSrcDir = "hdfs://clusterName/my/source";
String archiveDir = "hdfs://clusterName/my/archive";
try {
    FileSystem fs = FileSystem.get(new URI(hdfsSrcDir), jsc.hadoopConfiguration());
} catch (IOException | URISyntaxException e) {
    throw new RuntimeException(e);
}
I don't know how to proceed further. Presently my fs object has a reference only to my source directory. Creating an fs2 with the archive location won't help, I believe.

I have found out about FileSystem.rename(), but that takes filenames as parameters. I need to move /my/source/* to /my/archive/.
Check if this works for you:
// Build a FileSystem handle for the cluster.
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://xyz:1234");
FileSystem filesystem = FileSystem.get(configuration);

// Copy src to dst; the fifth argument is deleteSource (false = keep source).
FileUtil.copy(filesystem, new Path("src/path"),
    filesystem, new Path("dst/path"), false, configuration);

// Then remove the source recursively.
filesystem.delete(new Path("src/path"), true);
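Since the source and archive directories live on the same HDFS, FileSystem.rename() can also move each part file without a copy step. A minimal sketch, assuming the paths from the question and that the part files follow the usual part- naming convention (both assumptions, adjust as needed):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchivePartFiles {
    public static void main(String[] args) throws Exception {
        String hdfsSrcDir = "hdfs://clusterName/my/source";
        String archiveDir = "hdfs://clusterName/my/archive";

        // One FileSystem handle serves both paths on the same cluster;
        // a second handle for the archive location is not needed.
        FileSystem fs = FileSystem.get(new URI(hdfsSrcDir), new Configuration());

        // List everything directly under the source directory and move
        // each part file into the archive, keeping its file name.
        for (FileStatus status : fs.listStatus(new Path(hdfsSrcDir))) {
            Path src = status.getPath();
            if (status.isFile() && src.getName().startsWith("part-")) {
                // On the same filesystem, rename() is a metadata-only move.
                fs.rename(src, new Path(archiveDir, src.getName()));
            }
        }
    }
}
```

Unlike FileUtil.copy() followed by delete(), this moves only the part files and leaves the source directory itself (and any other files in it, such as _SUCCESS markers) in place.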