How to delete multiple HDFS directories starting with some word in Apache Spark
I have persisted object files in Spark Streaming using the dstream.saveAsObjectFiles("/temObj") method. It shows multiple files in HDFS:
temObj-1506338844000
temObj-1506338848000
temObj-1506338852000
temObj-1506338856000
temObj-1506338860000
I want to delete all temObj files after reading all of them. What is the best way to do it in Spark? I tried:
val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://localhost:9000"), hadoopConf)
hdfs.delete(new org.apache.hadoop.fs.Path(Path), true)
But it can only delete one folder at a time.
Unfortunately, delete doesn't support globs.
You can use globStatus to find the matching files/directories, then iterate over them one by one and delete them:
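import org.apache.hadoop.fs.{FileSystem, Path}
val hdfs = FileSystem.get(sc.hadoopConfiguration)
// The pattern uses the "temObj" prefix from the question's saveAsObjectFiles call
val deletePaths = hdfs.globStatus(new Path("/temObj-*")).map(_.getPath)
deletePaths.foreach { path => hdfs.delete(path, true) }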
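The globStatus approach can also be wrapped in a small helper that reports how many paths were actually removed. This is only a sketch: the object name GlobDelete and the method deleteMatching are hypothetical, not part of the Hadoop API. It assumes globStatus may return null in some cases (for example, a pattern without wildcards that matches nothing), so the result is guarded with Option.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object; not part of Hadoop or Spark.
object GlobDelete {
  /** Recursively deletes every path matching `pattern`.
    * Returns the number of paths for which delete() reported success. */
  def deleteMatching(fs: FileSystem, pattern: String): Int = {
    // Guard against a null result before iterating over the matches.
    val matches = Option(fs.globStatus(new Path(pattern)))
      .map(_.toSeq)
      .getOrElse(Seq.empty)
    // delete(path, true) removes directories recursively and returns a Boolean.
    matches.map(_.getPath).count(path => fs.delete(path, true))
  }
}
```

In a Spark application this might be called as GlobDelete.deleteMatching(FileSystem.get(sc.hadoopConfiguration), "/temObj-*"), where sc is the existing SparkContext.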
Alternatively, you can use sys.process to execute shell commands:
import scala.sys.process._
// Runs the command and returns its exit code; the prefix matches the question's "temObj"
"hdfs dfs -rm -r /temObj*".!
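For the sys.process route, a slightly more defensive sketch passes the command as a sequence of arguments and checks the exit code. The names HdfsShell, rmCommand and removeMatching are hypothetical, and the sketch assumes the hdfs executable is on the PATH of the machine running the driver. Since sys.process does not go through a local shell, the wildcard should reach the hdfs command unexpanded, and the HDFS shell then matches it against HDFS paths.

```scala
import scala.sys.process._

// Hypothetical helper object wrapping the `hdfs dfs -rm -r` command.
object HdfsShell {
  /** Builds the argument list; a Seq avoids splitting the pattern on whitespace. */
  def rmCommand(pattern: String): Seq[String] =
    Seq("hdfs", "dfs", "-rm", "-r", pattern)

  /** Runs the command and returns true when it exits with code 0. */
  def removeMatching(pattern: String): Boolean =
    rmCommand(pattern).! == 0
}
```

Calling HdfsShell.removeMatching("/temObj*") would then return false if the command fails, which the original one-liner silently ignores.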