
How to delete multiple hdfs directories starting with some word in Apache Spark

I have persisted object files in Spark Streaming using the dstream.saveAsObjectFiles("/temObj") method, which produces multiple timestamped directories in HDFS (see the sketch after the listing for how these get written):

temObj-1506338844000
temObj-1506338848000
temObj-1506338852000
temObj-1506338856000
temObj-1506338860000
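
For context, this is roughly how the directories above get written; the streaming context, 4-second batch interval, and socket source in this sketch are assumptions, not part of my actual job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("persist-example")
val ssc = new StreamingContext(conf, Seconds(4))        // 4 s batches match the timestamps above
val dstream = ssc.socketTextStream("localhost", 9999)   // any DStream source works here

// each batch writes its own directory named /temObj-<batch time in ms>
dstream.saveAsObjectFiles("/temObj")

ssc.start()
ssc.awaitTermination()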

I want to delete all the temObj directories after reading them all. What is the best way to do this in Spark? I tried:

// deletes a single path (file or directory) recursively
val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://localhost:9000"), hadoopConf)
hdfs.delete(new org.apache.hadoop.fs.Path(path), true)

But it can only delete one folder at a time.

Unfortunately, delete does not support glob patterns.

You can use globStatus to iterate over the matching files/directories one by one and delete them:

import org.apache.hadoop.fs.{FileSystem, Path}

val hdfs = FileSystem.get(sc.hadoopConfiguration)

// globStatus expands the wildcard; delete(_, true) removes each matching directory recursively
val deletePaths = hdfs.globStatus(new Path("/temObj-*")).map(_.getPath)
deletePaths.foreach { path => hdfs.delete(path, true) }
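
If you need to do this repeatedly, one option (a minimal sketch, assuming a SparkContext named sc; deleteByPattern is an illustrative name, not part of any API) is to wrap the glob-and-delete logic in a small helper that also guards against globStatus returning null when nothing matches:

import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.SparkContext

// Hypothetical helper: delete every HDFS path matching a glob pattern.
def deleteByPattern(sc: SparkContext, pattern: String): Unit = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  // globStatus returns null when the pattern matches nothing, so guard with Option
  Option(fs.globStatus(new Path(pattern)))
    .getOrElse(Array.empty[FileStatus])
    .map(_.getPath)
    .foreach(p => fs.delete(p, true))  // true = recursive delete
}

// e.g. deleteByPattern(sc, "/temObj-*") after all the object files have been read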

Alternatively, you can use sys.process to execute shell commands:

import scala.sys.process._

"hdfs dfs -rm -r /temObj*" !
