I need to update a mutable list with the contents of a directory in HDFS. The following code works in spark-shell, but inside a script it doesn't:
import org.apache.hadoop.fs._
import org.apache.spark.deploy.SparkHadoopUtil
var listOfFiles = scala.collection.mutable.ListBuffer[String]()
val hdfs_conf = SparkHadoopUtil.get.newConfiguration(sc.getConf)
val hdfs = FileSystem.get(hdfs_conf)
val sourcePath = new Path(filePath)

hdfs.globStatus(sourcePath).foreach { fileStatus =>
  val filePathName = fileStatus.getPath().toString()
  val fileName = fileStatus.getPath().getName()
  listOfFiles.append(fileName)
}

listOfFiles.tail
Any help? When I run it as a script, it throws an exception saying that listOfFiles is empty.
You should avoid using mutable collections.
Try:
val listOfFiles = hdfs.globStatus(sourcePath).map { fileStatus =>
  fileStatus.getPath().getName()
}
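The shape of the fix can be seen with plain Scala collections, without needing an HDFS cluster. This is a minimal sketch where the `FileInfo` case class is a hypothetical stand-in for Hadoop's `FileStatus`; it shows that `map` produces the same list the mutable buffer was accumulating, in one expression:

```scala
// FileInfo is a hypothetical stand-in for Hadoop's FileStatus.
case class FileInfo(name: String)

object MapDemo {
  def main(args: Array[String]): Unit = {
    val statuses = Array(FileInfo("a.csv"), FileInfo("b.csv"))

    // Mutable version: start with an empty buffer and append in a loop.
    val buf = scala.collection.mutable.ListBuffer[String]()
    statuses.foreach(s => buf.append(s.name))

    // Immutable version: map yields the result directly.
    val names = statuses.map(_.name).toList

    assert(names == buf.toList)
    println(names) // List(a.csv, b.csv)
  }
}
```

Note also that calling `.tail` on the result drops the first file name (and throws on an empty collection), so if you want the whole list, use the value as-is.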