How to count the number of directories in S3 using Spark Scala?
I have a problem counting the number of folders in a specific S3 directory with Spark Scala.
Directory structure:
source/20220309/client_1/file_1.csv
source/20220309/client_2/file_1.csv
source/20220308/client_1/file_1.csv
source/20220308/client_2/file_1.csv
So I am looking to count 20220309 and 20220308, which gives a count of 2. I do not want a count of the nested folders, just the top-level folder count.
You can use the hadoop.fs library:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration
import java.net.URI

val path = "s3a://bucket/prefix"

// Get a FileSystem handle for the s3a scheme
val fs = FileSystem.get(URI.create(path), new Configuration())

// listStatus returns only the immediate children of the path,
// so nested folders are never visited or counted
val dirs = fs.listStatus(new Path(path)).count(_.isDirectory)
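If the count is needed inside a Spark job, the FileSystem can instead be built from the session's Hadoop configuration, so the S3 credentials and s3a connector settings already configured for Spark are reused. A minimal sketch, assuming a running SparkSession and the hypothetical prefix s3a://bucket/source:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import java.net.URI

val spark = SparkSession.builder().getOrCreate()

// Hypothetical prefix matching the question's layout
val path = "s3a://bucket/source"

// Reuse Spark's Hadoop configuration (credentials, endpoint, etc.)
val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(URI.create(path), conf)

// Count only the immediate child directories (20220309, 20220308, ...)
val topLevelDirCount = fs.listStatus(new Path(path)).count(_.isDirectory)
println(topLevelDirCount)
```

Note that on S3, "directories" are prefixes rather than real folders; listStatus on the s3a connector still reports each distinct immediate prefix as a directory entry, so the count behaves as it would on HDFS.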