i am trying to fetch and read text files in a zip file uploaded on aws s3 bucket
code i tried
var ZipFileList = spark.sparkContext.binaryFiles(/path/);
var unit = ZipFileList.flatMap {
case (zipFilePath, zipContent) =>
{
val zipInputStream = new ZipInputStream(zipContent.open())
val zipEntry = zipInputStream.getNextEntry()
println(zipEntry.getName)
}
}
but it gives an error unit required traversableOnce
val files = spark.sparkContext.wholeTextFiles(/path/))
files.flatMap({case (name, content) =>
unzip(content) //gives error "type mismatch; found : Unit required: scala.collection.GenTraversableOnce[?]"
})
is there any other way to read file contents inside a zip file ... zip file contains .json files and i want to achieve is to read and parse all those files
you aren't actually returning the data in the unzip() command, are you? I think that's part of the problem
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.