
Reading a zip file from an S3 bucket using Scala Spark

I am trying to fetch and read text files inside a zip file uploaded to an AWS S3 bucket.

Code I tried:

    var ZipFileList = spark.sparkContext.binaryFiles("/path/")

    var unit = ZipFileList.flatMap {
      case (zipFilePath, zipContent) =>
        val zipInputStream = new ZipInputStream(zipContent.open())
        val zipEntry = zipInputStream.getNextEntry()
        println(zipEntry.getName)
    }

But it gives the error "type mismatch; found: Unit, required: TraversableOnce".
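For reference, a minimal sketch of how that flatMap body could be changed so it returns a collection of strings (one per zip entry) instead of Unit. "/path/" is the same placeholder as in the question, and the sketch assumes each entry is small enough to read fully into memory:

    import java.util.zip.ZipInputStream
    import scala.io.Source

    val zipFiles = spark.sparkContext.binaryFiles("/path/")

    // flatMap must return a TraversableOnce; here every zip entry becomes one String
    val jsonTexts = zipFiles.flatMap { case (zipFilePath, zipContent) =>
      val zis = new ZipInputStream(zipContent.open())
      Stream.continually(zis.getNextEntry)
        .takeWhile(_ != null)                                    // stop after the last entry
        .map(_ => Source.fromInputStream(zis, "UTF-8").mkString) // read the current entry's text
        .toList
    }

    jsonTexts.take(5).foreach(println)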

    val files = spark.sparkContext.wholeTextFiles("/path/")
    files.flatMap { case (name, content) =>
      unzip(content) // gives error "type mismatch; found: Unit, required: scala.collection.GenTraversableOnce[?]"
    }

Is there any other way to read the contents of the files inside a zip file? The zip file contains .json files, and what I want to achieve is to read and parse all of those files.
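Once the entry contents are available as strings, the JSON can be handed to Spark's reader. A hedged sketch, assuming jsonTexts is the RDD[String] built in the sketch above and that each element is a well-formed JSON document:

    import spark.implicits._

    // Turn the RDD of JSON strings into a Dataset[String] and let Spark infer the schema
    val jsonDF = spark.read.json(spark.createDataset(jsonTexts))
    jsonDF.printSchema()
    jsonDF.show()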

You aren't actually returning the data from the unzip() call, are you? I think that's part of the problem.
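To illustrate that comment's point, here is a hypothetical unzip helper that does return the extracted text, so the flatMap body type-checks. It takes the PortableDataStream provided by binaryFiles rather than the String from wholeTextFiles, since wholeTextFiles would decode the raw zip bytes as text and corrupt them:

    import java.util.zip.ZipInputStream
    import org.apache.spark.input.PortableDataStream
    import scala.io.Source

    // Returns the text of every entry in the zip, so flatMap gets a List instead of Unit
    def unzip(content: PortableDataStream): List[String] = {
      val zis = new ZipInputStream(content.open())
      Stream.continually(zis.getNextEntry)
        .takeWhile(_ != null)
        .map(_ => Source.fromInputStream(zis, "UTF-8").mkString)
        .toList
    }

    val files = spark.sparkContext.binaryFiles("/path/")   // placeholder path from the question
    val contents = files.flatMap { case (_, stream) => unzip(stream) }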
