Uncompress and read gz files from S3 - Scala

I have a list of gzip files in an S3 folder and need to read them using Scala. I want to iterate over the files and store the content of each one in a list of String buffers.

This is the method that reads one file and returns its content as a String.

  def getDecompressedData(bucket: String, key: String) : String= {
     val getObjectRequest = new GetObjectRequest(bucket, key)
     val s3Object = s3Client.getObject(getObjectRequest)
     val byteArray = IOUtils.toByteArray(s3Object.getObjectContent)
     val inputStream = new GZIPInputStream(new ByteArrayInputStream(byteArray))
     val data = scala.io.Source.fromInputStream(inputStream).mkString
     inputStream.close()
     data
  }

I get this error:

Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at com.amazonaws.util.IOUtils.toByteArray(IOUtils.java:44)
    at com.amazonaws.util.IOUtils.toString(IOUtils.java:58)

The error occurs at the line val data = scala.io.Source.fromInputStream(inputStream).mkString

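import java.io.InputStream
import java.util.zip.GZIPInputStream
import com.amazonaws.services.s3.model.GetObjectRequest
import scala.io.Source

// s3Client is assumed to be an AmazonS3 client defined elsewhere, as in the question.
// gzip says whether the S3 file is compressed; it is passed in as a parameter here.
def getDecompressedData(bucket: String, key: String, gzip: Boolean): String = {
  val getObjectRequest = new GetObjectRequest(bucket, key)
  val s3Object = s3Client.getObject(getObjectRequest)

  try {
    val content = s3Object.getObjectContent

    // If the S3 file is compressed, decompress it while reading; otherwise read it as is
    val in: InputStream = if (gzip) new GZIPInputStream(content) else content

    // Source keeps whitespace and line breaks, which Scanner tokens would drop.
    // UTF-8 is an assumption about the file encoding.
    Source.fromInputStream(in, "UTF-8").mkString
  } finally {
    // Always release the underlying HTTP connection
    s3Object.close()
  }
}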

I have provided code for both gzip files and plain-text files.
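
To check the decompression step without touching S3, here is a minimal sketch that gzips a string in memory and reads it back. The names GzipSketch, gunzipToString and gzip are made up for this illustration, and UTF-8 is an assumed encoding. It also feeds in a truncated copy of the compressed bytes, since incomplete gzip data is one common way to end up with an "Unexpected end of ZLIB input stream" error; whether that is what happened in the question is not something the post confirms.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source
import scala.util.Try

// Hypothetical helper object, for illustration only.
object GzipSketch {

  // Decompress a gzip stream and return its content as a string (UTF-8 assumed).
  def gunzipToString(in: InputStream): String = {
    val gzipIn = new GZIPInputStream(in)
    try Source.fromInputStream(gzipIn, "UTF-8").mkString
    finally gzipIn.close()
  }

  // Demo-only helper: gzip a string in memory.
  def gzip(text: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzipOut = new GZIPOutputStream(buffer)
    try gzipOut.write(text.getBytes(StandardCharsets.UTF_8))
    finally gzipOut.close() // closing writes the gzip trailer
    buffer.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val text = "first line\nsecond line\n"
    val compressed = gzip(text)

    // Round trip: the decompressed text should equal the input
    val restored = gunzipToString(new ByteArrayInputStream(compressed))
    println(s"round trip ok: ${restored == text}")

    // Cut the compressed bytes in half to simulate an incomplete download
    val truncated = compressed.take(compressed.length / 2)
    val result = Try(gunzipToString(new ByteArrayInputStream(truncated)))
    println(s"truncated input fails: ${result.isFailure}")
  }
}
```

With S3, the same helper could be called as gunzipToString(s3Object.getObjectContent), followed by s3Object.close().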
