How do I download a GZip file from S3?

Question

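I have looked at both "AWS S3 Java SDK - Download file help" and "Working with Zip and GZip files in Java".

While they show how to download files from S3 and how to handle GZipped files, respectively, neither helps with a GZipped file that lives in S3. How would I do this?

Currently I have: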

try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null){
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}

Clearly, I am not handling the compressed nature of the file, and my output is:

����sU�3204�50�5010�20�24��L,(���O�V�M-.NLOU�R�U�����<s��<#�^�.wߐX�%w���������}C=�%�J3��.�����둚�S�ᜑ���ZQ�T�e��#sr�cdN#瘐:&�
S�BǔJ����P�<��

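However, I cannot use the example from the second question linked above, because the file is not local; it has to be downloaded from S3.

What should I do?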

Answer 1

I solved the issue by using a Scanner instead of an InputStream.

The Scanner takes the GZIPInputStream and reads the decompressed file line by line:

fileObj = s3Client.getObject(new GetObjectRequest(oSummary.getBucketName(), oSummary.getKey()));
fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));

Answer 2

You have to use GZIPInputStream to read a GZIP file:

    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withCredentials(new ProfileCredentialsProvider())
            .build();
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));

    byte[] buffer = new byte[1024];
    int n;
    FileOutputStream fileOutputStream = new FileOutputStream("temp.gz");
    // Decompress the S3 object content as it is read
    BufferedInputStream bufferedInputStream = new BufferedInputStream(new GZIPInputStream(fileObj.getObjectContent()));

    // Re-compress the data into the local file temp.gz
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fileOutputStream);
    while ((n = bufferedInputStream.read(buffer)) != -1) {
        // Write only the n bytes actually read, not the whole buffer
        gzipOutputStream.write(buffer, 0, n);
    }
    gzipOutputStream.flush();
    gzipOutputStream.close();
    bufferedInputStream.close();

Try this approach to download a GZip file from S3.
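One detail worth checking in any copy loop of this kind: `read` may fill only part of the buffer, so only the first `n` bytes should be written on each pass. The following is a minimal sketch of decompressing a GZIP stream into a plain output stream. It assumes an in-memory `ByteArrayInputStream` as a stand-in for `fileObj.getObjectContent()`, and the class and method names (`GzipCopySketch`, `copyDecompressed`, `gzip`) are hypothetical, not part of the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipCopySketch {

    // Decompresses a GZIP stream into 'out' and returns the number of
    // decompressed bytes written. 'gzipped' stands in for the S3 object content.
    static long copyDecompressed(InputStream gzipped, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        try (GZIPInputStream in = new GZIPInputStream(gzipped)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Write only the bytes that were actually read on this pass
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Demo helper (hypothetical): gzip a string in memory
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String original = "line one\nline two\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long written = copyDecompressed(new ByteArrayInputStream(gzip(original)), out);
        System.out.println(written + " bytes decompressed");
        System.out.print(out.toString("UTF-8"));
    }
}
```

With a real S3 object, the first argument would be the object's content stream and the second could be a `FileOutputStream`, which would leave an uncompressed copy on disk.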

Answer 3

Try this:

    BasicAWSCredentials creds = new BasicAWSCredentials("accessKey", "secretKey");
    // Regions is left as in the original; pass a concrete Regions constant here
    AmazonS3 s3 = AmazonS3ClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(creds))
            .withRegion(Regions).build();
    String bucketName = "bucketName";
    String keyName = "keyName";
    S3Object fileObj = s3.getObject(new GetObjectRequest(bucketName, keyName));
    // Wrap the object content so it is decompressed while being read
    Scanner fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
    while (fileIn.hasNextLine()) {
        System.out.println("Line: " + fileIn.nextLine());
    }
    fileIn.close();

Answer 4

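I am achieving the same thing with the same approach using SDK 2.x. Because of the new concepts introduced in SDK 2, I had to do some research before arriving at a solution, so I am adding a code snippet here for the benefit of people using SDK 2.0.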

    S3Client s3 = S3Client.builder()
            .region(region)
            .build();

    //Using the key, get the object
    GetObjectRequest request = GetObjectRequest.builder().bucket(bucketName).key(key).build();
    //Read the object as input stream
    InputStream inputStream = s3.getObject(request, ResponseTransformer.toBytes()).asInputStream();
    final GZIPInputStream zipInputStream;
    try {
        //Convert it to GZIP stream
        zipInputStream = new GZIPInputStream(inputStream);
        BufferedReader in = new BufferedReader(new InputStreamReader(zipInputStream));
        String contentStr;
        while ((contentStr = in.readLine()) != null) {
            //Process the contents
            System.out.println(contentStr);
        }
    } catch (IOException e) {
        //Handle the exception
    }

Answer 5

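I am not particularly invested in this question, but I do want to improve the quality of this thread by actually explaining why the solutions already offered work.

No, it is not because of the suggested Scanner. It is because the stream returned by fileObj.getObjectContent() is wrapped in a GZIPInputStream, which ungzips the content.

Remove the Scanner but keep the GZIPInputStream and things will still work.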
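To illustrate that point, here is a minimal sketch that reads lines through a plain `BufferedReader` with no Scanner involved; the `GZIPInputStream` wrapper alone does the decompression. It assumes an in-memory byte array in place of the S3 object content, and the names `GzipWrapSketch`, `readLines`, and `gzip` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipWrapSketch {

    // Reads all lines from a GZIP-compressed stream. The only gzip-specific
    // step is wrapping the source in GZIPInputStream; the reader is ordinary.
    static List<String> readLines(InputStream gzipped) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Demo helper (hypothetical): gzip a string in memory
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("first\nsecond\n");
        for (String line : readLines(new ByteArrayInputStream(compressed))) {
            System.out.println("Line: " + line);
        }
    }
}
```

Whether the decompressed stream is then consumed by a Scanner, a BufferedReader, or a raw `read` loop is a matter of convenience.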

5 answers

Answer 1: score 8, accepted, 2015-07-02 07:45:33

Answer 2: score 6, 2017-09-19 11:33:22

Answer 3: score 1, 2019-05-03 17:05:57

Answer 4: score 0, 2021-01-21 03:05:35

Answer 5: score -1, 2017-01-21 18:37:58