
How to download a GZip file from S3?

I have looked at both "AWS S3 Java SDK - Download file help" and "Working with Zip and GZip files in Java".

While they show how to download files from S3 and how to handle GZipped files respectively, neither helps with a GZipped file located in S3. How would I do this?

Currently I have:

try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null){
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}

Clearly, I am not handling the compressed nature of the file, and my output is:

����sU�3204�50�5010�20�24��L,(���O�V�M-.NLOU�R�U�����<s��<#�^�.wߐX�%w���������}C=�%�J3��.�����둚�S�ᜑ���ZQ�T�e��#sr�cdN#瘐:&�
S�BǔJ����P�<��

However, I cannot implement the example from the second question above because the file is not stored locally; it has to be downloaded from S3.

What should I do?

I solved the issue using a Scanner instead of an InputStream.

The scanner takes the GZIPInputStream and reads the unzipped file line by line:

fileObj = s3Client.getObject(new GetObjectRequest(oSummary.getBucketName(), oSummary.getKey()));
fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
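
A self-contained sketch of the same idea that runs without AWS credentials. An in-memory gzip stream stands in for fileObj.getObjectContent(); the class name GzipScannerDemo and the helpers gzip and readLines are hypothetical names chosen for illustration, not part of the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipScannerDemo {

    // Compress a string in memory; this stands in for a gzipped S3 object.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Wrap any gzipped stream (for example S3Object.getObjectContent())
    // in a GZIPInputStream and collect the decompressed lines.
    public static List<String> readLines(InputStream gzipped) throws IOException {
        List<String> lines = new ArrayList<>();
        // Closing the Scanner also closes the wrapped streams.
        try (Scanner in = new Scanner(new GZIPInputStream(gzipped), "UTF-8")) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream fake = new ByteArrayInputStream(gzip("first\nsecond\n"));
        for (String line : readLines(fake)) {
            System.out.println(line);
        }
    }
}
```

With a real S3Object, the call would presumably be readLines(fileObj.getObjectContent()), assuming the object is UTF-8 text.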

You have to use GZIPInputStream to read a GZIP file:

    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withCredentials(new ProfileCredentialsProvider())
            .build();
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));

    byte[] buffer = new byte[1024];
    int n;
    // Decompress the S3 stream on the fly and re-compress it into temp.gz.
    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(
                 new GZIPInputStream(fileObj.getObjectContent()));
         GZIPOutputStream gzipOutputStream = new GZIPOutputStream(
                 new FileOutputStream("temp.gz"))) {
        while ((n = bufferedInputStream.read(buffer)) != -1) {
            // Write only the n bytes actually read, not the whole buffer.
            gzipOutputStream.write(buffer, 0, n);
        }
    }

Please try this way to download a GZip file from S3.
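
The snippet above decompresses the S3 stream and then compresses it again into temp.gz. If the goal is only to store the object on disk as it is, copying the bytes unchanged should be enough. The following is a minimal sketch under that assumption; the class name RawCopyDemo and the method saveTo are hypothetical, and the byte array stands in for fileObj.getObjectContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RawCopyDemo {

    // Copy a stream to disk byte for byte and return the number of bytes written.
    // No decompression happens here, so a gzipped source stays gzipped.
    public static long saveTo(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; with S3 this would be fileObj.getObjectContent().
        InputStream fake = new ByteArrayInputStream(new byte[] {0x1f, (byte) 0x8b, 0x08});
        Path target = Files.createTempFile("download", ".gz");
        long written = saveTo(fake, target);
        System.out.println("Wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

The saved file can then be opened later with a GZIPInputStream over a FileInputStream.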

Try this:

    BasicAWSCredentials creds = new BasicAWSCredentials("accessKey", "secretKey");
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withCredentials(new AWSStaticCredentialsProvider(creds))
            .withRegion(Regions) // placeholder from the original: pass a concrete Regions constant here
            .build();
    String bucketName = "bucketName";
    String keyName = "keyName";
    S3Object fileObj = s3.getObject(new GetObjectRequest(bucketName, keyName));
    // GZIPInputStream decompresses the object content as it is read.
    try (Scanner fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()))) {
        while (fileIn.hasNextLine()) {
            System.out.println("Line: " + fileIn.nextLine());
        }
    }

I was working on achieving the same using SDK 2.x. With the new philosophy introduced in SDK 2, I had to do a bit of research before arriving at the solution. So I am adding a code snippet here for the benefit of people using SDK 2.0.

    S3Client s3 = S3Client.builder()
            .region(region)
            .build();

    //Using the key, get the object
    GetObjectRequest request = GetObjectRequest.builder().bucket(bucketName).key(key).build();
    //Read the object as input stream (toBytes() buffers the whole object in memory first)
    InputStream inputStream = s3.getObject(request, ResponseTransformer.toBytes()).asInputStream();
    //Convert it to GZIP stream and read it line by line
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new GZIPInputStream(inputStream)))) {
        String contentStr;
        while ((contentStr = in.readLine()) != null) {
            //Process the contents
            System.out.println(contentStr);
        }
    } catch (IOException e) {
        //Handle the exception
    }

I wasn't quite looking for this issue, but I did feel like improving the quality of this thread by actually explaining why the already provided solution works.

No, it's not because of the Scanner, as is suggested. It's because the stream is being ungzipped by wrapping fileObj.getObjectContent() in a GZIPInputStream, which unzips the contents.

Remove the scanner but keep the GZIPInputStream and things will still work.
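
To illustrate the point, here is a small sketch that reads a gzipped stream with a plain BufferedReader and no Scanner at all. It uses an in-memory stream in place of fileObj.getObjectContent(); the class name GzipReaderDemo and the methods gzip and readAll are hypothetical, and UTF-8 text is an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipReaderDemo {

    // Build gzipped bytes in memory to play the role of the S3 object content.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // The GZIPInputStream does the decompression; the reader around it only
    // turns the decompressed bytes into lines of text.
    public static String readAll(InputStream gzipped) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("no scanner\nneeded\n");
        System.out.println(readAll(new ByteArrayInputStream(compressed)));
    }
}
```

Swapping the BufferedReader for a Scanner, or for a raw read loop, leaves the result unchanged as long as the GZIPInputStream stays in the chain.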
