Reading and writing files in an Amazon S3 bucket

Question

I need to read a large (>15 MB) file (say, sample.csv) from an Amazon S3 bucket. I then need to process the data in sample.csv and keep writing the results to another directory in the S3 bucket. I intend to use an AWS Lambda function to run my Java code.

As a first step, I developed Java code that runs on my local system. The code reads sample.csv from the S3 bucket, and I used the put method to write the data back to the S3 bucket. However, I find that only the last line is processed and written back.

Region clientRegion = Region.Myregion;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create("myAccessId", "mySecretKey");
S3Client s3Client = S3Client.builder()
        .region(clientRegion)
        .credentialsProvider(StaticCredentialsProvider.create(awsCreds))
        .build();
ResponseInputStream<GetObjectResponse> s3objectResponse = s3Client.getObject(
        GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build());
BufferedReader reader = new BufferedReader(new InputStreamReader(s3objectResponse));
String line = null;
while ((line = reader.readLine()) != null) {
    s3Client.putObject(
            PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),
            RequestBody.fromString(line));
}

Example: sample.csv contains

1,sam,21,java,beginner;
2,tom,28,python,practitioner;
3,john,35,c#,expert.

My output should be

1,mas,XX,java,beginner;
2,mot,XX,python,practitioner;
3,nhoj,XX,c#,expert.

But only 3,nhoj,XX,c#,expert is written to Testout.csv.

Answer 1

The putObject() method creates an Amazon S3 object.

It is not possible to append to or modify an S3 object, so each time the while loop executes, it creates a new Amazon S3 object. Because every iteration uses the same key (Test/Testout.csv), each new object replaces the previous one, which is why only the last line remains.
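The overwrite behaviour can be illustrated without touching AWS at all. The sketch below is only a simulation: OverwriteDemo and its in-memory map are hypothetical stand-ins for a bucket, not part of the AWS SDK. It compares one put per line (as in the question) with collecting all lines and putting once.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OverwriteDemo {
    // Hypothetical in-memory stand-in for a bucket: key -> object body.
    // As with S3, a put under an existing key replaces the stored body.
    private final Map<String, String> bucket = new HashMap<>();

    void put(String key, String body) {
        bucket.put(key, body);
    }

    String get(String key) {
        return bucket.get(key);
    }

    // Mirrors the loop in the question: one put per line, same key every time.
    static String putPerLine(List<String> lines) {
        OverwriteDemo store = new OverwriteDemo();
        for (String line : lines) {
            store.put("Test/Testout.csv", line);
        }
        return store.get("Test/Testout.csv");
    }

    // Collects every line first, then writes the object a single time.
    static String putOnce(List<String> lines) {
        OverwriteDemo store = new OverwriteDemo();
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append('\n');
        }
        store.put("Test/Testout.csv", body.toString());
        return store.get("Test/Testout.csv");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,mas,XX,java,beginner;",
                "2,mot,XX,python,practitioner;",
                "3,nhoj,XX,c#,expert.");
        System.out.println("Put per line: " + putPerLine(lines));
        System.out.print("Put once:\n" + putOnce(lines));
    }
}
```

With the real client, the equivalent of putOnce would be to build the full output first and call putObject() a single time after the loop. For large files, holding the whole body in memory may not be desirable, which is one reason to go through a local file instead.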

Instead, I would recommend:

1. Download the source file from Amazon S3 to local disk (use GetObject() with a destinationFile to download to disk)
2. Process the file and output to a local file
3. Upload the output file to the Amazon S3 bucket (method)

This separates the AWS code from your processing code, which should make it easier to maintain.
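A minimal sketch of the processing step (step 2), kept free of any AWS code so it can run and be tested on its own. The class name CsvProcessor and the transformation rule (reverse the second field, replace the third with XX) are assumptions inferred from the sample input and output in the question. The download and upload steps are left out; in the AWS SDK for Java 2.x, S3Client has getObject and putObject overloads that take a java.nio.file.Path, which should fit steps 1 and 3.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class CsvProcessor {
    // Assumed rule, inferred from the question's sample:
    // reverse the second field (name) and mask the third (age) with "XX".
    static String transformLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length < 3) {
            return line; // leave rows with an unexpected shape untouched
        }
        fields[1] = new StringBuilder(fields[1]).reverse().toString();
        fields[2] = "XX";
        return String.join(",", fields);
    }

    // Streams the local input file to a local output file line by line,
    // so the whole file never has to be held in memory.
    static void processFile(Path input, Path output) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(transformLine(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the downloaded file: the sample rows from the question.
        Path input = Files.createTempFile("sample", ".csv");
        Path output = Files.createTempFile("Testout", ".csv");
        Files.write(input, Arrays.asList(
                "1,sam,21,java,beginner;",
                "2,tom,28,python,practitioner;",
                "3,john,35,c#,expert."), StandardCharsets.UTF_8);

        processFile(input, output);

        // In the real flow, `output` would now be uploaded to the bucket once.
        for (String line : Files.readAllLines(output, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Because the output file is complete before anything is uploaded, a single upload at the end replaces the per-line putObject() calls that caused the overwriting.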

Answer 1
3 2019-06-12 07:21:28

读取和写入Amazon s3存储桶中的文件

问题描述

1 个解决方案

解决方案1 3 2019-06-12 07:21:28

解决方案1
3 2019-06-12 07:21:28