I need to read a large (>15mb) file (say sample.csv
) from an Amazon S3 bucket. I then need to process the data present in sample.csv
and keep writing it to another directory in the S3 bucket. I intend to use an AWS Lambda function to run my java code.
As a first step I developed java code that runs on my local system. The java code reads the sample.csv
file from the S3 bucket and I used the put
method to write data back to the S3 bucket. But I find only the last line was processed and put back.
Region clientRegion = Region.Myregion;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create("myAccessId","mySecretKey");
S3Client s3Client = S3Client.builder().region(clientRegion).credentialsProvider(StaticCredentialsProvider.create(awsCreds)).build();
ResponseInputStream<GetObjectResponse> s3objectResponse = s3Client.getObject(GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build());
BufferedReader reader = new BufferedReader(new InputStreamReader(s3objectResponse));
String line = null;
while ((line = reader.readLine()) != null) {
s3Client.putObject(PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),RequestBody.fromString(line));
}
Example: sample.csv contains
1,sam,21,java,beginner;
2,tom,28,python,practitioner;
3,john,35,c#,expert.
My output should be
1,mas,XX,java,beginner;
2,mot,XX,python,practitioner;
3,nhoj,XX,c#,expert.
But only 3,nhoj,XX,c#,expert
is written in the Testout.csv
.
The putObject()
method creates an Amazon S3 object.
It is not possible to append or modify an S3 object, so each time the while
loop executes, it is creating a new Amazon S3 object.
Instead, I would recommend:
GetObject()
with a destinationFile
to download to disk) This separates the AWS code from your processing code, which should be easier to maintain.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.