
Which AWS service should I use to process a large text file?

I have a use case where I need to read a very large text file that can contain up to 1 million records. For each record, I have to perform some validation, transform it into a different JSON, and then push it to an SNS topic. I don't need to read the records sequentially, so I can use parallelism. One option is to put the file in an S3 bucket, then use a Lambda to process the file that fans out the records asynchronously to multiple Lambda functions, which take care of the validation and transformation and then push the results to SNS. The other option is to use a Kinesis stream with multiple Lambdas doing the same thing (multiple Lambdas consuming the Kinesis stream).

What should be the ideal way to do this?

  1. S3 -> Lambda -> Multiple Lambdas -> SNS
  2. Kinesis -> Multiple Lambdas (or Lambda -> Multiple Lambdas -> SNS)
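A minimal sketch of option 1, assuming Python with boto3. The worker function name, chunk size, and bucket/key are placeholders, and the event shape is the standard S3 notification; this is an illustration of the fan-out idea, not a tested design:

```python
import json
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

CHUNK_SIZE = 1000                      # records per worker invocation (assumption)
WORKER_FUNCTION = "record-worker"      # hypothetical worker Lambda name


def handler(event, context):
    """Triggered by S3; splits the file into chunks and fans them out asynchronously."""
    s3_info = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=s3_info["bucket"]["name"], Key=s3_info["object"]["key"])
    lines = obj["Body"].read().decode("utf-8").splitlines()

    for start in range(0, len(lines), CHUNK_SIZE):
        chunk = lines[start:start + CHUNK_SIZE]
        # InvocationType="Event" makes the invoke asynchronous, so this loop
        # does not wait for each worker to finish.
        lambda_client.invoke(
            FunctionName=WORKER_FUNCTION,
            InvocationType="Event",
            Payload=json.dumps({"records": chunk}),
        )
```

Each worker Lambda would then validate and transform its chunk and publish to SNS; for very large files the orchestrator could stream the object instead of loading it fully into memory.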

You might want to look into AWS Glue. This service can perform ETL on most things stored in S3, so it might save you the hassle of doing that yourself. Combined with S3 triggering a Lambda, this might be an interesting option?
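As a rough illustration only, a Glue (PySpark) job could read the file from S3, apply a per-record transform, and write the results back out. Bucket paths and the transform are placeholders, and note Glue has no native SNS sink, so publishing would still need explicit boto3 calls:

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import Map

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the raw file straight from S3 (bucket/path are placeholders).
records = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/input/"]},
    format="csv",
    format_options={"withHeader": True},
)


def validate_and_transform(rec):
    # Per-record validation/transformation goes here; pushing to SNS from
    # inside the job would require boto3 calls.
    return rec


transformed = Map.apply(frame=records, f=validate_and_transform)

# Write the transformed records back to S3 as JSON (placeholder path).
glue_context.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="json",
)
```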

Edit: If the file can be parsed with regexes, perhaps try Athena? Athena is relatively cheap and can handle larger files without a hitch.
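For example, you could define a table over the raw file with a regex SerDe and run queries against it via boto3. Database, table, bucket names, and the regex below are all placeholders, just to show the shape of the call:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical table over the raw file using a regex SerDe.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS my_database.raw_records (
    field1 string,
    field2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '([^,]*),(.*)')
LOCATION 's3://my-bucket/input/'
"""

response = athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution() until it finishes
```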

If the records have a predictable length, you could use Range requests to divide the file before you pass it on to Lambda, preventing long run times.
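A Range request against S3 looks like this in boto3 (bucket, key, and byte range are placeholders); each worker would fetch a different, record-aligned slice:

```python
import boto3

s3 = boto3.client("s3")

# Fetch only the first ~1 MiB of the object; each worker gets a different range.
part = s3.get_object(
    Bucket="my-bucket",          # placeholder
    Key="input/records.txt",     # placeholder
    Range="bytes=0-1048575",
)
chunk = part["Body"].read()
# With fixed-length records the range can be aligned to record boundaries,
# so no record is split across two workers.
```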

Also, have you tried parsing and chunking the file with Lambda? 1 million records isn't THAT much, and simply line-splitting and handing chunks off to a validation step (or perhaps straight to SNS) shouldn't be an issue.
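A sketch of that single-Lambda approach, assuming Python with boto3; the bucket, key, topic ARN, and the trivial validation/transformation are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:records-topic"  # placeholder ARN


def handler(event, context):
    """Stream the file, line-split it, and publish transformed records in batches."""
    body = s3.get_object(Bucket="my-bucket", Key="input/records.txt")["Body"]
    batch = []
    for raw in body.iter_lines():
        line = raw.decode("utf-8").strip()
        if not line:                              # trivial "validation" placeholder
            continue
        message = json.dumps({"payload": line})   # placeholder transformation
        batch.append({"Id": str(len(batch)), "Message": message})
        if len(batch) == 10:                      # publish_batch takes at most 10 entries
            sns.publish_batch(TopicArn=TOPIC_ARN, PublishBatchRequestEntries=batch)
            batch = []
    if batch:
        sns.publish_batch(TopicArn=TOPIC_ARN, PublishBatchRequestEntries=batch)
```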
