
What is the best way to transfer data from AWS SQS to S3?

Here is the case: I have a large dataset (around 200 GB) temporarily retained in AWS SQS.

My main goal is to store the data so I can access it for building a machine learning model, also using AWS. I believe I should transfer the data to an S3 bucket. While this is straightforward for small datasets, I am not sure what the best way to handle large ones is.

There is no way I can do it locally on my laptop, is there? So, do I create an EC2 instance and process the data there? Amazon has so many different solutions and ways of integration that it is kind of confusing.

Thanks for your help!

for building a machine learning model using also AWS. I believe, I should transfer the data to a S3 bucket.

IMHO that's a good idea. S3 is the best option for retaining data and reusing it (unlike SQS, where a message is gone once a consumer deletes it or its retention period, at most 14 days, expires). AWS tools (SageMaker, Amazon ML) can use content stored in S3 directly. Most machine learning frameworks can read files, so you can easily copy the files from S3 or mount a bucket as a filesystem (not my favourite option, but possible).
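As a sketch of the "copy files from S3" route, the function below downloads every object under a prefix using boto3. It assumes `s3` is a boto3 S3 client (`boto3.client("s3")`); `download_prefix` is a hypothetical helper name, and the bucket, prefix, and directory you pass in are placeholders, not anything from the question.

```python
import os
from typing import Any


def download_prefix(s3: Any, bucket: str, prefix: str, dest_dir: str) -> list[str]:
    """Copy every object under `prefix` in `bucket` into `dest_dir`.

    `s3` is assumed to be a boto3 S3 client. Returns the local paths written.
    """
    written = []
    # list_objects_v2 returns at most 1,000 keys per call, so use the paginator.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            local_path = os.path.join(dest_dir, key)
            # Recreate the key's "directory" structure locally.
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            s3.download_file(bucket, key, local_path)
            written.append(local_path)
    return written
```

For example, `download_prefix(boto3.client("s3"), "my-training-bucket", "sqs-export/", "data")` would mirror that (hypothetical) prefix into a local `data` directory that a training script can read.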

And while it is straightforward when you deal with small datasets, I am not sure what the best way to handle large ones is.

It depends on what data you have and how you want to store and process the data files.

If you plan to have one file for each SQS message, I'd suggest creating a Lambda function triggered by the queue (assuming each message can be read and stored reasonably fast; a single Lambda invocation can run for at most 15 minutes).
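A minimal sketch of such a Lambda function in Python, assuming the queue is configured as the function's SQS trigger and that a hypothetical `TARGET_BUCKET` environment variable names the destination bucket. Each message becomes one object keyed by its `messageId`; the `sqs-export/` prefix and `.json` suffix are illustrative and assume the message bodies are JSON. The optional `s3_client` parameter exists only so the handler can be tested without AWS.

```python
import os
from typing import Any

_s3_client = None  # cached so warm invocations reuse the same client


def _get_s3_client() -> Any:
    global _s3_client
    if _s3_client is None:
        import boto3  # lazy import: lets the handler be unit-tested without boto3
        _s3_client = boto3.client("s3")
    return _s3_client


def handler(event: dict, context: Any, s3_client: Any = None) -> dict:
    """Store each SQS record in the Lambda event as its own S3 object."""
    s3 = s3_client or _get_s3_client()
    bucket = os.environ["TARGET_BUCKET"]  # hypothetical env var with the bucket name
    stored = 0
    for record in event["Records"]:
        # messageId is unique per message, so it gives a collision-free key.
        key = f"sqs-export/{record['messageId']}.json"
        s3.put_object(Bucket=bucket, Key=key, Body=record["body"].encode("utf-8"))
        stored += 1
    return {"stored": stored}
```

Because a redelivered message keeps its `messageId`, a retry overwrites the same object instead of creating a duplicate. When the handler returns normally, the SQS trigger deletes the batch from the queue; if it raises, the messages become visible again after the visibility timeout and are retried.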

If you want to aggregate or concatenate the source messages, or if processing a message would take too long, you may be better off writing a script that reads and processes the data on a server.
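A rough sketch of such a script, assuming it runs on an EC2 instance with boto3 clients passed in (`boto3.client("sqs")`, `boto3.client("s3")`) and that each message body is a single line of text (for example JSON), so bodies can be concatenated into newline-delimited files. `drain_queue`, the key layout, the 10,000-messages-per-file default, and the 900-second visibility timeout are illustrative assumptions, not recommendations.

```python
from typing import Any


def _delete(sqs: Any, queue_url: str, handles: list[str]) -> None:
    # delete_message_batch accepts at most 10 entries per call.
    for start in range(0, len(handles), 10):
        chunk = handles[start:start + 10]
        entries = [{"Id": str(i), "ReceiptHandle": h} for i, h in enumerate(chunk)]
        resp = sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
        if resp.get("Failed"):
            raise RuntimeError(f"failed to delete messages: {resp['Failed']}")


def drain_queue(sqs: Any, s3: Any, queue_url: str, bucket: str,
                prefix: str = "sqs-export", messages_per_file: int = 10_000) -> int:
    """Concatenate queue messages into newline-delimited S3 objects.

    Returns the number of files written.
    """
    bodies: list[str] = []
    handles: list[str] = []
    parts = 0

    def flush() -> None:
        nonlocal parts
        if not bodies:
            return
        key = f"{prefix}/part-{parts:05d}.jsonl"
        s3.put_object(Bucket=bucket, Key=key, Body="\n".join(bodies).encode("utf-8"))
        # Delete only after the upload succeeded: a crash in between
        # causes duplicates on the next run, never data loss.
        _delete(sqs, queue_url, handles)
        bodies.clear()
        handles.clear()
        parts += 1

    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS maximum per call
            WaitTimeSeconds=20,      # long polling
            VisibilityTimeout=900,   # assumption: enough time to fill and upload a file
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # assumption: an empty long poll means the queue is drained
        for m in messages:
            bodies.append(m["Body"])
            handles.append(m["ReceiptHandle"])
        if len(bodies) >= messages_per_file:
            flush()
    flush()  # write whatever is left over
    return parts
```

The visibility timeout has to cover the time needed to fill and upload one file; otherwise messages that have been received but not yet deleted reappear in the queue while the script is still working and end up duplicated.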

There is no way I can do it locally on my laptop, is it? So, do I create a ec2 instance and process the data there?

Well, in theory you can do it on your laptop, but it would mean downloading 200 GB and uploading the same 200 GB again (not counting protocol overhead and network latency).
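A back-of-the-envelope estimate shows why. Assuming, purely for illustration, a connection that sustains 100 Mbit/s in each direction (home upload speeds are often lower), taking 1 GB as 10^9 bytes, and ignoring overhead, one direction takes:

```latex
t = \frac{200\,\text{GB} \times 8\,\text{bit/byte}}{100\,\text{Mbit/s}}
  = \frac{1600\,\text{Gbit}}{0.1\,\text{Gbit/s}}
  = 16\,000\,\text{s} \approx 4.4\,\text{h}
```

Downloading and then uploading the data therefore takes roughly 8.9 hours under these assumptions, before any retries or per-request overhead.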

Your intuition is good, IMHO: an EC2 instance in the same region as the queue and the bucket is the most practical option, since it can access all the data almost locally.
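To confirm that the queue and the bucket really are in the same region (and so know where to launch the instance), a small check could look like the sketch below. It assumes a boto3 S3 client and the standard SQS queue URL formats; `queue_region`, `bucket_region`, and `same_region` are hypothetical helper names.

```python
from typing import Any
from urllib.parse import urlparse


def queue_region(queue_url: str) -> str:
    """Extract the region from an SQS queue URL.

    Handles https://sqs.<region>.amazonaws.com/<account>/<name> and the legacy
    https://<region>.queue.amazonaws.com/... and https://queue.amazonaws.com/... forms.
    """
    host = urlparse(queue_url).hostname or ""
    parts = host.split(".")
    if parts[0] == "sqs" and len(parts) > 1:
        return parts[1]
    if parts[0] == "queue":
        return "us-east-1"  # legacy us-east-1 endpoint
    if len(parts) > 1 and parts[1] == "queue":
        return parts[0]
    raise ValueError(f"unrecognised queue URL: {queue_url}")


def bucket_region(s3: Any, bucket: str) -> str:
    # get_bucket_location reports no constraint for us-east-1
    # and the legacy value "EU" for eu-west-1.
    constraint = s3.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    if not constraint:
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def same_region(s3: Any, queue_url: str, bucket: str) -> bool:
    return queue_region(queue_url) == bucket_region(s3, bucket)
```

If the check fails, it is usually simpler to create a new bucket in the queue's region than to move the processing elsewhere.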

Amazon has so many different solutions and ways of integration so it is kinda confusing.

You have many options suited to different use cases, and they often overlap, so it can indeed look confusing.
