
How do I download data from the internet to an S3 bucket via EC2?

I want to download several large files from the internet (specifically Reddit monthly submissions from the site PushShift) into an S3 bucket. I am SSHed into an EC2 instance and have a Jupyter notebook running.

Ideally, I want to write a Python script in the Jupyter notebook on my EC2 instance that downloads each file from the internet and then pushes it to my S3 bucket. How would I go about doing this?

It is not possible to "download data from the Internet into Amazon S3".

Amazon S3 is an object storage service. You can upload data to S3 and download data from S3, but it is not possible to tell S3 to download data from some other location and store it.

You will need a program running somewhere that obtains the data from the Internet, then uploads it to Amazon S3 (creating an object). Such a program could be clever enough to 'stream' the data to S3 by downloading the content in memory and sending it on to S3 without saving it to disk in between, but you would need to write that code.
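As a rough illustration of the 'streaming' idea, here is a minimal sketch that assumes `boto3` is installed and AWS credentials are already configured on the machine. The function name `stream_url_to_s3` and its parameters are made up for this example. It hands the open HTTP response to boto3's `upload_fileobj`, which reads it in chunks, so the file is never written to local disk.

```python
import urllib.request


def stream_url_to_s3(url, bucket, key, s3_client=None):
    """Copy the body of `url` to s3://bucket/key without saving it to disk.

    `s3_client` is optional (hypothetical convenience parameter); when it is
    omitted, a default boto3 S3 client is created.
    """
    if s3_client is None:
        import boto3  # assumes boto3 is installed and credentials are set up
        s3_client = boto3.client("s3")

    # urlopen returns a file-like response object. upload_fileobj reads from
    # it chunk by chunk and switches to a multipart upload for large bodies.
    with urllib.request.urlopen(url) as response:
        s3_client.upload_fileobj(response, bucket, key)
```

It could be called as `stream_url_to_s3(file_url, "my-bucket", "reddit/some-file")`, where the URL, bucket name, and key are placeholders for your own values. There is no retry or resume logic here, so an interrupted download of a very large file would have to start over.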

As to 'where' such a program might run, it would be most efficient to run such code either as an AWS Lambda function or on an Amazon EC2 instance that is in the same region as the Amazon S3 bucket.

Since you are running a Jupyter notebook on an Amazon EC2 instance, it would be easiest to download the file to local storage, then upload it to S3.
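The download-then-upload approach could look something like the sketch below, again assuming `boto3` is installed and the instance has credentials (for example via an IAM role) that allow writing to the bucket. The function name `download_then_upload` is hypothetical. The instance also needs enough free disk space to hold the file while it is being transferred.

```python
import os
import shutil
import tempfile
import urllib.request


def download_then_upload(url, bucket, key, s3_client=None):
    """Download `url` to a temporary local file, then upload it to S3."""
    if s3_client is None:
        import boto3  # assumes boto3 is installed and credentials are set up
        s3_client = boto3.client("s3")

    # Create a temporary file on the instance's disk to hold the download.
    fd, local_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, out)  # copies in chunks, not all at once
        # upload_file uses multipart uploads automatically for large files.
        s3_client.upload_file(local_path, bucket, key)
    finally:
        os.remove(local_path)  # free the disk space whether or not it succeeded
```

In a notebook cell you would call it once per file, for example in a loop over a list of file URLs, passing your own bucket name and the object key you want each file stored under.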

