I want to transfer TBs of data from S3 to an EC2 Windows Server instance and back again, which takes a couple of hours using a basic AWS CLI copy command. To speed things up I would like to use AWS Data Pipeline, and the graphic in the AWS Data Pipeline documentation seems to suggest that data can at least flow from EC2 to S3:
Yet I am finding it hard to understand how that can be done. The closest example I have found is the concept of staging data using the ShellCommandActivity, which transfers data from an S3 DataNode to an EC2 instance before copying it back to S3.
Instead, I want to copy data from S3 onto an already running Windows instance and then, at a later point, copy further data back into S3.
I believe this can be done without Data Pipeline, mainly to remove some complexity. The job can be done with the AWS CLI, which is also available on Windows (XP and later). If you don't have the AWS CLI on the machine, look for the MSI installer.
On *nix:
aws s3 cp --recursive s3://somebucket ./
This copies the S3 bucket's contents into the current directory.
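For multi-TB transfers the CLI itself can be tuned. A minimal sketch, assuming the AWS CLI is installed and credentials are configured; the bucket name and local path are placeholders:

```shell
# Raise the number of parallel S3 transfers (the default is 10) and use
# a larger multipart chunk size; both are documented AWS CLI s3
# configuration options and can noticeably speed up bulk copies.
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB

# sync only copies objects that are new or changed, which helps when a
# long-running transfer has to be re-run after a failure.
aws s3 sync s3://somebucket C:\data
```

The same commands work in PowerShell or cmd.exe on Windows once the MSI installer has run.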
In short, I don't think you would get any performance benefit from using AWS Data Pipeline for this use case.
The reason is that Task Runner (the executor used by Data Pipeline) is not supported on the Windows platform. So any activity you run would actually execute on a different platform, and you would then have to scp/sftp the data to your machine.
There are different ways to pull data into an EC2 instance on other platforms, but that does not help on Windows.
Do you need all of this data present on the Windows machine? Or are you going to be accessing it intermittently?
You might try just mounting your S3 bucket.
It will still be remote but will act like a normal mounted drive in Windows. If you need to do some data crunching, copy just the files you need at that moment to the local disk. You can mount S3 with S3 Browser, CloudBerry, or any of a hundred other S3 clients.
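If you go the "copy just the files you need" route, the CLI's include/exclude filters can pull a subset without fetching the whole bucket. A sketch with placeholder bucket, prefix, and file pattern:

```shell
# Copy only one month's CSV files: --exclude "*" drops everything,
# then --include whitelists the pattern (filters are applied in order,
# per the standard AWS CLI s3 filter semantics).
aws s3 cp s3://somebucket/reports/ ./reports/ --recursive \
    --exclude "*" --include "2017-05-*.csv"
```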
At last I was able to do the data transfer from EC2 to S3 using Data Pipeline.
Steps:
- First, we need a Task Runner running on the EC2 machine.
- aws s3 won't work from your EC2 instance out of the box, because the instance doesn't have rights to your S3 buckets; use aws configure to add your access key and secret key.
- Create a data pipeline, adding a ShellCommandActivity in Architect. Use the script path for your .sh, which could contain a command like aws s3 cp /var/tmp/xyz s3://abc; and, most important, set the worker group to the one used by the Task Runner you started on EC2.
ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-task-runner-user-managed.html
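The steps above can be sketched as shell commands. The Task Runner jar name and flags follow the linked documentation; the worker group name, region, bucket, and paths are placeholders you would substitute with your own:

```shell
# On the EC2 instance: supply credentials so aws s3 commands work,
# then start Task Runner with a worker group name of your choosing.
aws configure

java -jar TaskRunner-1.0.jar \
    --config ~/credentials.json \
    --workerGroup=my-worker-group \
    --region=us-east-1 \
    --logUri=s3://abc/logs

# Contents of the .sh referenced by the ShellCommandActivity's script
# path; the activity's workerGroup field must match my-worker-group
# so the pipeline dispatches the task to this instance.
aws s3 cp /var/tmp/xyz s3://abc --recursive
```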