
aws cli copy between S3 regions on EC2

I am trying to copy between two S3 buckets in different regions using the AWS Command Line Interface on an EC2 server.

Region info:
EC2 instance: us-west-2
S3 origin: us-east-1
S3 destination: us-west-2

The following commands work perfectly from the EC2 server:
aws s3 cp s3://n-virginia/origin s3://n-virginia/destination --recursive --source-region us-east-1 --region us-east-1 --profile my_profile

aws s3 cp s3://oregon/origin s3://oregon/destination --recursive --source-region us-west-2 --region us-west-2 --profile my_profile

I need to run the following command from the EC2 server:
aws s3 cp s3://n-virginia/origin s3://oregon/destination --recursive --source-region us-east-1 --region us-west-2 --profile my_profile

If I run that command from a local machine it works, but if I run it from the EC2 server that I used for the previous two commands, I get the following error:

Error: "A client error (AccessDenied) occurred when calling the CopyObject operation: VPC endpoints do not support cross-region requests"

I am able to copy the files from the origin bucket to the EC2 server, and then copy from the EC2 server to the destination bucket, but this is not an acceptable solution in production. I don't understand why it works on a local machine but not on the EC2 server ("my_profile" is identical on both machines).

As pointed out in the comments, the problem is that your VPC has an S3 endpoint, and VPC endpoints do not support cross-region requests.

To fix that, either temporarily disable the VPC endpoint by updating your VPC route table, or create a new VPC without a VPC endpoint and launch an EC2 instance there.
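A minimal sketch of the first option, assuming the endpoint is a gateway-type S3 endpoint and that the instance still has another route to S3 (for example through an internet gateway or NAT) once the endpoint is detached from the route table. The endpoint and route table IDs are hypothetical placeholders, and `run` is an illustrative helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical placeholder IDs; substitute the values from your own VPC.
VPC_ENDPOINT_ID="vpce-0123456789abcdef0"
ROUTE_TABLE_ID="rtb-0123456789abcdef0"

# Illustrative helper: print the command when DRY_RUN=1 (the default), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the S3 endpoints in the instance's region to find the endpoint ID.
list_s3_endpoints() {
  run aws ec2 describe-vpc-endpoints \
    --filters Name=service-name,Values=com.amazonaws.us-west-2.s3 \
    --region us-west-2
}

# Remove the endpoint from the route table so S3 traffic stops using it.
detach_endpoint() {
  run aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id "$VPC_ENDPOINT_ID" \
    --remove-route-table-ids "$ROUTE_TABLE_ID" \
    --region us-west-2
}

# Put the endpoint back on the route table once the copy has finished.
attach_endpoint() {
  run aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id "$VPC_ENDPOINT_ID" \
    --add-route-table-ids "$ROUTE_TABLE_ID" \
    --region us-west-2
}
```

The intended order would be list_s3_endpoints, detach_endpoint, the cross-region aws s3 cp from the question, then attach_endpoint. Other workloads in the VPC that rely on the endpoint are affected while it is detached.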

Cross-region replication would be ideal, but as pointed out, that only affects new items in the bucket.

Instead of using aws s3 cp you probably want to use aws s3 sync. Sync only copies changed files, so you can rerun it if it is interrupted. For example:

aws s3 sync s3://n-virginia/origin s3://oregon/destination
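Because sync skips files that are already up to date, rerunning it after an interruption can be automated. A small sketch of that idea follows; `retry` and `RETRY_DELAY` are illustrative names, not part of the AWS CLI, and the region flags in the usage comment are the ones from the question.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND...: rerun COMMAND until it succeeds or attempts run out.
retry() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max failed" >&2
    attempt=$((attempt + 1))
    # Pause between attempts; RETRY_DELAY is in seconds (default 5).
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}

# Usage, with the bucket names and regions from the question:
# retry 5 aws s3 sync s3://n-virginia/origin s3://oregon/destination \
#   --source-region us-east-1 --region us-west-2
```

Each retry only transfers what the previous attempt did not finish, which is the main advantage over rerunning cp.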

Note also that neither cp nor sync preserves ACLs. If you have changed ACL permissions on individual files, they will all be reset to the default after the copy. Some other tools are supposed to preserve ACLs, such as https://s3tools.org, which seems to work for me.

If downloading the entire bucket locally is not feasible because of the disk space required, you can download, upload and remove about 5 seconds' worth of files at a time.

The aws s3 sync line in the shell snippet below starts a background download of the entire source bucket to the local disk. While there are files in the current directory, the loop calls aws s3 mv, which copies files to the destination bucket and removes them locally.

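mkdir tempdir
# Start downloading the whole source bucket into the current directory in the background.
aws s3 sync s3://source-bucket . &
sleep 5
# While anything other than tempdir remains here, move finished .txt files aside and upload them.
# tempdir is excluded from the count (otherwise the loop never ends) and is passed as a directory, not a glob.
while [ "$(ls | grep -v '^tempdir$' | wc -l)" -gt 0 ] ; do mv *.txt tempdir/ 2>/dev/null ; aws s3 mv --recursive tempdir s3://destination-bucket ; done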

The aws s3 sync command creates temporary files with a random extension while writing files to disk. Unfortunately, the aws s3 mv command will sometimes upload these files. To avoid this, move a batch of the files, e.g. all .txt files, to a temporary directory and upload only those.
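The batching step can be sketched on its own as follows. `move_batch` is a hypothetical helper, and the `.txt` pattern is the same assumption as in the snippet above: completed files end in `.txt`, while in-progress downloads carry an extra random suffix and are left alone.

```shell
#!/bin/sh
# move_batch SRC_DIR BATCH_DIR: move completed .txt files from SRC_DIR to BATCH_DIR,
# leaving partially written temporary files (extra random suffix) in place.
move_batch() {
  src=$1
  batch=$2
  mkdir -p "$batch"
  for f in "$src"/*.txt; do
    # If the glob matched nothing, the literal pattern is returned; skip it.
    [ -e "$f" ] || continue
    mv "$f" "$batch"/
  done
}

# Usage: move_batch . tempdir && aws s3 mv --recursive tempdir s3://destination-bucket
```

Looping over the glob instead of calling mv *.txt directly avoids an error message when no completed files are present yet.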

In practice I see no more than 50 MB of disk used locally (fewer than 500 files, each smaller than 100 KB).

I know this is an old post, but we faced the same issue recently.

To update @astrotom's response: Amazon S3 Cross-Region Replication (CRR) now supports copying existing objects. You just need to ask the AWS support team to unlock the feature. Full explanation here and here.

On our side, we preferred @brendan's solution even though it saturates the network. You can find here a Kubernetes job that can help you automate it.

In this blog you can find multiple approaches to migrating buckets across regions and across accounts.
