
How to change permission recursively to folder with AWS s3 or AWS s3api

I am trying to grant permissions to an existing account in S3.

The bucket is owned by the account, but the data was copied from another account's bucket.

When I try to grant permissions with the command:

aws s3api put-object-acl --bucket <bucket_name> --key <folder_name> --profile <original_account_profile> --grant-full-control emailaddress=<destination_account_email>

I receive the error:

An error occurred (NoSuchKey) when calling the PutObjectAcl operation: The specified key does not exist.

whereas if I run it on a single file, the command succeeds.

How can I make it work for a full folder?

This can only be achieved by using pipes: an S3 "folder" is just a key prefix, not an object of its own (hence the NoSuchKey error), so the ACL has to be set on each object individually. Try (note that awk's $4 will truncate keys that contain spaces):

aws s3 ls s3://bucket/path/ --recursive | awk '{cmd="aws s3api put-object-acl --acl bucket-owner-full-control --bucket bucket --key "$4; system(cmd)}'

You will need to run the command individually for every object.

You might be able to short-cut the process by using:

aws s3 cp --acl bucket-owner-full-control --metadata Key=Value --profile <original_account_profile> s3://bucket/path s3://bucket/path

That is, you copy the files to themselves, but with the added ACL that grants permissions to the bucket owner.

If you have sub-directories, then add --recursive.
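For reference, this shortcut maps to the CopyObject API. Below is a minimal boto3 sketch of the same self-copy for a single object; the bucket and key names here are placeholders:

import boto3

s3 = boto3.client('s3')  # assumes credentials for the original account

# Copy the object onto itself; replacing the metadata is what makes S3
# accept a copy whose source and destination are the same object, and the
# ACL grants full control to the bucket owner.
s3.copy_object(
    Bucket='bucket',
    Key='path/to/object',
    CopySource={'Bucket': 'bucket', 'Key': 'path/to/object'},
    Metadata={'Key': 'Value'},    # mirrors --metadata Key=Value
    MetadataDirective='REPLACE',
    ACL='bucket-owner-full-control',
)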

The other answers are ok, but the FASTEST way to do this is to use the aws s3 cp command with the option --metadata-directive REPLACE, like this:

aws s3 cp --recursive --acl bucket-owner-full-control s3://bucket/folder s3://bucket/folder --metadata-directive REPLACE

This gives speeds of between 50 MiB/s and 80 MiB/s.

A comment from John R suggested using a 'dummy' option, like --storage-class STANDARD. While this works, it only gave me copy speeds between 5 MiB/s and 11 MiB/s.

The inspiration for trying this came from AWS's support article on the subject: https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-change-anonymous-ownership/

NOTE: If you encounter 'access denied' for some of your objects, this is likely because you are using AWS creds for the bucket-owning account, whereas you need to use creds for the account the files were copied from.
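For anyone who prefers Python over the CLI, here is a minimal single-threaded sketch of the same recursive REPLACE copy; the bucket name and prefix are placeholders:

import boto3

s3 = boto3.client('s3')  # use creds for the account the files were copied from

# List every key under the prefix and copy each object onto itself,
# replacing its metadata so S3 accepts the self-copy and applies the ACL.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='bucket', Prefix='folder/'):
    for obj in page.get('Contents', []):
        s3.copy_object(
            Bucket='bucket',
            Key=obj['Key'],
            CopySource={'Bucket': 'bucket', 'Key': obj['Key']},
            MetadataDirective='REPLACE',
            ACL='bucket-owner-full-control',
        )

Note that the CopyObject API only handles objects up to 5 GB; larger objects would need a multipart copy (for example via boto3's managed copy transfer).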

Use Python to set the permissions recursively:

#!/usr/bin/env python3
import boto3
import sys

client = boto3.client('s3')
BUCKET = 'enter-bucket-name'

def process_s3_objects(prefix):
    """Set the ACL on every key in the bucket under the given prefix."""
    kwargs = {'Bucket': BUCKET, 'Prefix': prefix}
    failures = []
    while True:
        resp = client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            try:
                print(obj['Key'])
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])
                continue
        # list_objects_v2 returns at most 1000 keys per call, so keep
        # paginating until there is no NextContinuationToken.
        if 'NextContinuationToken' not in resp:
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']

    print("failures:", failures)

def set_acl(key):
    client.put_object_acl(
        GrantFullControl="id=%s" % get_account_canonical_id(),
        Bucket=BUCKET,
        Key=key
    )

def get_account_canonical_id():
    """Canonical user ID of the account the client authenticates as."""
    return client.list_buckets()["Owner"]["ID"]


process_s3_objects(sys.argv[1])
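Save the script under any name (set_acl.py here is just a hypothetical choice) and pass the key prefix as its only argument:

python set_acl.py path/to/folder/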

Here is my PowerShell-only solution. As written it only emits the put-object-acl commands as strings; pipe the output to Invoke-Expression to actually run them.

aws s3 ls s3://BUCKET/ --recursive | %{ "aws s3api put-object-acl --bucket BUCKET --key "+$_.ToString().substring(30)+" --acl bucket-owner-full-control" }

I used this Linux Bash shell one-liner to change ACLs recursively:

aws s3 ls s3://bucket --recursive | cut -c 32- | xargs -n 1 -d '\n' -- aws s3api put-object-acl --acl public-read --bucket bucket --key

It works even if file names contain () characters.

I had a similar issue with taking ownership of log objects in a quite large bucket. Total number of objects: 3,290,956. Total size: 1.4 TB.

The solutions I was able to find were far too sluggish for that number of objects. I ended up writing some code that was able to do the job several times faster than aws s3 cp.

You will need to install the requirements:

pip install pathos boto3 click

#!/usr/bin/env python3
import logging
import os
import sys
import boto3
import botocore
import click
from time import time
from botocore.config import Config
from pathos.pools import ThreadPool as Pool

logger = logging.getLogger(__name__)

streamformater = logging.Formatter("[*] %(levelname)s: %(asctime)s: %(message)s")
logstreamhandler = logging.StreamHandler()
logstreamhandler.setFormatter(streamformater)


def _set_log_level(ctx, param, value):
    if value:
        ctx.ensure_object(dict)
        ctx.obj["log_level"] = value
        logger.setLevel(value)
        if value <= 20:
            logger.info(f"Logger set to {logging.getLevelName(logger.getEffectiveLevel())}")
    return value


@click.group(chain=False)
@click.version_option(version='0.1.0')
@click.pass_context
def cli(ctx):
    """
        Take object ownership of S3 bucket objects.
    """
    ctx.ensure_object(dict)
    ctx.obj["aws_config"] = Config(
        retries={
            'max_attempts': 10,
            'mode': 'standard'
        }
    )


@cli.command("own")
@click.argument("bucket", type=click.STRING)
@click.argument("prefix", type=click.STRING, default="/")
@click.option("--profile", type=click.STRING, default="default", envvar="AWS_DEFAULT_PROFILE", help="Configuration profile from ~/.aws/{credentials,config}")
@click.option("--region", type=click.STRING, default="us-east-1", envvar="AWS_DEFAULT_REGION", help="AWS region")
@click.option("--threads", "-t", type=click.INT, default=40, help="Threads to use")
@click.option("--loglevel", "log_level", hidden=True, flag_value=logging.INFO, callback=_set_log_level, expose_value=False, is_eager=True, default=True)
@click.option("--verbose", "-v", "log_level", flag_value=logging.DEBUG, callback=_set_log_level, expose_value=False, is_eager=True, help="Increase log_level")
@click.pass_context
def command_own(ctx, *args, **kwargs):
    ctx.obj.update(kwargs)
    profile_name = ctx.obj.get("profile")
    region = ctx.obj.get("region")
    bucket = ctx.obj.get("bucket")
    prefix = ctx.obj.get("prefix").lstrip("/")
    threads = ctx.obj.get("threads")
    pool = Pool(nodes=threads)
    logger.addHandler(logstreamhandler)
    logger.info(f"Getting ownership of all objects in s3://{bucket}/{prefix}")
    start = time()

    try:
        SESSION: boto3.Session = boto3.session.Session(profile_name=profile_name)
    except botocore.exceptions.ProfileNotFound as e:
        logger.warning(f"Profile {profile_name} was not found.")
        logger.warning(f"Falling back to environment variables for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN")
        AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
        AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
        AWS_SESSION_TOKEN = os.environ.get("AWS_SESSION_TOKEN", "")
        if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
            if AWS_SESSION_TOKEN:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                                                               aws_session_token=AWS_SESSION_TOKEN)
            else:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
        else:
            logger.error("Unable to find AWS credentials.")
            sys.exit(1)

    s3c = SESSION.client('s3', config=ctx.obj["aws_config"])

    def bucket_keys(Bucket, Prefix=''):
        """Yield every object key in the bucket under the given prefix."""
        Prefix = Prefix.lstrip('/')
        for page in s3c.get_paginator('list_objects_v2').paginate(Bucket=Bucket, Prefix=Prefix):
            for content in page.get('Contents', ()):
                yield content['Key']

    def worker(key):
        logger.info(f"Processing: {key}")
        s3c.copy_object(Bucket=bucket, Key=key,
                        CopySource={'Bucket': bucket, 'Key': key},
                        ACL='bucket-owner-full-control',
                        StorageClass="STANDARD"
                        )

    object_keys = bucket_keys(bucket, prefix)
    pool.map(worker, object_keys)
    end = time()
    logger.info(f"Completed for {end - start:.2f} seconds.")


if __name__ == '__main__':
    cli()

Usage:

get_object_ownership.py own -v my-big-aws-logs-bucket /prefix

The bucket mentioned above was processed in ~7 hours using 40 threads.

[*] INFO: 2021-08-05 19:53:55,542: Completed for 25320.45 seconds.

Some more speed comparison using the AWS CLI vs this tool on the same subset of data:

aws s3 cp --recursive --acl bucket-owner-full-control --metadata-directive  53.59s user 7.24s system 20% cpu 5:02.42 total

vs

[*] INFO: 2021-08-06 09:07:43,506: Completed for 49.09 seconds.

The Python code is more efficient written this way; otherwise it takes a lot longer.

import boto3
import sys

client = boto3.client('s3')
BUCKET = 'mybucket'

def process_s3_objects(prefix):
    """Set the ACL on every key in the bucket under the given prefix."""
    kwargs = {'Bucket': BUCKET, 'Prefix': prefix}
    failures = []
    while True:
        resp = client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            try:
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])
                continue
        # Stop once the listing has no more pages.
        if 'NextContinuationToken' not in resp:
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']
    print("failures:", failures)

def set_acl(key):
    print(key)
    client.put_object_acl(
        ACL='bucket-owner-full-control',
        Bucket=BUCKET,
        Key=key
    )


process_s3_objects(sys.argv[1])

The main command is this,

where bucketname_example_3636 is your bucket name:

aws s3api put-object-acl --bucket bucketname_example_3636 --key bigdirectories2_noglacier/bkpy_workspace_sda4.tar --acl bucket-owner-full-control

My idea is to create a script with sed, easily.

1. Get the list of the keys:

aws s3 ls s3://bucketname_example_3636 --recursive > listoffile.txt

2. Say you have 1000 files, so 1000 keys.

With sed, automatically create the 1000 commands; the captured group \1 is your key. Note that aws s3 ls --recursive prefixes each line with date, time and size columns, so strip them first (for example with cut -c 32-, as in the one-liner above) if you want \1 to hold the key alone:

sed 's/^\(.*\)$/aws s3api put-object-acl --bucket bucketname_example_3636 --key \1 --acl bucket-owner-full-control/' listoffile.txt > listoffile_v2.txt;

3. Add the shebang line necessary to turn the text file into a bash script:

sed '1 i\#!/bin/bash' listoffile_v2.txt > listoffile_v3.txt;

4. Now just change the file extension:

cp listoffile_v3.txt listoffile_v3.sh;

Now you have a script.

Make the script executable:

chmod u+x listoffile_v3.sh;

Run the script:

./listoffile_v3.sh;


 