How to change permissions recursively on a folder with AWS s3 or AWS s3api

I am trying to grant permissions to an existing account in s3.

The bucket is owned by the account, but the data was copied from another account's bucket.

When I try to grant permissions with the command:

aws s3api put-object-acl --bucket <bucket_name> --key <folder_name> --profile <original_account_profile> --grant-full-control emailaddress=<destination_account_email>

I receive the error:

An error occurred (NoSuchKey) when calling the PutObjectAcl operation: The specified key does not exist.

while if I do it on a single file the command is successful.

How can I make it work for a full folder?

This can only be achieved by using pipes. Try:

aws s3 ls s3://bucket/path/ --recursive | awk '{cmd="aws s3api put-object-acl --acl bucket-owner-full-control --bucket bucket --key "$4; system(cmd)}'
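For illustration, the same per-object loop can also be done directly against the API with boto3 instead of shelling out for every key. This is only a minimal sketch: the profile name, bucket, prefix and grantee e-mail below are placeholders you would replace with your own values.

import boto3

# Placeholder profile/bucket/prefix/e-mail -- substitute your own values.
session = boto3.session.Session(profile_name="original_account_profile")
s3 = session.client("s3")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket_name", Prefix="folder_name/"):
    for obj in page.get("Contents", []):
        # Same grant string as the CLI's --grant-full-control emailaddress=<email>
        s3.put_object_acl(
            Bucket="bucket_name",
            Key=obj["Key"],
            GrantFullControl="emailaddress=destination_account_email",
        )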

You will need to run the command individually for every object.

You might be able to short-cut the process by using:

aws s3 cp --acl bucket-owner-full-control --metadata Key=Value --profile <original_account_profile> s3://bucket/path s3://bucket/path

That is, you copy the files to themselves, but with the added ACL that grants permissions to the bucket owner.

If you have sub-directories, then add --recursive.
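If you prefer to do the same thing from boto3 rather than the CLI, a rough sketch of the in-place copy could look like the following (bucket, prefix and profile are placeholders; MetadataDirective="REPLACE" plays the role of the --metadata flag above, since S3 refuses to copy an object onto itself with nothing else changed):

import boto3

# Placeholder names -- substitute your own profile, bucket and prefix.
s3 = boto3.session.Session(profile_name="original_account_profile").client("s3")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket", Prefix="path/"):
    for obj in page.get("Contents", []):
        # Copy each object onto itself, adding the bucket-owner-full-control ACL.
        s3.copy_object(
            Bucket="bucket",
            Key=obj["Key"],
            CopySource={"Bucket": "bucket", "Key": obj["Key"]},
            ACL="bucket-owner-full-control",
            MetadataDirective="REPLACE",
        )

Note that CopyObject handles objects up to 5 GB; larger objects need a multipart copy.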

The other answers are ok, but the FASTEST way to do this is to use the aws s3 cp command with the option --metadata-directive REPLACE, like this:

aws s3 cp --recursive --acl bucket-owner-full-control s3://bucket/folder s3://bucket/folder --metadata-directive REPLACE

This gives speeds between 50 MiB/s and 80 MiB/s.

The suggestion from John R's comment to use a 'dummy' option such as --storage-class STANDARD also works, but it only gave me copy speeds between 5 MiB/s and 11 MiB/s.

The inspiration for trying this came from AWS's support article on the subject: https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-change-anonymous-ownership/

NOTE: If you encounter 'access denied' for some of your objects, this is likely because you are using AWS creds for the bucket-owning account, whereas you need to use creds for the account the files were copied from.
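To see which objects are actually affected before switching credentials, a quick boto3 sketch like the one below (bucket and prefix are placeholders) can list the objects whose owner differs from your own canonical ID:

import boto3

s3 = boto3.client("s3")  # credentials of the bucket-owning account
my_canonical_id = s3.list_buckets()["Owner"]["ID"]

paginator = s3.get_paginator("list_objects_v2")
# FetchOwner=True includes each object's owner in the listing.
for page in paginator.paginate(Bucket="bucket", Prefix="folder/", FetchOwner=True):
    for obj in page.get("Contents", []):
        if obj["Owner"]["ID"] != my_canonical_id:
            print("Owned by the source account:", obj["Key"])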

Use Python to set the permissions recursively:

#!/usr/bin/env python
import boto3
import sys

client = boto3.client('s3')
BUCKET = 'enter-bucket-name'

def process_s3_objects(prefix):
    """Set the ACL on every key in the bucket under the given prefix."""
    kwargs = {'Bucket': BUCKET, 'Prefix': prefix}
    failures = []
    while True:
        resp = client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            try:
                print(obj['Key'])
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])
                continue
        # Fetch the next page until the listing is exhausted.
        if 'NextContinuationToken' not in resp:
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']

    print("failures:", failures)

def set_acl(key):
    # Grant full control to the canonical user ID of the account running the script.
    client.put_object_acl(
        GrantFullControl="id=%s" % get_account_canonical_id(),
        Bucket=BUCKET,
        Key=key
    )

def get_account_canonical_id():
    return client.list_buckets()["Owner"]["ID"]


process_s3_objects(sys.argv[1])

Here is my PowerShell-only solution.

aws s3 ls s3://BUCKET/ --recursive | %{ "aws s3api put-object-acl --bucket BUCKET --key "+$_.ToString().substring(30)+" --acl bucket-owner-full-control" }

I used this Linux Bash shell oneliner to change ACLs recursively:

aws s3 ls s3://bucket --recursive | cut -c 32- | xargs -n 1 -d '\n' -- aws s3api put-object-acl --acl public-read --bucket bucket --key

It works even if file names contain () characters.

I had a similar issue with taking ownership of log objects in a quite large bucket: 3,290,956 objects, 1.4 TB in total.

The solutions I was able to find were far too sluggish for that number of objects, so I ended up writing some code that does the job several times faster than aws s3 cp.

You will need to install requirements:

pip install pathos boto3 click

#!/usr/bin/env python3
import logging
import os
import sys
import boto3
import botocore
import click
from time import time
from botocore.config import Config
from pathos.pools import ThreadPool as Pool

logger = logging.getLogger(__name__)

streamformater = logging.Formatter("[*] %(levelname)s: %(asctime)s: %(message)s")
logstreamhandler = logging.StreamHandler()
logstreamhandler.setFormatter(streamformater)


def _set_log_level(ctx, param, value):
    if value:
        ctx.ensure_object(dict)
        ctx.obj["log_level"] = value
        logger.setLevel(value)
        if value <= 20:
            logger.info(f"Logger set to {logging.getLevelName(logger.getEffectiveLevel())}")
    return value


@click.group(chain=False)
@click.version_option(version='0.1.0')
@click.pass_context
def cli(ctx):
    """
        Take object ownership of S3 bucket objects.
    """
    ctx.ensure_object(dict)
    ctx.obj["aws_config"] = Config(
        retries={
            'max_attempts': 10,
            'mode': 'standard'
        }
    )


@cli.command("own")
@click.argument("bucket", type=click.STRING)
@click.argument("prefix", type=click.STRING, default="/")
@click.option("--profile", type=click.STRING, default="default", envvar="AWS_DEFAULT_PROFILE", help="Configuration profile from ~/.aws/{credentials,config}")
@click.option("--region", type=click.STRING, default="us-east-1", envvar="AWS_DEFAULT_REGION", help="AWS region")
@click.option("--threads", "-t", type=click.INT, default=40, help="Threads to use")
@click.option("--loglevel", "log_level", hidden=True, flag_value=logging.INFO, callback=_set_log_level, expose_value=False, is_eager=True, default=True)
@click.option("--verbose", "-v", "log_level", flag_value=logging.DEBUG, callback=_set_log_level, expose_value=False, is_eager=True, help="Increase log_level")
@click.pass_context
def command_own(ctx, *args, **kwargs):
    ctx.obj.update(kwargs)
    profile_name = ctx.obj.get("profile")
    region = ctx.obj.get("region")
    bucket = ctx.obj.get("bucket")
    prefix = ctx.obj.get("prefix").lstrip("/")
    threads = ctx.obj.get("threads")
    pool = Pool(nodes=threads)
    logger.addHandler(logstreamhandler)
    logger.info(f"Getting ownership of all objects in s3://{bucket}/{prefix}")
    start = time()

    try:
        SESSION: boto3.Session = boto3.session.Session(profile_name=profile_name)
    except botocore.exceptions.ProfileNotFound as e:
        logger.warning(f"Profile {profile_name} was not found.")
        logger.warning(f"Falling back to environment variables for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN")
        AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
        AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
        AWS_SESSION_TOKEN = os.environ.get("AWS_SESSION_TOKEN", "")
        if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
            if AWS_SESSION_TOKEN:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                                                               aws_session_token=AWS_SESSION_TOKEN)
            else:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
        else:
            logger.error("Unable to find AWS credentials.")
            sys.exit(1)

    s3c = SESSION.client('s3', config=ctx.obj["aws_config"])

    def bucket_keys(Bucket, Prefix=''):
        # Yield every object key under the prefix, paging through the listing.
        Prefix = Prefix[1:] if Prefix.startswith('/') else Prefix
        for page in s3c.get_paginator('list_objects_v2').paginate(Bucket=Bucket, Prefix=Prefix):
            for content in page.get('Contents', ()):
                yield content['Key']

    def worker(key):
        logger.info(f"Processing: {key}")
        # Copy the object onto itself with the bucket-owner-full-control ACL;
        # the explicit StorageClass is the change that makes the self-copy legal.
        s3c.copy_object(Bucket=bucket, Key=key,
                        CopySource={'Bucket': bucket, 'Key': key},
                        ACL='bucket-owner-full-control',
                        StorageClass="STANDARD"
                        )

    object_keys = bucket_keys(bucket, prefix)
    pool.map(worker, object_keys)
    end = time()
    logger.info(f"Completed for {end - start:.2f} seconds.")


if __name__ == '__main__':
    cli()

Usage:

get_object_ownership.py own -v my-big-aws-logs-bucket /prefix

The bucket mentioned above was processed in ~7 hours using 40 threads.

[*] INFO: 2021-08-05 19:53:55,542: Completed for 25320.45 seconds.

A further speed comparison, AWS CLI vs this tool, on the same subset of data:

aws s3 cp --recursive --acl bucket-owner-full-control --metadata-directive 53.59s user 7.24s system 20% cpu 5:02.42 total

vs

[*] INFO: 2021-08-06 09:07:43,506: Completed for 49.09 seconds.

The Python code is more efficient written this way; otherwise it takes a lot longer.

import boto3
import sys

client = boto3.client('s3')
BUCKET = 'mybucket'

def process_s3_objects(prefix):
    """Set the ACL on every key in the bucket under the given prefix."""
    kwargs = {'Bucket': BUCKET, 'Prefix': prefix}
    failures = []
    while True:
        resp = client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            try:
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])
                continue
        # Stop once the listing has no more pages.
        if 'NextContinuationToken' not in resp:
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']
    print("failures:", failures)

def set_acl(key):
    print(key)
    client.put_object_acl(
        ACL='bucket-owner-full-control',
        Bucket=BUCKET,
        Key=key
    )


process_s3_objects(sys.argv[1])

The main command is this, where bucketname_example_3636 is your bucket name:

aws s3api put-object-acl --bucket bucketname_example_3636 --key bigdirectories2_noglacier/bkpy_workspace_sda4.tar --acl bucket-owner-full-control

My idea is to generate a script with sed, which is easy.

1. Get the list of the keys:

aws s3 ls s3://bucketname_example_3636 --recursive > listoffile.txt

2. Say you have 1000 files, so 1000 keys. Use sed to generate the 1000 commands automatically; the backreference \1 is your key. (Note that the aws s3 ls --recursive listing prefixes each key with date, time and size columns, so strip those columns first, e.g. with cut -c 32- as in the one-liner above.)

sed 's/^\(.*\)$/aws s3api put-object-acl --bucket bucketname_example_3636 --key \1 --acl bucket-owner-full-control/' listoffile.txt > listoffile_v2.txt

3. Add the shebang line needed to turn the text file into a Bash script:

sed '1i\#!/bin/bash' listoffile_v2.txt > listoffile_v3.txt

4. Now just change the file extension:

cp listoffile_v3.txt listoffile_v3.sh

Now you have a script. Make it executable:

chmod u+x listoffile_v3.sh

Run the script:

./listoffile_v3.sh
