
aws s3 restore all files of a folder

I have files archived on AWS S3 Glacier Deep Archive. I want to initiate the restoration of all objects starting with a given prefix.

To do that, I first tried using the AWS CLI with this command:

aws s3api list-objects-v2 \
--bucket ${bucket} \
--prefix "${prefix}" \
--query "Contents[?StorageClass=='DEEP_ARCHIVE'].Key" \
--output text \
| sed 's/\t/\n/g' \
| xargs -I %%% \
aws s3api restore-object \
--restore-request Days=${days},GlacierJobParameters={"Tier"=\""${mode}"\"} \
--bucket ${bucket} \
--key "%%%"

I don't know why, but some objects initiated a restoration while others (the majority) did not.

So then I tried using Python with the following code:

import boto3


def restore_object(bucket, prefix, days, tier):
    s3 = boto3.resource('s3')
    client = boto3.client('s3')

    my_bucket = s3.Bucket(bucket)

    logfile = open("restoration.log", "w")

    # One RestoreObject request per archived object under the prefix
    for obj in my_bucket.objects.filter(Prefix=prefix):
        if obj.storage_class == "DEEP_ARCHIVE":
            try:
                client.restore_object(
                    Bucket=bucket,
                    Key=obj.key,
                    RestoreRequest={
                        'Days': days,
                        'GlacierJobParameters': {'Tier': tier}
                    }
                )
            except Exception as e:
                logfile.write(f'For the object {obj.key}, {e} \n')

    logfile.close()

But it is very slow. Four hours after starting, the script was still running and many objects had still not initiated their restoration. There are about 70,000 objects in this folder.
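
For reference, a possible mitigation before moving to batch operations (just a sketch, assuming the bottleneck is issuing one RestoreObject request at a time) would be to send the restore_object calls concurrently from a thread pool; boto3 clients can be shared across threads:

from concurrent.futures import ThreadPoolExecutor, as_completed

import boto3


def restore_concurrently(bucket, prefix, days, tier, workers=20):
    # List the archived keys first (single-threaded), then fan the
    # RestoreObject requests out over a pool of worker threads.
    s3 = boto3.resource('s3')
    client = boto3.client('s3')

    keys = [obj.key
            for obj in s3.Bucket(bucket).objects.filter(Prefix=prefix)
            if obj.storage_class == "DEEP_ARCHIVE"]

    def restore(key):
        client.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={'Days': days,
                            'GlacierJobParameters': {'Tier': tier}})

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(restore, key): key for key in keys}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as e:
                # e.g. RestoreAlreadyInProgress for objects already being restored
                print(f'For the object {futures[future]}, {e}')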

As @John suggested, I finally used an S3 Batch Operations job. For people interested, here is the code:

#!/usr/bin/env python3.9

import argparse
from urllib.parse import quote_plus
import os 
import boto3



def get_arguments():
    parser = argparse.ArgumentParser(
        description='''
        Restoration of objects from Glacier Deep Archive on AWS S3.

        This script creates a manifest file with all objects to restore
        and uploads this manifest to S3.

        Then an S3 Batch Operations job is created and run.

        ''',
        formatter_class=argparse.RawTextHelpFormatter,
        usage='use "%(prog)s --help" for more information',)
    parser.add_argument(
        '--bucket',
        type=str,
        help='bucket name (default: %(default)s)',
        default='my-bucket',
        required=False)
    parser.add_argument(
        '--prefix',
        type=str,
        help='<Required> path of the folder to restore (without the name of the bucket)',
        required=True)
    parser.add_argument(
        '--days',
        type=int,
        help='number of days before deletion of the restored object copy (default: %(default)s)',
        default=2,
        required=False)
    parser.add_argument(
        '--mode',
        choices=['STANDARD','BULK'],
        default='STANDARD',
        help='''
        Access tier option.
        Standard = restoration in 12h 
        Bulk = restoration in 48h 
        (default: %(default)s)
        ''',
        required=False)
    return parser.parse_args()






def create_manifest(bucket,prefix):
    '''Create a manifest file with all objects to restore and upload this manifest to S3

    Parameters
    ----------
    bucket : str
        name of the bucket
    prefix : str
        path of the folder to restore (without the name of the bucket)

    Returns
    -------
    manifest_object
        object of the manifest file
    
    '''


    s3 = boto3.resource('s3')
    

    my_bucket = s3.Bucket(bucket)

    prefix_file_name = prefix.replace("/","_") if "/" in prefix else prefix

    manifest = open(prefix_file_name, "w")
    logfile = open("restoration.log","w")

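    # S3 Batch Operations expects a CSV manifest with one "bucket,key" row per
    # object, and the keys must be URL-encoded (hence quote_plus below).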
    for object in my_bucket.objects.filter(Prefix=prefix):
        if object.storage_class == "DEEP_ARCHIVE":
            try:
                key_url_encode = quote_plus(object.key, safe = '/')
                manifest.write(f'{bucket},{key_url_encode}\n')
                
            except Exception as e:
                logfile.write(f'For the object {object.key}, {e} \n')

    manifest.close()
    logfile.close()




    if prefix.endswith('/'):
        manifest_key = prefix + 'Manifest2/Manifest.csv'
    else:
        manifest_key = prefix + '/Manifest2/Manifest.csv'

    my_bucket.upload_file(
        Filename=prefix_file_name,
        Key=manifest_key
        )

    os.remove(prefix_file_name)

    return s3.Object(bucket,manifest_key)


def create_job(manifest_object,bucket,days,tiers,prefix):
    '''Create a s3 batch job

    Parameters
    ----------
    manifest_object : object
        object of the manifest file
    bucket : str
        name of the bucket
    days : int
        number of days before deletion of the restored object copy
    tiers : str
        Access tier option.
    prefix : str
        path of the folder to restore (without the name of the bucket)

    Returns
    -------
    None

    '''

    clientjobs = boto3.client('s3control')

    manifest_arn = 'arn:aws:s3:::' + bucket + '/' + manifest_object.key

    bucket_arn = 'arn:aws:s3:::' + bucket 

    if prefix.endswith('/'):
        report_key = prefix + 'report_batch_jobs'
    else:
        report_key = prefix + '/report_batch_jobs'

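    # 'Report_CSV_20180820' and 'S3BatchOperations_CSV_20180820' are the fixed
    # format identifiers defined by the S3 Batch Operations API.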
    response = clientjobs.create_job(
        AccountId='myaccountid',
        ConfirmationRequired=False,
        Operation={
            'S3InitiateRestoreObject': {
                'ExpirationInDays': days,
                'GlacierJobTier': tiers
                }
            },
        Report={
            'Bucket': bucket_arn,
            'Format': 'Report_CSV_20180820',
            'Enabled': True,
            'Prefix': report_key,
            'ReportScope': 'FailedTasksOnly'
        },
        Manifest={
            'Spec': {
                'Format': 'S3BatchOperations_CSV_20180820',
                'Fields': ['Bucket', 'Key']
            },
            'Location': {
                'ObjectArn': manifest_arn,
                'ETag': manifest_object.e_tag
            }
        },
        Priority=1,
        RoleArn='arnofiamrole'
        )




def main():
    args = get_arguments()

    manifest_object = create_manifest(args.bucket, args.prefix)

    create_job(manifest_object, args.bucket, args.days, args.mode, args.prefix)


if __name__ == '__main__':
    main()
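
For reference, assuming the script is saved as restore_deep_archive.py (the file name is just an example), a typical invocation would look like this:

./restore_deep_archive.py --bucket my-bucket --prefix path/to/folder/ --days 7 --mode BULK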

I just removed personal information (the account ID and the IAM role ARN).
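
One thing the script does not do is wait for the batch job to finish. A minimal sketch of how the job could be polled afterwards, assuming create_job is changed to return response['JobId'] (the CreateJob API does include a JobId in its response):

import time

import boto3


def wait_for_job(account_id, job_id, poll_seconds=60):
    # Poll the S3 Batch Operations job until it reaches a terminal state.
    client = boto3.client('s3control')
    while True:
        status = client.describe_job(AccountId=account_id, JobId=job_id)['Job']['Status']
        print(f'Job {job_id}: {status}')
        if status in ('Complete', 'Failed', 'Cancelled'):
            return status
        time.sleep(poll_seconds)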
