
How to restore folders (or entire buckets) to Amazon S3 from Glacier?

I changed the lifecycle for a bunch of my buckets on Amazon S3 so their storage class was set to Glacier. I did this using the online AWS Console. I now need those files again.

I know how to restore files to S3 one at a time, but my buckets have thousands of files. I wanted to see if there is a way to restore the entire bucket back to S3, just as there was a way to send the entire bucket to Glacier.

I'm guessing there's a way to program a solution, but I wanted to see if there is a way to do it in the Console, or with another program, or something else I may be missing.

If you use s3cmd, you can use it to restore recursively quite easily:

s3cmd restore --recursive s3://mybucketname/ 

I've also used it to restore just folders:

s3cmd restore --recursive s3://mybucketname/folder/

If you're using the AWS CLI tool (it's nice, you should use it), you can do it like this:

aws s3 ls s3://<BUCKET_NAME> --recursive | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request '{"Days":<DAYS>,"GlacierJobParameters":{"Tier":"<TIER>"}}' --bucket <BUCKET_NAME> --key

Replace <BUCKET_NAME> with the bucket name you want and provide the restore parameters <DAYS> and <TIER>.

<DAYS> is the number of days you want to restore the objects for, and <TIER> controls the speed of the restore process; it has three levels: Bulk, Standard, or Expedited.
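For example, a hypothetical invocation for a bucket named my-bucket, restoring for 7 days at the Standard tier, might look like this (note that awk '{print $4}' truncates keys containing spaces, as a later answer points out):

aws s3 ls s3://my-bucket --recursive | awk '{print $4}' | xargs -L 1 \
  aws s3api restore-object \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}' \
  --bucket my-bucket --key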

The above answers didn't work well for me because my bucket was a mix of objects on Glacier and objects that were not. The easiest thing for me was to create a list of all the GLACIER objects in the bucket, then attempt to restore each one individually, ignoring any errors (such as "already in progress", "not an object", etc.).

  1. Get a listing of all the GLACIER files (keys) in the bucket

    aws s3api list-objects-v2 --bucket <bucketName> --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' > glacier-restore.txt

  2. Create a shell script and run it, replacing your "bucketName".

     #!/bin/sh

     for x in `cat glacier-restore.txt`
     do
       echo "Begin restoring $x"
       aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "$x"
       echo "Done restoring $x"
     done

Credit goes to Josh at http://capnjosh.com/blog/a-client-error-invalidobjectstate-occurred-when-calling-the-copyobject-operation-operation-is-not-valid-for-the-source-objects-storage-class/, a resource I found after trying some of the solutions above.

There is no built-in tool for this. "Folders" in S3 are an illusion for human convenience, based on forward slashes in the object key (path/filename), and every object that migrated to Glacier has to be restored individually, although...

Of course you could write a script to iterate through the hierarchy and send those restore requests using the SDKs or the REST API in your programming language of choice.

Be sure you understand how restoring from Glacier into S3 works before you proceed. It is always only a temporary restoration, and you choose the number of days each object will persist in S3 before reverting to being stored only in Glacier.

Also, you want to be certain that you understand the penalty charges for restoring too much Glacier data in a short period of time, or you could be in for some unexpected expense. Depending on the urgency, you may want to spread the restore operation out over days or weeks.
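If you do script it, one simple way to spread the requests out is to throttle the loop. This is just a sketch, not any particular poster's script; the bucket name, key list file, and sleep interval are all placeholders:

#!/bin/sh
# Pace the restore requests rather than firing them all at once.
# keys.txt is assumed to hold one object key per line.
while read -r key
do
  aws s3api restore-object --restore-request Days=7 \
    --bucket my-bucket --key "$key"
  sleep 30   # adjust to spread the requests over a longer period
done < keys.txt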

I recently needed to restore a whole bucket and all its files and folders. You will need the s3cmd and aws cli tools configured with your credentials to run this.

I've found this pretty robust at handling errors with specific objects in the bucket that might already have had a restore request.

#!/bin/sh

# This will give you a nice list of all objects in the bucket with the bucket name stripped out
s3cmd ls -r s3://<your-bucket-name> | awk '{print $4}' | sed 's#s3://<your-bucket-name>/##' > glacier-restore.txt

for x in `cat glacier-restore.txt`
do
    echo "restoring $x"
    aws s3api restore-object --restore-request Days=7 --bucket <your-bucket-name> --profile <your-aws-credentials-profile> --key "$x"
done

Here's my version of the aws cli interface and how to restore data from Glacier. I modified some of the examples above to work when the keys of the files to be restored contain spaces.

# Parameters
BUCKET="my-bucket" # the bucket you want to restore, no s3:// no slashes
BPATH="path/in/bucket/" # the objects prefix you wish to restore (mind the `/`) 
DAYS=1 # For how many days you wish to restore the data.

# Restore the objects
aws s3 ls s3://${BUCKET}/${BPATH} --recursive | \
awk '{out=""; for(i=4;i<=NF;i++){out=out" "$i}; print out}'| \
xargs -I {} aws s3api restore-object --restore-request Days=${DAYS} \
--bucket ${BUCKET} --key "{}"

It looks like S3 Browser can "Restore from Glacier" at the folder level, but not at the bucket level. The only thing is you have to buy the Pro version, so it's not the best solution.

A variation on Dustin's answer using the AWS CLI, but with recursion and piping to sh to skip errors (e.g. if some objects have already had a restore requested...)

BUCKET=my-bucket
BPATH=/path/in/bucket
DAYS=1
aws s3 ls s3://$BUCKET$BPATH --recursive | awk '{print $4}' | xargs -L 1 \
 echo aws s3api restore-object --restore-request Days=$DAYS \
 --bucket $BUCKET --key | sh

The xargs echo bit generates a list of "aws s3api restore-object" commands, and by piping that to sh, you can continue on errors.

NOTE: Ubuntu 14.04's aws-cli package is old. In order to use --recursive you'll need to install it via github.
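If building from github is inconvenient, one alternative (assuming pip is available on the machine) is to install a current aws-cli from PyPI instead of the distribution package:

# Remove the outdated distro package, then install a recent aws-cli for the current user
sudo apt-get remove awscli
pip install --upgrade --user awscli
aws --version   # confirm the new version is picked up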

POSTSCRIPT: Glacier restores can get unexpectedly expensive very quickly. Depending on your use case, you may find the Infrequent Access tier more appropriate. AWS has a good explanation of the different tiers.

This command worked for me:

aws s3api list-objects-v2 \
--bucket BUCKET_NAME \
--query "Contents[?StorageClass=='GLACIER']" \
--output text | \
awk -F $'\t' '{print $2}' | \
tr '\n' '\0' | \
xargs -L 1 -0 \
aws s3api restore-object \
--restore-request Days=7 \
--bucket BUCKET_NAME \
--key

Pro tips

  • This command can take quite a while if you have a lot of objects.
  • Don't CTRL-C / break the command, or you'll have to wait for the objects already processed to move out of the RestoreAlreadyInProgress state before you can re-run it. It can take a few hours for the state to transition. You'll see this error message if you need to wait: An error occurred (RestoreAlreadyInProgress) when calling the RestoreObject operation. A way to check an individual object's state is shown below.
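If you are unsure whether a particular object already has a restore pending or finished, one way to check is the Restore field returned by head-object (the bucket and key below are placeholders):

# Restore shows ongoing-request="true" while the restore is still running,
# and ongoing-request="false" plus an expiry-date once the object is available.
aws s3api head-object --bucket my-bucket --key path/to/my/object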

I went through this mill today and came up with the following, based on the answers above and having also tried s3cmd. s3cmd does not work for mixed buckets (Glacier and Standard). This will do what you need in two steps - first create a list of the Glacier files and then fire off the s3 cli requests (even if they have already been made). It also keeps track of which have already been requested, so you can restart the script as needed. Note that the cut command below relies on cut's default TAB delimiter, which matches the tab-separated --output text format:

#!/bin/sh

bucket="$1"
glacier_file_list="glacier-restore-me-please.txt"
glacier_file_done="glacier-requested-restore-already.txt"

if [ "X${bucket}" = "X" ]
then
  echo "Please supply bucket name as first argument"
  exit 1
fi

aws s3api list-objects-v2 --bucket ${bucket} --query "Contents[?StorageClass=='GLACIER']" --output text | cut -f 2 > ${glacier_file_list}

if [ $? -ne 0 ]
then
  echo "Failed to fetch list of objects from bucket ${bucket}"
  exit 1
fi

echo "Got list of glacier files from bucket ${bucket}"

while read x
do
  echo "Begin restoring $x"
  aws s3api restore-object --restore-request Days=7 --bucket ${bucket} --key "$x"

  if [ $? -ne 0 ]
  then
    echo "Failed to restore \"$x\""
  else
    echo "Done requested restore of \"$x\""
  fi

  # Log those done
  #
  echo "$x" >> ${glacier_file_done}

done < ${glacier_file_list}

Another way is rclone. This tool can sync / copy / push data (as we would do with files). https://rclone.org/faq/#can-rclone-sync-direct-from-drive-to-s3 (the example in the link is for Google Drive, but it is agnostic). But as Michael - sqlbot said, a server or a container has to start the sync/backup operation somewhere.
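As a rough illustration (the remote names are placeholders you would set up with rclone config, and Glacier-class objects must be restored before they can be read), a copy between remotes looks like:

# Copy objects from an S3 path to another configured remote, e.g. a Google Drive remote
rclone copy s3remote:my-bucket/path gdriveremote:backup/path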

I wrote a program in Python to recursively restore folders. The s3cmd command above did not work for me, and neither did the awk command.

You can run this like python3 /home/ec2-user/recursive_restore.py -- restore and monitor the restore status with python3 /home/ec2-user/recursive_restore.py -- status.

import argparse
import base64
import json
import os
import sys
from datetime import datetime
from pathlib import Path

import boto3
import pymysql.cursors
import yaml
from botocore.exceptions import ClientError

__author__ = "kyle.bridenstine"


def reportStatuses(
    operation,
    type,
    successOperation,
    folders,
    restoreFinished,
    restoreInProgress,
    restoreNotRequestedYet,
    restoreStatusUnknown,
    skippedFolders,
):
    """
    reportStatuses gives a generic, aggregated report for all operations (Restore, Status, Download)
    """

    report = 'Status Report For "{}" Operation. Of the {} total {}, {} are finished being {}, {} have a restore in progress, {} have not been requested to be restored yet, {} reported an unknown restore status, and {} were asked to be skipped.'.format(
        operation,
        str(len(folders)),
        type,
        str(len(restoreFinished)),
        successOperation,
        str(len(restoreInProgress)),
        str(len(restoreNotRequestedYet)),
        str(len(restoreStatusUnknown)),
        str(len(skippedFolders)),
    )

    if (len(folders) - len(skippedFolders)) == len(restoreFinished):
        print(report)
        print("Success: All {} operations are complete".format(operation))
    else:
        if (len(folders) - len(skippedFolders)) == len(restoreNotRequestedYet):
            print(report)
            print("Attention: No {} operations have been requested".format(operation))
        else:
            print(report)
            print("Attention: Not all {} operations are complete yet".format(operation))


def status(foldersToRestore, restoreTTL):

    s3 = boto3.resource("s3")

    folders = []
    skippedFolders = []

    # Read the list of folders to process
    with open(foldersToRestore, "r") as f:

        for rawS3Path in f.read().splitlines():

            folders.append(rawS3Path)

            s3Bucket = "put-your-bucket-name-here"
            maxKeys = 1000
            # Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
            s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)

            # Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
            client = boto3.client("s3")
            paginator = client.get_paginator("list_objects")
            operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
            page_iterator = paginator.paginate(**operation_parameters)

            pageCount = 0

            totalS3ObjectKeys = []
            totalS3ObjKeysRestoreFinished = []
            totalS3ObjKeysRestoreInProgress = []
            totalS3ObjKeysRestoreNotRequestedYet = []
            totalS3ObjKeysRestoreStatusUnknown = []

            # Iterate through the pages of S3 Object Keys
            for page in page_iterator:

                for s3Content in page["Contents"]:

                    s3ObjectKey = s3Content["Key"]

                    # Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
                    if s3ObjectKey.endswith("/"):
                        continue

                    totalS3ObjectKeys.append(s3ObjectKey)

                    s3Object = s3.Object(s3Bucket, s3ObjectKey)

                    if s3Object.restore is None:
                        totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
                    elif "true" in s3Object.restore:
                        totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
                    elif "false" in s3Object.restore:
                        totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
                    else:
                        totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)

                pageCount = pageCount + 1

            # Report the total statuses for the folders
            reportStatuses(
                "restore folder " + rawS3Path,
                "files",
                "restored",
                totalS3ObjectKeys,
                totalS3ObjKeysRestoreFinished,
                totalS3ObjKeysRestoreInProgress,
                totalS3ObjKeysRestoreNotRequestedYet,
                totalS3ObjKeysRestoreStatusUnknown,
                [],
            )


def removeS3BucketPrefixFromPath(path, bucket):
    """
    removeS3BucketPrefixFromPath removes "s3a://<bucket name>" or "s3://<bucket name>" from the Path
    """

    s3BucketPrefix1 = "s3a://" + bucket + "/"
    s3BucketPrefix2 = "s3://" + bucket + "/"

    if path.startswith(s3BucketPrefix1):
        # remove one instance of prefix
        return path.replace(s3BucketPrefix1, "", 1)
    elif path.startswith(s3BucketPrefix2):
        # remove one instance of prefix
        return path.replace(s3BucketPrefix2, "", 1)
    else:
        return path


def restore(foldersToRestore, restoreTTL):
    """
    restore initiates a restore request on one or more folders
    """

    print("Restore Operation")

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("put-your-bucket-name-here")

    folders = []
    skippedFolders = []

    # Read the list of folders to process
    with open(foldersToRestore, "r") as f:

        for rawS3Path in f.read().splitlines():

            folders.append(rawS3Path)

            # Skip folders that are commented out of the file
            if "#" in rawS3Path:
                print("Skipping this folder {} since it's commented out with #".format(rawS3Path))
                skippedFolders.append(rawS3Path)
                continue
            else:
                print("Restoring folder {}".format(rawS3Path))

            s3Bucket = "put-your-bucket-name-here"
            maxKeys = 1000
            # Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
            s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)

            print("s3Bucket={}, s3Path={}, maxKeys={}".format(s3Bucket, s3Path, maxKeys))

            # Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
            client = boto3.client("s3")
            paginator = client.get_paginator("list_objects")
            operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
            page_iterator = paginator.paginate(**operation_parameters)

            pageCount = 0

            totalS3ObjectKeys = []
            totalS3ObjKeysRestoreFinished = []
            totalS3ObjKeysRestoreInProgress = []
            totalS3ObjKeysRestoreNotRequestedYet = []
            totalS3ObjKeysRestoreStatusUnknown = []

            # Iterate through the pages of S3 Object Keys
            for page in page_iterator:

                print("Processing S3 Key Page {}".format(str(pageCount)))

                s3ObjectKeys = []
                s3ObjKeysRestoreFinished = []
                s3ObjKeysRestoreInProgress = []
                s3ObjKeysRestoreNotRequestedYet = []
                s3ObjKeysRestoreStatusUnknown = []

                for s3Content in page["Contents"]:

                    print("Processing S3 Object Key {}".format(s3Content["Key"]))

                    s3ObjectKey = s3Content["Key"]

                    # Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
                    if s3ObjectKey.endswith("/"):
                        print("Skipping this S3 Object Key because it's a folder {}".format(s3ObjectKey))
                        continue

                    s3ObjectKeys.append(s3ObjectKey)
                    totalS3ObjectKeys.append(s3ObjectKey)

                    s3Object = s3.Object(s3Bucket, s3ObjectKey)

                    print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))

                    # Ensure this folder was not already processed for a restore
                    if s3Object.restore is None:

                        restore_response = bucket.meta.client.restore_object(
                            Bucket=s3Object.bucket_name, Key=s3Object.key, RestoreRequest={"Days": restoreTTL}
                        )

                        print("Restore Response: {}".format(str(restore_response)))

                        # Refresh object and check that the restore request was successfully processed
                        s3Object = s3.Object(s3Bucket, s3ObjectKey)

                        print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))

                        if s3Object.restore is None:
                            s3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
                            totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
                            print("%s restore request failed" % s3Object.key)
                            # Instead of failing the entire job continue restoring the rest of the log tree(s)
                            # raise Exception("%s restore request failed" % s3Object.key)
                        elif "true" in s3Object.restore:
                            print(
                                "The request to restore this file has been successfully received and is being processed: {}".format(
                                    s3Object.key
                                )
                            )
                            s3ObjKeysRestoreInProgress.append(s3ObjectKey)
                            totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
                        elif "false" in s3Object.restore:
                            print("This file has successfully been restored: {}".format(s3Object.key))
                            s3ObjKeysRestoreFinished.append(s3ObjectKey)
                            totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
                        else:
                            print(
                                "Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
                            )
                            s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
                            totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)

                    elif "true" in s3Object.restore:
                        print("Restore request already received for {}".format(s3Object.key))
                        s3ObjKeysRestoreInProgress.append(s3ObjectKey)
                        totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
                    elif "false" in s3Object.restore:
                        print("This file has successfully been restored: {}".format(s3Object.key))
                        s3ObjKeysRestoreFinished.append(s3ObjectKey)
                        totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
                    else:
                        print(
                            "Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
                        )
                        s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
                        totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)

                # Report the statuses per S3 Key Page
                reportStatuses(
                    "folder-" + rawS3Path + "-page-" + str(pageCount),
                    "files in this page",
                    "restored",
                    s3ObjectKeys,
                    s3ObjKeysRestoreFinished,
                    s3ObjKeysRestoreInProgress,
                    s3ObjKeysRestoreNotRequestedYet,
                    s3ObjKeysRestoreStatusUnknown,
                    [],
                )

                pageCount = pageCount + 1

            if pageCount > 1:
                # Report the total statuses for the files
                reportStatuses(
                    "restore-folder-" + rawS3Path,
                    "files",
                    "restored",
                    totalS3ObjectKeys,
                    totalS3ObjKeysRestoreFinished,
                    totalS3ObjKeysRestoreInProgress,
                    totalS3ObjKeysRestoreNotRequestedYet,
                    totalS3ObjKeysRestoreStatusUnknown,
                    [],
                )


def displayError(operation, exc):
    """
    displayError displays a generic error message for all failed operation's returned exceptions
    """

    print(
        'Error! Restore{} failed. Please ensure that you ran the following command "./tools/infra auth refresh" before executing this program. Error: {}'.format(
            operation, exc
        )
    )


def main(operation, foldersToRestore, restoreTTL):
    """
    main The starting point of the code that directs the operation to it's appropriate workflow
    """

    print(
        "{} Starting log_migration_restore.py with operation={} foldersToRestore={} restoreTTL={} Day(s)".format(
            str(datetime.now().strftime("%d/%m/%Y %H:%M:%S")), operation, foldersToRestore, str(restoreTTL)
        )
    )

    if operation == "restore":
        try:
            restore(foldersToRestore, restoreTTL)
        except Exception as exc:
            displayError("", exc)
    elif operation == "status":
        try:
            status(foldersToRestore, restoreTTL)
        except Exception as exc:
            displayError("-Status-Check", exc)
    else:
        raise Exception("%s is an invalid operation. Please choose either 'restore' or 'status'" % operation)


def check_operation(operation):
    """
    check_operation validates the runtime input arguments
    """

    if operation is None or (
        str(operation) != "restore" and str(operation) != "status" and str(operation) != "download"
    ):
        raise argparse.ArgumentTypeError(
            "%s is an invalid operation. Please choose either 'restore' or 'status' or 'download'" % operation
        )
    return str(operation)


# To run use sudo python3 /home/ec2-user/recursive_restore.py -- restore
# -l /home/ec2-user/folders_to_restore.csv
if __name__ == "__main__":

    # Form the argument parser.
    parser = argparse.ArgumentParser(
        description="Restore s3 folders from archival using 'restore' or check on the restore status using 'status'"
    )

    parser.add_argument(
        "operation",
        type=check_operation,
        help="Please choose either 'restore' to restore the list of s3 folders or 'status' to see the status of a restore on the list of s3 folders",
    )

    parser.add_argument(
        "-l",
        "--foldersToRestore",
        type=str,
        default="/home/ec2-user/folders_to_restore.csv",
        required=False,
        help="The location of the file containing the list of folders to restore. Put one folder on each line.",
    )

    parser.add_argument(
        "-t",
        "--restoreTTL",
        type=int,
        default=30,
        required=False,
        help="The number of days you want the filess to remain restored/unarchived. After this period the logs will automatically be rearchived.",
    )

    args = parser.parse_args()
    sys.exit(main(args.operation, args.foldersToRestore, args.restoreTTL))

Maybe I'm only a decade late in posting an answer, but now we have S3 Batch Operations to bulk-restore deep-archived objects. See this.
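As a rough sketch of what that looks like with the CLI (the account ID, ARNs, manifest location, ETag, and role are all placeholders; check the S3 Batch Operations documentation for the exact fields your setup needs):

# Create an S3 Batch Operations job that initiates a restore for every object
# listed in a CSV manifest containing Bucket,Key rows. All values are placeholders.
aws s3control create-job \
  --account-id 111122223333 \
  --operation '{"S3InitiateRestoreObject":{"ExpirationInDays":7,"GlacierJobTier":"BULK"}}' \
  --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::my-bucket/manifest.csv","ETag":"example-etag"}}' \
  --report '{"Bucket":"arn:aws:s3:::my-bucket","Prefix":"batch-reports","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"FailedTasksOnly"}' \
  --priority 10 \
  --role-arn arn:aws:iam::111122223333:role/my-batch-restore-role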

