
How to restore folders (or entire buckets) to Amazon S3 from Glacier?

I changed the lifecycle for a bunch of my buckets on Amazon S3 so their storage class was set to Glacier. I did this using the online AWS Console. I now need those files again.

I know how to restore files to S3 one at a time, but my buckets have thousands of files. I wanted to see if there is a way to restore the entire bucket back to S3, just as there was a way to send the entire bucket to Glacier.

I'm guessing there's a way to program a solution, but I wanted to see if there is a way to do it in the Console, or with another program, or something else I may be missing.

If you use s3cmd, you can use it to restore recursively quite easily:

s3cmd restore --recursive s3://mybucketname/ 

I've also used it to restore just folders:

s3cmd restore --recursive s3://mybucketname/folder/

If you're using the AWS CLI tool (it's nice, you should use it), you can do it like this:

aws s3 ls s3://<BUCKET_NAME> --recursive | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request '{"Days":<DAYS>,"GlacierJobParameters":{"Tier":"<TIER>"}}' --bucket <BUCKET_NAME> --key

Replace <BUCKET_NAME> with the bucket name you want and provide the restore parameters <DAYS> and <TIER>.

<DAYS> is the number of days you want to restore the objects for, and <TIER> controls the speed of the restore process; it has three levels: Bulk, Standard, or Expedited.
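For example, a hypothetical invocation for a bucket named my-bucket, restoring for 7 days at the Standard tier, might look like this (note that awk '{print $4}' truncates keys containing spaces, as a later answer points out):

aws s3 ls s3://my-bucket --recursive | awk '{print $4}' | xargs -L 1 \
  aws s3api restore-object \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}' \
  --bucket my-bucket --key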

The above answers didn't work well for me because my bucket was a mix of objects on Glacier and objects that were not. The easiest thing for me was to create a list of all the GLACIER objects in the bucket, then attempt to restore each one individually, ignoring any errors (such as "already in progress", "not an object", etc.).

  1. Get a listing of all the GLACIER files (keys) in the bucket

    aws s3api list-objects-v2 --bucket <bucketName> --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' > glacier-restore.txt

  2. Create a shell script and run it, replacing your "bucketName".

     #!/bin/sh

     for x in `cat glacier-restore.txt`
     do
       echo "Begin restoring $x"
       aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "$x"
       echo "Done restoring $x"
     done

Credit goes to Josh at http://capnjosh.com/blog/a-client-error-invalidobjectstate-occurred-when-calling-the-copyobject-operation-operation-is-not-valid-for-the-source-objects-storage-class/, a resource I found after trying some of the solutions above.

There is no built-in tool for this. "Folders" in S3 are an illusion for human convenience, based on forward slashes in the object key (path/filename), and every object that migrated to Glacier has to be restored individually, although...

Of course you could write a script to iterate through the hierarchy and send those restore requests using the SDKs or the REST API in your programming language of choice.

Be sure you understand how restoring from Glacier into S3 works before you proceed. It is always only a temporary restoration, and you choose the number of days each object will persist in S3 before reverting to being stored only in Glacier.

Also, you want to be certain that you understand the penalty charges for restoring too much Glacier data in a short period of time, or you could be in for some unexpected expense. Depending on the urgency, you may want to spread the restore operation out over days or weeks.
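If you do script it, one simple way to spread the requests out is to throttle the loop. This is just a sketch, not any particular poster's script; the bucket name, key list file, and sleep interval are all placeholders:

#!/bin/sh
# Pace the restore requests rather than firing them all at once.
# keys.txt is assumed to hold one object key per line.
while read -r key
do
  aws s3api restore-object --restore-request Days=7 \
    --bucket my-bucket --key "$key"
  sleep 30   # adjust to spread the requests over a longer period
done < keys.txt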

I recently needed to restore a whole bucket and all its files and folders. You will need the s3cmd and aws cli tools configured with your credentials to run this.

I've found this pretty robust at handling errors with specific objects in the bucket that might already have had a restore request.

#!/bin/sh

# This will give you a nice list of all objects in the bucket with the bucket name stripped out
s3cmd ls -r s3://<your-bucket-name> | awk '{print $4}' | sed 's#s3://<your-bucket-name>/##' > glacier-restore.txt

for x in `cat glacier-restore.txt`
do
    echo "restoring $x"
    aws s3api restore-object --restore-request Days=7 --bucket <your-bucket-name> --profile <your-aws-credentials-profile> --key "$x"
done

Here's my version of the aws cli interface and how to restore data from Glacier. I modified some of the examples above to work when the keys of the files to be restored contain spaces.

# Parameters
BUCKET="my-bucket" # the bucket you want to restore, no s3:// no slashes
BPATH="path/in/bucket/" # the objects prefix you wish to restore (mind the `/`) 
DAYS=1 # For how many days you wish to restore the data.

# Restore the objects
aws s3 ls s3://${BUCKET}/${BPATH} --recursive | \
awk '{out=""; for(i=4;i<=NF;i++){out=out" "$i}; print out}'| \
xargs -I {} aws s3api restore-object --restore-request Days=${DAYS} \
--bucket ${BUCKET} --key "{}"

It looks like S3 Browser can "Restore from Glacier" at the folder level, but not at the bucket level. The only thing is you have to buy the Pro version, so it's not the best solution.

A variation on Dustin's answer using the AWS CLI, but with recursion and piping to sh to skip errors (e.g. if some objects have already had a restore requested...)

BUCKET=my-bucket
BPATH=/path/in/bucket
DAYS=1
aws s3 ls s3://$BUCKET$BPATH --recursive | awk '{print $4}' | xargs -L 1 \
 echo aws s3api restore-object --restore-request Days=$DAYS \
 --bucket $BUCKET --key | sh

The xargs echo bit generates a list of "aws s3api restore-object" commands, and by piping that to sh, you can continue on errors.

NOTE: Ubuntu 14.04's aws-cli package is old. In order to use --recursive you'll need to install it via github.
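If building from github is inconvenient, one alternative (assuming pip is available on the machine) is to install a current aws-cli from PyPI instead of the distribution package:

# Remove the outdated distro package, then install a recent aws-cli for the current user
sudo apt-get remove awscli
pip install --upgrade --user awscli
aws --version   # confirm the new version is picked up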

POSTSCRIPT: Glacier restores can get unexpectedly expensive very quickly. Depending on your use case, you may find the Infrequent Access tier more appropriate. AWS has a good explanation of the different tiers.

This command worked for me:

aws s3api list-objects-v2 \
--bucket BUCKET_NAME \
--query "Contents[?StorageClass=='GLACIER']" \
--output text | \
awk -F $'\t' '{print $2}' | \
tr '\n' '\0' | \
xargs -L 1 -0 \
aws s3api restore-object \
--restore-request Days=7 \
--bucket BUCKET_NAME \
--key

Pro tips

  • This command can take quite a while if you have a lot of objects.
  • Don't CTRL-C / break the command, or you'll have to wait for the objects already processed to move out of the RestoreAlreadyInProgress state before you can re-run it. It can take a few hours for the state to transition. You'll see this error message if you need to wait: An error occurred (RestoreAlreadyInProgress) when calling the RestoreObject operation. A way to check an individual object's state is shown below.
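If you are unsure whether a particular object already has a restore pending or finished, one way to check is the Restore field returned by head-object (the bucket and key below are placeholders):

# Restore shows ongoing-request="true" while the restore is still running,
# and ongoing-request="false" plus an expiry-date once the object is available.
aws s3api head-object --bucket my-bucket --key path/to/my/object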

I went through this mill today and came up with the following, based on the answers above and having also tried s3cmd. s3cmd does not work for mixed buckets (Glacier and Standard). This will do what you need in two steps - first create a list of the Glacier files and then fire off the s3 cli requests (even if they have already been made). It also keeps track of which have already been requested, so you can restart the script as needed. Note that the cut command below relies on cut's default TAB delimiter, which matches the tab-separated --output text format:

#!/bin/sh

bucket="$1"
glacier_file_list="glacier-restore-me-please.txt"
glacier_file_done="glacier-requested-restore-already.txt"

if [ "X${bucket}" = "X" ]
then
  echo "Please supply bucket name as first argument"
  exit 1
fi

aws s3api list-objects-v2 --bucket ${bucket} --query "Contents[?StorageClass=='GLACIER']" --output text | cut -f 2 > ${glacier_file_list}

if [ $? -ne 0 ]
then
  echo "Failed to fetch list of objects from bucket ${bucket}"
  exit 1
fi

echo "Got list of glacier files from bucket ${bucket}"

while read x
do
  echo "Begin restoring $x"
  aws s3api restore-object --restore-request Days=7 --bucket ${bucket} --key "$x"

  if [ $? -ne 0 ]
  then
    echo "Failed to restore \"$x\""
  else
    echo "Done requested restore of \"$x\""
  fi

  # Log those done
  #
  echo "$x" >> ${glacier_file_done}

done < ${glacier_file_list}

Another way is rclone. This tool can sync / copy / push data (as we would do with files). https://rclone.org/faq/#can-rclone-sync-direct-from-drive-to-s3 (the example in the link is for Google Drive, but it is agnostic). But as Michael - sqlbot said, a server or a container has to start the sync/backup operation somewhere.
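As a rough illustration (the remote names are placeholders you would set up with rclone config, and Glacier-class objects must be restored before they can be read), a copy between remotes looks like:

# Copy objects from an S3 path to another configured remote, e.g. a Google Drive remote
rclone copy s3remote:my-bucket/path gdriveremote:backup/path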

I wrote a program in Python to recursively restore folders. The s3cmd command above did not work for me, and neither did the awk command.

You can run this like python3 /home/ec2-user/recursive_restore.py -- restore and monitor the restore status with python3 /home/ec2-user/recursive_restore.py -- status.

import argparse
import base64
import json
import os
import sys
from datetime import datetime
from pathlib import Path

import boto3
import pymysql.cursors
import yaml
from botocore.exceptions import ClientError

__author__ = "kyle.bridenstine"


def reportStatuses(
    operation,
    type,
    successOperation,
    folders,
    restoreFinished,
    restoreInProgress,
    restoreNotRequestedYet,
    restoreStatusUnknown,
    skippedFolders,
):
    """
    reportStatuses gives a generic, aggregated report for all operations (Restore, Status, Download)
    """

    report = 'Status Report For "{}" Operation. Of the {} total {}, {} are finished being {}, {} have a restore in progress, {} have not been requested to be restored yet, {} reported an unknown restore status, and {} were asked to be skipped.'.format(
        operation,
        str(len(folders)),
        type,
        str(len(restoreFinished)),
        successOperation,
        str(len(restoreInProgress)),
        str(len(restoreNotRequestedYet)),
        str(len(restoreStatusUnknown)),
        str(len(skippedFolders)),
    )

    if (len(folders) - len(skippedFolders)) == len(restoreFinished):
        print(report)
        print("Success: All {} operations are complete".format(operation))
    else:
        if (len(folders) - len(skippedFolders)) == len(restoreNotRequestedYet):
            print(report)
            print("Attention: No {} operations have been requested".format(operation))
        else:
            print(report)
            print("Attention: Not all {} operations are complete yet".format(operation))


def status(foldersToRestore, restoreTTL):

    s3 = boto3.resource("s3")

    folders = []
    skippedFolders = []

    # Read the list of folders to process
    with open(foldersToRestore, "r") as f:

        for rawS3Path in f.read().splitlines():

            folders.append(rawS3Path)

            s3Bucket = "put-your-bucket-name-here"
            maxKeys = 1000
            # Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
            s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)

            # Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
            client = boto3.client("s3")
            paginator = client.get_paginator("list_objects")
            operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
            page_iterator = paginator.paginate(**operation_parameters)

            pageCount = 0

            totalS3ObjectKeys = []
            totalS3ObjKeysRestoreFinished = []
            totalS3ObjKeysRestoreInProgress = []
            totalS3ObjKeysRestoreNotRequestedYet = []
            totalS3ObjKeysRestoreStatusUnknown = []

            # Iterate through the pages of S3 Object Keys
            for page in page_iterator:

                for s3Content in page["Contents"]:

                    s3ObjectKey = s3Content["Key"]

                    # Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
                    if s3ObjectKey.endswith("/"):
                        continue

                    totalS3ObjectKeys.append(s3ObjectKey)

                    s3Object = s3.Object(s3Bucket, s3ObjectKey)

                    if s3Object.restore is None:
                        totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
                    elif "true" in s3Object.restore:
                        totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
                    elif "false" in s3Object.restore:
                        totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
                    else:
                        totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)

                pageCount = pageCount + 1

            # Report the total statuses for the folders
            reportStatuses(
                "restore folder " + rawS3Path,
                "files",
                "restored",
                totalS3ObjectKeys,
                totalS3ObjKeysRestoreFinished,
                totalS3ObjKeysRestoreInProgress,
                totalS3ObjKeysRestoreNotRequestedYet,
                totalS3ObjKeysRestoreStatusUnknown,
                [],
            )


def removeS3BucketPrefixFromPath(path, bucket):
    """
    removeS3BucketPrefixFromPath removes "s3a://<bucket name>" or "s3://<bucket name>" from the Path
    """

    s3BucketPrefix1 = "s3a://" + bucket + "/"
    s3BucketPrefix2 = "s3://" + bucket + "/"

    if path.startswith(s3BucketPrefix1):
        # remove one instance of prefix
        return path.replace(s3BucketPrefix1, "", 1)
    elif path.startswith(s3BucketPrefix2):
        # remove one instance of prefix
        return path.replace(s3BucketPrefix2, "", 1)
    else:
        return path


def restore(foldersToRestore, restoreTTL):
    """
    restore initiates a restore request on one or more folders
    """

    print("Restore Operation")

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("put-your-bucket-name-here")

    folders = []
    skippedFolders = []

    # Read the list of folders to process
    with open(foldersToRestore, "r") as f:

        for rawS3Path in f.read().splitlines():

            folders.append(rawS3Path)

            # Skip folders that are commented out of the file
            if "#" in rawS3Path:
                print("Skipping this folder {} since it's commented out with #".format(rawS3Path))
                skippedFolders.append(rawS3Path)
                continue
            else:
                print("Restoring folder {}".format(rawS3Path))

            s3Bucket = "put-your-bucket-name-here"
            maxKeys = 1000
            # Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
            s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)

            print("s3Bucket={}, s3Path={}, maxKeys={}".format(s3Bucket, s3Path, maxKeys))

            # Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
            client = boto3.client("s3")
            paginator = client.get_paginator("list_objects")
            operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
            page_iterator = paginator.paginate(**operation_parameters)

            pageCount = 0

            totalS3ObjectKeys = []
            totalS3ObjKeysRestoreFinished = []
            totalS3ObjKeysRestoreInProgress = []
            totalS3ObjKeysRestoreNotRequestedYet = []
            totalS3ObjKeysRestoreStatusUnknown = []

            # Iterate through the pages of S3 Object Keys
            for page in page_iterator:

                print("Processing S3 Key Page {}".format(str(pageCount)))

                s3ObjectKeys = []
                s3ObjKeysRestoreFinished = []
                s3ObjKeysRestoreInProgress = []
                s3ObjKeysRestoreNotRequestedYet = []
                s3ObjKeysRestoreStatusUnknown = []

                for s3Content in page["Contents"]:

                    print("Processing S3 Object Key {}".format(s3Content["Key"]))

                    s3ObjectKey = s3Content["Key"]

                    # Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
                    if s3ObjectKey.endswith("/"):
                        print("Skipping this S3 Object Key because it's a folder {}".format(s3ObjectKey))
                        continue

                    s3ObjectKeys.append(s3ObjectKey)
                    totalS3ObjectKeys.append(s3ObjectKey)

                    s3Object = s3.Object(s3Bucket, s3ObjectKey)

                    print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))

                    # Ensure this folder was not already processed for a restore
                    if s3Object.restore is None:

                        restore_response = bucket.meta.client.restore_object(
                            Bucket=s3Object.bucket_name, Key=s3Object.key, RestoreRequest={"Days": restoreTTL}
                        )

                        print("Restore Response: {}".format(str(restore_response)))

                        # Refresh object and check that the restore request was successfully processed
                        s3Object = s3.Object(s3Bucket, s3ObjectKey)

                        print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))

                        if s3Object.restore is None:
                            s3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
                            totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
                            print("%s restore request failed" % s3Object.key)
                            # Instead of failing the entire job continue restoring the rest of the log tree(s)
                            # raise Exception("%s restore request failed" % s3Object.key)
                        elif "true" in s3Object.restore:
                            print(
                                "The request to restore this file has been successfully received and is being processed: {}".format(
                                    s3Object.key
                                )
                            )
                            s3ObjKeysRestoreInProgress.append(s3ObjectKey)
                            totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
                        elif "false" in s3Object.restore:
                            print("This file has successfully been restored: {}".format(s3Object.key))
                            s3ObjKeysRestoreFinished.append(s3ObjectKey)
                            totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
                        else:
                            print(
                                "Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
                            )
                            s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
                            totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)

                    elif "true" in s3Object.restore:
                        print("Restore request already received for {}".format(s3Object.key))
                        s3ObjKeysRestoreInProgress.append(s3ObjectKey)
                        totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
                    elif "false" in s3Object.restore:
                        print("This file has successfully been restored: {}".format(s3Object.key))
                        s3ObjKeysRestoreFinished.append(s3ObjectKey)
                        totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
                    else:
                        print(
                            "Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
                        )
                        s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
                        totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)

                # Report the statuses per S3 Key Page
                reportStatuses(
                    "folder-" + rawS3Path + "-page-" + str(pageCount),
                    "files in this page",
                    "restored",
                    s3ObjectKeys,
                    s3ObjKeysRestoreFinished,
                    s3ObjKeysRestoreInProgress,
                    s3ObjKeysRestoreNotRequestedYet,
                    s3ObjKeysRestoreStatusUnknown,
                    [],
                )

                pageCount = pageCount + 1

            if pageCount > 1:
                # Report the total statuses for the files
                reportStatuses(
                    "restore-folder-" + rawS3Path,
                    "files",
                    "restored",
                    totalS3ObjectKeys,
                    totalS3ObjKeysRestoreFinished,
                    totalS3ObjKeysRestoreInProgress,
                    totalS3ObjKeysRestoreNotRequestedYet,
                    totalS3ObjKeysRestoreStatusUnknown,
                    [],
                )


def displayError(operation, exc):
    """
    displayError displays a generic error message for all failed operation's returned exceptions
    """

    print(
        'Error! Restore{} failed. Please ensure that you ran the following command "./tools/infra auth refresh" before executing this program. Error: {}'.format(
            operation, exc
        )
    )


def main(operation, foldersToRestore, restoreTTL):
    """
    main The starting point of the code that directs the operation to it's appropriate workflow
    """

    print(
        "{} Starting log_migration_restore.py with operation={} foldersToRestore={} restoreTTL={} Day(s)".format(
            str(datetime.now().strftime("%d/%m/%Y %H:%M:%S")), operation, foldersToRestore, str(restoreTTL)
        )
    )

    if operation == "restore":
        try:
            restore(foldersToRestore, restoreTTL)
        except Exception as exc:
            displayError("", exc)
    elif operation == "status":
        try:
            status(foldersToRestore, restoreTTL)
        except Exception as exc:
            displayError("-Status-Check", exc)
    else:
        raise Exception("%s is an invalid operation. Please choose either 'restore' or 'status'" % operation)


def check_operation(operation):
    """
    check_operation validates the runtime input arguments
    """

    if operation is None or (
        str(operation) != "restore" and str(operation) != "status" and str(operation) != "download"
    ):
        raise argparse.ArgumentTypeError(
            "%s is an invalid operation. Please choose either 'restore' or 'status' or 'download'" % operation
        )
    return str(operation)


# To run use sudo python3 /home/ec2-user/recursive_restore.py -- restore
# -l /home/ec2-user/folders_to_restore.csv
if __name__ == "__main__":

    # Form the argument parser.
    parser = argparse.ArgumentParser(
        description="Restore s3 folders from archival using 'restore' or check on the restore status using 'status'"
    )

    parser.add_argument(
        "operation",
        type=check_operation,
        help="Please choose either 'restore' to restore the list of s3 folders or 'status' to see the status of a restore on the list of s3 folders",
    )

    parser.add_argument(
        "-l",
        "--foldersToRestore",
        type=str,
        default="/home/ec2-user/folders_to_restore.csv",
        required=False,
        help="The location of the file containing the list of folders to restore. Put one folder on each line.",
    )

    parser.add_argument(
        "-t",
        "--restoreTTL",
        type=int,
        default=30,
        required=False,
        help="The number of days you want the filess to remain restored/unarchived. After this period the logs will automatically be rearchived.",
    )

    args = parser.parse_args()
    sys.exit(main(args.operation, args.foldersToRestore, args.restoreTTL))

Maybe I'm only a decade late in posting an answer, but now we have S3 Batch Operations to bulk-restore deep-archived objects. See this.
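As a rough sketch of what that looks like with the CLI (the account ID, ARNs, manifest location, ETag, and role are all placeholders; check the S3 Batch Operations documentation for the exact fields your setup needs):

# Create an S3 Batch Operations job that initiates a restore for every object
# listed in a CSV manifest containing Bucket,Key rows. All values are placeholders.
aws s3control create-job \
  --account-id 111122223333 \
  --operation '{"S3InitiateRestoreObject":{"ExpirationInDays":7,"GlacierJobTier":"BULK"}}' \
  --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::my-bucket/manifest.csv","ETag":"example-etag"}}' \
  --report '{"Bucket":"arn:aws:s3:::my-bucket","Prefix":"batch-reports","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"FailedTasksOnly"}' \
  --priority 10 \
  --role-arn arn:aws:iam::111122223333:role/my-batch-restore-role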

