
How to make AWS S3 Glacier files available for retrieval recursively with AWS CLI

How can I make files stored in AWS S3 Glacier available for retrieval recursively from the CLI?

I ran the following command:

aws s3 cp "s3://mybucket/remotepath/" localpath --recursive

and got the following line for each of the files:

warning: Skipping file s3://mybucket/remotepath/subdir/filename.xml. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

However, aws s3api restore-object has a --key parameter that specifies a single object, with no way to traverse directories recursively.
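For illustration, restoring a single object requires a command like the following (the bucket and key are the placeholders from the example above; the restore-request values are just an example):

aws s3api restore-object --bucket mybucket --key "remotepath/subdir/filename.xml" --restore-request 'Days=5,GlacierJobParameters={Tier=Bulk}'

Running such a command by hand for every object is impractical for a large directory tree.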

How can I recursively restore files for retrieval from the AWS CLI?

A Perl script to restore the files

You can use the following Perl script to start the restore process recursively and to monitor its progress. After the restore completes, you can copy the files within the specified number of days.

#!/usr/bin/perl

use strict;
my $bucket = "yourbucket";
my $path = "yourdir/yoursubdir/";
my $days = 5; # the number of days you want the restored file to be accessible for
my $retrievaloption = "Bulk"; # retrieval option: Bulk, Standard, or Expedited
my $checkstatus = 0; # set to 1 to check the restore status instead of starting restores
my $dryrun = 0; # set to 1 to print the commands without executing them

my $cmd = "aws s3 ls s3://$bucket/$path --recursive";
print "$cmd\n";
my @lines = `$cmd`;
my @cmds;
foreach (@lines) {
  # each line of the "aws s3 ls" output ends with the object key;
  # locate the path prefix to extract the key
  my $pos = index($_, $path);
  if ($pos > 0) {
    my $s = substr($_, $pos);
    chomp $s;
    if ($checkstatus)
    {
      # query the current restoration status of the object
      $cmd = "aws s3api head-object --bucket $bucket --key \"$s\"";
    } else {
      # request the restore from Glacier for the given number of days
      $cmd = "aws s3api restore-object --bucket $bucket --key \"$s\" --restore-request Days=$days,GlacierJobParameters={\"Tier\"=\"$retrievaloption\"}";
    }
    push @cmds, $cmd;
  } else {
    die $_; # stop if a line does not contain the expected path
  }
}
undef @lines;
foreach (@cmds)
{
  print "$_\n";
  unless ($dryrun) {print `$_`; print "\n";}
}

Before running the script, modify the $bucket and $path values. Then run the script and watch the output.

You can first run it in a "dry run" mode that only prints the AWS CLI commands to the screen without actually restoring the files. To do that, set the $dryrun value to 1. You can redirect the output of the dry run to a batch file and execute it separately, as shown below.
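For example, assuming you saved the script as restore-glacier.pl (the file name is arbitrary):

perl restore-glacier.pl > restore-commands.sh
bash restore-commands.sh

Note that the first line of the captured output is the aws s3 ls command itself, which is harmless to re-run; you may want to review restore-commands.sh before executing it.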

Monitor the restoration status

After you run the script and start the restore process, it will take from a few minutes to several hours (depending on the retrieval option) for the files to become available for copying.

You will only be able to copy the files after the restore process completes for each of them.

To monitor the status, set the $checkstatus value to 1 and run the script again. While the restoration is still in progress, you will see output similar to the following for each of the files:

{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"true\"",
    "LastModified": "2022-03-07T11:13:53+00:00",
    "ContentLength": 1219493888,
    "ETag": "\"ad02c999d7fe6f1fb5ddb0734017d3b0-146\"",
    "ContentType": "binary/octet-stream",
    "Metadata": {},
    "StorageClass": "GLACIER"
}

When a file finally becomes available for retrieval, its "Restore" line will look like the following:

"Restore": "ongoing-request=\"false\", expiry-date=\"Wed, 20 Apr 2022 00:00:00 GMT\"",

After that, you will be able to copy the files from AWS S3 to your local disk (the --force-glacier-transfer flag lets the AWS CLI download restored objects that still report the GLACIER storage class), e.g.:

aws s3 cp "s3://yourbucket/yourdir/yoursubdir/" yourlocaldir --recursive --force-glacier-transfer

Restore options

Depending on the retrieval option you selected in the script for files stored in the Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) storage class, "Expedited" retrievals complete in 1-5 minutes, "Standard" in 3-5 hours, and "Bulk" in 5-12 hours. The "Bulk" option is the cheapest, if not free (that depends on the Glacier tier in which you keep your files). "Expedited" is the most expensive retrieval option and is not available for the Amazon S3 Glacier Deep Archive storage class, for which restoration may take up to 48 hours.
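For example, to switch the script from the default "Bulk" tier to the faster "Standard" tier, change the corresponding line near the top of the script:

my $retrievaloption = "Standard"; # 3-5 hours instead of 5-12 hours for Bulk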

Improve the script to accept command-line parameters

By the way, you can modify the script to accept the bucket name and the directory name from the command line. In this case, replace the following two lines:

my $bucket = "yourbucket";
my $path = "yourdir/yoursubdir/";

with the following lines:

my $numargs = $#ARGV + 1;
unless ($numargs == 2) {die "Usage: perl restore-aws.pl bucket path/\n";}
my $bucket = $ARGV[0];
my $path = $ARGV[1];
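After this change, pass the bucket name and the path as command-line arguments, for example:

perl restore-aws.pl yourbucket yourdir/yoursubdir/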
