How to upload large archives to Amazon Glacier using PHP and aws-sdk v3?

This is my first time working with anything from Amazon. I am trying to upload multiple files into Amazon Glacier using the PHP SDK V3. The files will then need to be merged by Amazon into one.

The files are stored in the home directory of cPanel and will have to be uploaded via a cron job to Amazon Glacier.

I know I have to use the multipart upload method, but I am not really sure which other functions it requires to make it work. I am also not sure if the way I calculated and passed the variables is correct.

This is the code I have so far:

<?php
require 'aws-autoloader.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;

//############################################
//DEFAULT VARIABLES
//############################################
$key = 'XXXXXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';   
$accountId = '123456789123';
$vaultName = 'VaultName';
$partSize = '4194304';
$fileLocation = 'path/to/files/';

//############################################
//DECLARE THE AMAZON CLIENT
//############################################
$client = new GlacierClient([
    'region' => 'us-west-2',
    'version' => '2012-06-01',
    'credentials' => array(
        'key'    => $key,
        'secret' => $secret,
  )
]);

//############################################
//GET THE UPLOAD ID
//############################################
$result = $client->initiateMultipartUpload([
    'partSize' => $partSize,
    'vaultName' => $vaultName
]);
$uploadId = $result['uploadId'];

//############################################
//GET ALL FILES INTO AN ARRAY
//############################################
$files = scandir($fileLocation);
unset($files[0]);
unset($files[1]);
sort($files);

//############################################
//GET SHA256 TREE HASH (CHECKSUM)
//############################################
$th = new TreeHash();
//GET TOTAL FILE SIZE
foreach($files as $part){
    $filesize = filesize($fileLocation.$part);
    $total = $filesize;
    $th = $th->update(file_get_contents($fileLocation.$part));
}
$totalchecksum = $th->complete();

//############################################
//UPLOAD FILES
//############################################
foreach ($files as $key => $part) {
    //HASH CONTENT
    $filesize = filesize($fileLocation.$part);
    $rangeSize = $filesize-1;
    $range = 'bytes 0-'.$rangeSize.'/*';
    $sourcefile = $fileLocation.$part;

    $result = $client->uploadMultipartPart([
        'accountId' => $accountId,
        'checksum' => '',
        'range' => $range,
        'sourceFile' => $sourcefile,
        'uploadId' => $uploadId,
        'vaultName' => $vaultName
    ]);
}

//############################################
//COMPLETE MULTIPART UPLOAD
//############################################
$result = $client->completeMultipartUpload([
    'accountId' => $accountId,
    'archiveSize' => $total,
    'checksum' => $totalchecksum,
    'uploadId' => $uploadId,
    'vaultName' => $vaultName,
]);
?>

Declaring the new Glacier client seems to work and I do receive an UploadID, but I am not 100% sure I am doing the rest right. The Amazon Glacier vault that the files need to be uploaded to and then merged in remains empty, and I am not sure whether the files will only show up once completeMultipartUpload has executed successfully.

I also receive the following error when running the code:

Fatal error: Uncaught exception 'Aws\Glacier\Exception\GlacierException' with message 'Error executing "CompleteMultipartUpload" on "https://glacier.us-west-2.amazonaws.com/XXXXXXXXXXXX/vaults/XXXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M"; AWS HTTP error: Client error: 403 InvalidSignatureException (client): The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details. The Canonical String for this request should have been 'POST /XXXXXXXXXXX/vaults/XXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M host:glacier.us-west-2.amazonaws.com x-amz-archive-size:1501297 x-amz-date:20151016T081455Z x-amz-glacier-version:2012-06-01 x-amz-sha256-tree-hash:?[ qiuã°²åÁ¹ý+¤Üª?¤? [;K×T host;x-amz-archive-size;x-amz-date;x-amz-glacier-version;x-am in /home/XXXXXXXXXXXX/public_html/XXXXXXXXXXX/Aws/WrappedHttpHandler.php on line 152

Is there maybe a simpler way to do this? I do have full SSH access as well if that helps.

I have managed this in PHP SDK V3 (Version 3), and I kept finding this question in my research, so I thought I'd post my solution too. Use at your own risk; there is very little error checking or handling.

<?php
require 'vendor/autoload.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;


// Create the glacier client to connect with
$glacier = new GlacierClient(array(
      'profile' => 'default',
      'region' => 'us-east-1',
      'version' => '2012-06-01'
      ));

$fileName = '17mb_test_file';         // this is the file to upload
$chunkSize = 1024 * 1024 * pow(2,2);  // 1 MB times a power of 2
$fileSize = filesize($fileName);      // we will need the file size (in bytes)

// initiate the multipart upload
// it is dangerous to send the filename without escaping it first
$result = $glacier->initiateMultipartUpload(array(
      'archiveDescription' => 'A multipart-upload for file: '.$fileName,
      'partSize' => $chunkSize,
      'vaultName' => 'MyVault'
      ));

// we need the upload ID when uploading the parts
$uploadId = $result['uploadId'];

// we need to generate the SHA256 tree hash
// open the file so we can get a hash from its contents
$fp = fopen($fileName, 'r');
// This class can generate the hash
$th = new TreeHash();
// feed in all of the data
$th->update(fread($fp, $fileSize));
// generate the hash (this comes out as binary data)...
$hash = $th->complete();
// but the API needs hex (thanks). PHP to the rescue!
$hash = bin2hex($hash);

// reset the file position indicator
fseek($fp, 0);

// the part counter
$partNumber = 0;

print("Uploading: '".$fileName
    ."' (".$fileSize." bytes) in "
    .(ceil($fileSize/$chunkSize))." parts...\n");
while ($partNumber * $chunkSize < $fileSize) // stop at the end of the file, so an exact multiple of the chunk size gets no empty extra part
{
  // while we haven't written everything out yet
  // figure out the offset for the first and last byte of this chunk
  $firstByte = $partNumber * $chunkSize;
  // the last byte for this piece is either the last byte in this chunk, or
  // the end of the file, whichever is less
  // (watch for those Obi-Wan errors)
  $lastByte = min((($partNumber + 1) * $chunkSize) - 1, $fileSize - 1);

  // upload the next piece
  $result = $glacier->uploadMultipartPart(array(
        'body' => fread($fp, $chunkSize),  // read the next chunk
        'uploadId' => $uploadId,          // the multipart upload this is for
        'vaultName' => 'MyVault',
        'range' => 'bytes '.$firstByte.'-'.$lastByte.'/*' // weird string
        ));

  // this is where one would check the results for error.
  // This is left as an exercise for the reader ;)

  // onto the next piece
  $partNumber++;
  print("\tpart ".$partNumber." uploaded...\n");
}
print("...done\n");

// and now we can close off this upload
$result = $glacier->completeMultipartUpload(array(
  'archiveSize' => $fileSize,         // the total file size
  'uploadId' => $uploadId,            // the upload id
  'vaultName' => 'MyVault',
  'checksum' => $hash                 // here is where we need the tree hash
));

// this is where one would check the results for error.
// This is left as an exercise for the reader ;)


// get the archive id.
// You will need this to refer to this upload in the future.
$archiveId = $result->get('archiveId');

print("The archive Id is: ".$archiveId."\n");


?>
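The range arithmetic in the loop above is where off-by-one mistakes tend to creep in, so here is a minimal sketch of it on its own. `glacierPartRanges()` is a hypothetical helper name of mine, not something the SDK provides; it only builds the `range` strings and does not talk to AWS at all.

```php
<?php
/**
 * Hypothetical helper (not part of the AWS SDK): build the "range" string
 * for every part of a multipart upload of $fileSize bytes.
 */
function glacierPartRanges(int $fileSize, int $partSize): array
{
    if ($fileSize < 1 || $partSize < 1) {
        throw new InvalidArgumentException('Sizes must be positive.');
    }

    $ranges = [];
    // Stop as soon as the offset reaches the file size, so a file whose size
    // is an exact multiple of the part size does not get an empty extra part.
    for ($firstByte = 0; $firstByte < $fileSize; $firstByte += $partSize) {
        // The last byte is the end of this part or the end of the file,
        // whichever comes first.
        $lastByte = min($firstByte + $partSize, $fileSize) - 1;
        $ranges[] = "bytes {$firstByte}-{$lastByte}/*";
    }

    return $ranges;
}

// Toy numbers for readability: a 10-byte "file" cut into 4-byte parts.
// Real part sizes must be 1 MB times a power of 2.
echo implode("\n", glacierPartRanges(10, 4)), "\n";
```

With the toy numbers this prints `bytes 0-3/*`, `bytes 4-7/*` and `bytes 8-9/*`: every part is full-sized except possibly the last one.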

I think you misunderstood uploadMultipartPart. uploadMultipartPart means you upload one big file in multiple parts, and then call completeMultipartUpload to mark that you have finished uploading that one file.

From your code it looks like you are uploading multiple files.

It is possible that you do not actually need uploadMultipartPart.

Maybe you could use a regular "uploadArchive"?

ref:

https://blogs.aws.amazon.com/php/post/Tx7PFHT4OJRJ42/Uploading-Archives-to-Amazon-Glacier-from-PHP
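A minimal sketch of the single-request route, assuming the file is small enough for one request (Glacier caps a single-operation upload at 4 GB, and AWS suggests multipart above roughly 100 MB). `uploadWholeArchive()` is a hypothetical wrapper of mine. The only SDK call in it is `uploadArchive`, with the `vaultName`, `archiveDescription` and `body` parameters. As far as I know the v3 client fills in the checksum headers itself when you do not pass them, but treat that as an assumption and check the API reference.

```php
<?php
/**
 * Hypothetical wrapper: upload one file as one Glacier archive.
 *
 * $glacier is expected to be an Aws\Glacier\GlacierClient. It is left
 * untyped here so the function can also be tried with a stand-in object.
 */
function uploadWholeArchive($glacier, string $vaultName, string $filePath): string
{
    $handle = fopen($filePath, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$filePath}");
    }

    try {
        $result = $glacier->uploadArchive([
            'vaultName'          => $vaultName,
            'archiveDescription' => basename($filePath),
            'body'               => $handle, // streamed from the open handle
        ]);
    } finally {
        // The HTTP layer may already have closed the handle.
        if (is_resource($handle)) {
            fclose($handle);
        }
    }

    // Keep this ID; it is the only way to refer to the archive later.
    return $result['archiveId'];
}

// Usage, assuming the SDK is installed and credentials are configured:
// require 'vendor/autoload.php';
// $glacier = new Aws\Glacier\GlacierClient([
//     'profile' => 'default', 'region' => 'us-east-1', 'version' => '2012-06-01',
// ]);
// echo uploadWholeArchive($glacier, 'MyVault', '/path/to/backup.tar'), "\n";
```

Note that Glacier stores each upload as its own archive and does not merge separate uploads, so if the files must end up as one archive they need to be bundled (for example into a tar file) before uploading.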

Note: Solution for uploading multiparts using aws-sdk-php v2. I think it could work on v3 with a few changes to how the TreeHash class is used.

Thanks to Neil Vandermeiden's snippet, I've accomplished the same task, with a small improvement.

Neil only does a checksum validation for the whole file. That has two possible problems:

  • It can be memory-consuming: remember we're uploading a large file; hashing it to get the checksum requires opening it and reading all of its contents.
  • We're uploading multiple file parts: some parts can fail in transit, leaving corrupted file parts on AWS. If we compute and validate the checksum of every part, we can prevent such problems.

In the following code we compute the checksum of every file part and send each checksum to the AWS API together with its file part.

Once AWS has received an uploaded part, it computes the checksum of that part. If the checksum doesn't match ours, it throws an exception. If it succeeds, we can be sure the part has uploaded successfully.

<?php
use Aws\Common\Hash\TreeHash;
use Aws\Glacier\GlacierClient;

/**
 * upload a file and store it into aws glacier
 */
class UploadMultipartFileToGlacier
{
    // aws glacier
    private $description;
    private $glacierClient;
    private $glacierConfig;
    /*
     * the part size is required to be 1 MB (1024 * 1024 bytes) multiplied by a power of 2 (1MB, 2MB, 4MB, 8MB, and so on)
     * reference: https://docs.aws.amazon.com/aws-sdk-php/v2/api/class-Aws.Glacier.GlacierClient.html#_initiateMultipartUpload
     **/
    private $partSize;

    // file location
    private $filePath;

    private $errorMessage;
    private $executionDate;

    public function __construct($filePath)
    {
        $this->executionDate = date('Y-m-d H:i:s');
        $this->filePath = $filePath;
    
        // AWS Glacier
        $this->glacierConfig = (object) [
            'vaultId' => 'VAULT_NAME',
            'region' => 'REGION',
            'accessKeyId' => 'ACCESS_KEY',
            'secretAccessKey' => 'SECRET_KEY',
        ];

        $this->glacierClient = GlacierClient::factory(array(
            'credentials' => array(
                'key'    => $this->glacierConfig->accessKeyId,
                'secret' => $this->glacierConfig->secretAccessKey,
            ),
            'region' => $this->glacierConfig->region
        ));

        $this->description = sprintf('Upload file %s at %s', $this->filePath, $this->executionDate);

        $this->partSize = 1024 * 1024 * pow(2, 2); // 4 MB
    }

    public function upload()
    {
        list($success, $data) = $this->uploadFileToGlacier();

        if ($success) {
            // todo: tasks to do when the file has uploaded successfully to aws glacier
        } else {
            // todo: handle error
            // $this->errorMessage contains the exception message
        }
    }

    private function completeMultipartUpload($uploadId, $fileSize, $checksumParts)
    {
        // with all the checksums of the processed file parts, we can compute the file checksum. It's important to send it as a parameter to the
        // aws api's GlacierClient::completeMultipartUpload. Aws computes the checksum of the whole archive on their side. If
        // their checksum doesn't match ours, the api throws an exception.
        $checksum = $this->getChecksumFile($checksumParts);

        return $this->glacierClient->completeMultipartUpload([
            'archiveSize' => $fileSize,
            'uploadId' => $uploadId,
            'vaultName' => $this->glacierConfig->vaultId,
            'checksum' => $checksum
        ]);
    }

    private function getChecksumPart($content)
    {
        $treeHash = new TreeHash();
        $mb = 1024 * 1024 * pow(2, 0); // 1 MB (the class TreeHash only allows to process chunks <= 1 MB)
        $buffer = $content;

        while (strlen($buffer) >= $mb) {
            $data = substr($buffer, 0, $mb);
            $buffer = substr($buffer, $mb) ?: '';
            $treeHash->addData($data);
        }
        
        if (strlen($buffer)) {
            $treeHash->addData($buffer);
        }

        return $treeHash->getHash();
    }

    private function getChecksumFile($checksumParts)
    {
        $treeHash = TreeHash::fromChecksums($checksumParts);

        return $treeHash->getHash();
    }

    private function initiateMultipartUpload()
    {
        $result = $this->glacierClient->initiateMultipartUpload([
            'accountId' => '-',
            'vaultName' => $this->glacierConfig->vaultId,
            'archiveDescription' => $this->description,
            'partSize' => $this->partSize,
        ]);

        return $result->get('uploadId');
    }

    private function uploadFileToGlacier()
    {
        $success = true;
        $data = false;

        try {
            $fileSize = filesize($this->filePath);

            $uploadId = $this->initiateMultipartUpload();
            $checksums = $this->uploadMultipartFile($uploadId, $fileSize);
            $model = $this->completeMultipartUpload($uploadId, $fileSize, $checksums);

            $data = (object) [
                'archiveId' => $model->get('archiveId'),
                'executionDate' => $this->executionDate,
                'location' => $model->get('location'),
            ];
        } catch (\Exception $e) {
            $this->errorMessage = $e->getMessage();
            $success = false;
        }

        return [$success, $data];
    }
    
    private function uploadMultipartFile($uploadId, $fileSize)
    {
        $numParts = ceil($fileSize / $this->partSize);
        $fp = fopen($this->filePath, 'r');
        $partIdx = 0;
        $checksumParts = [];

        error_log("Uploading: {$this->filePath} ({$fileSize} bytes) in {$numParts} parts...");

        // stop at the end of the file, so an exact multiple of the part size gets no empty extra part
        while ($partIdx * $this->partSize < $fileSize) {
            $firstByte = $partIdx * $this->partSize;
            $lastByte = min((($partIdx + 1) * $this->partSize) - 1, $fileSize - 1);
            $content = fread($fp, $this->partSize);
            
            // we compute the checksum of the part we're processing. It's important to send it as a parameter to the
            // aws api's GlacierClient::uploadMultipartPart. Aws computes the checksum of the uploaded part on their side. If
            // their checksum doesn't match ours, the api throws an exception.
            $checksumPart = $this->getChecksumPart($content);

            $result = $this->glacierClient->uploadMultipartPart([
                'body' => $content,
                'uploadId' => $uploadId,
                'vaultName' => $this->glacierConfig->vaultId,
                'checksum' => $checksumPart,
                'range' => "bytes {$firstByte}-{$lastByte}/*"
            ]);

            $checksumParts[] = $result->get('checksum'); // same value as $checksumPart; the api throws an exception if it isn't
            
            $partIdx++;
            error_log("Part {$partIdx} uploaded...");
        }

        fclose($fp); // release the file handle once every part has been sent

        return $checksumParts;
    }
}

$uploadMultipartFileToGlacier = new UploadMultipartFileToGlacier('<FILE_PATH>');

$uploadMultipartFileToGlacier->upload();
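For anyone wondering what the tree hash in the code above actually computes, here is a plain-PHP sketch using only the built-in `hash()` function. `sha256TreeHash()` is a hypothetical name of mine, not SDK code, and it holds the whole input in memory, so it is meant for understanding and for checking small files rather than for production use.

```php
<?php
/**
 * Plain-PHP sketch of a SHA-256 tree hash (hypothetical helper, not SDK code).
 * Returns the hash as a hex string, the form the Glacier API expects.
 */
function sha256TreeHash(string $data): string
{
    $mb = 1024 * 1024;

    // Leaves: the binary SHA-256 of each 1 MB chunk (the last may be shorter).
    $chunks = $data === '' ? [''] : str_split($data, $mb);
    $level = [];
    foreach ($chunks as $chunk) {
        $level[] = hash('sha256', $chunk, true);
    }

    // Combine neighbours pairwise until one hash is left. An unpaired hash
    // at the end of a level is carried up unchanged.
    while (count($level) > 1) {
        $next = [];
        foreach (array_chunk($level, 2) as $pair) {
            $next[] = count($pair) === 2
                ? hash('sha256', $pair[0] . $pair[1], true)
                : $pair[0];
        }
        $level = $next;
    }

    return bin2hex($level[0]);
}
```

For data up to 1 MB the result is simply the ordinary SHA-256 of the data. Because part sizes are 1 MB times a power of 2, every part's tree hash is a complete subtree of the archive's tree, which is why the archive checksum can be built from the part checksums alone without re-reading the file.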
