
Uploading large object to Cloudfiles returns different md5

I have this code, and I'm trying to upload large files to Rackspace as described in https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/Storage/Object.md:

$src_path = 'pathtofile.zip'; //about 700MB
$md5_checksum = md5_file($src_path); //result is f210775ccff9b0e4f686ea49ac4932c2
$trans_opts = array(
      'name' => $md5_checksum,
      'concurrency' => 6,
      'partSize'    => 25000000
 );
$trans_opts['path'] = $src_path;
$transfer = $container->setupObjectTransfer($trans_opts);
$response = $transfer->upload();

This appears to upload the file just fine.

However, when I try to download the file as recommended at https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/USERGUIDE.md:

$name = 'f210775ccff9b0e4f686ea49ac4932c2';
$object = $container->getObject($name);
$objectContent = $object->getContent();
$pathtofile = 'destinationpathforfile.zip';
$objectContent->rewind();
$stream = $objectContent->getStream();
file_put_contents($pathtofile, $stream);
$md5 = md5_file($pathtofile);

The result of md5_file ends up being different from 'f210775ccff9b0e4f686ea49ac4932c2'. Moreover, the downloaded zip ends up being unopenable/corrupted.

What did I do wrong?

It's recommended that you only use multipart uploads for files over 5GB. For files under this threshold, you can use the normal uploadObject method.

When you use the transfer builder, it splits your large file into smaller segments (you provide the part size) and uploads them concurrently. When this process has finished, a manifest file is created which contains a list of all these segments. When you download the manifest file, it collates the segments together, effectively pretending to be the big file itself. But it's really just an organizer.

To get back to answering your question, the ETag header of a manifest file is not calculated the way you might think. What you're currently doing is taking the MD5 checksum of the entire 700MB file and comparing it against the MD5 checksum of the manifest file. But these aren't comparable. To quote the documentation:

"the ETag header is calculated by taking the ETag value of each segment, concatenating them together, and then returning the MD5 checksum of the result."
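To make the difference concrete, here is a minimal sketch of that calculation done locally. The function name `manifestStyleEtag` is hypothetical, and the sketch assumes the segments are cut at exact `partSize` boundaries in file order, which the transfer builder may not guarantee; the server may also wrap the value in double quotes. Treat the result as an illustration of why the two checksums differ, not as a value guaranteed to match the header.

```php
<?php
/**
 * Hypothetical helper: MD5 each fixed-size part of a local file,
 * concatenate the hex digests, and MD5 the concatenation.
 * Assumes segments are split at exact $partSize boundaries.
 */
function manifestStyleEtag(string $path, int $partSize): string
{
    if ($partSize < 1) {
        throw new InvalidArgumentException('partSize must be positive');
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $segmentEtags = '';
    while (true) {
        // Hash the next part straight from the stream to avoid
        // loading a whole 25MB part into memory.
        $context = hash_init('md5');
        $bytesRead = hash_update_stream($context, $handle, $partSize);
        if ($bytesRead === 0) {
            break; // end of file
        }
        $segmentEtags .= hash_final($context);
    }
    fclose($handle);

    // MD5 of the concatenated per-segment hex digests.
    return md5($segmentEtags);
}
```

For the file in the question, `manifestStyleEtag('pathtofile.zip', 25000000)` would generally differ from `md5_file('pathtofile.zip')`, because the first hashes a string of digests while the second hashes the file's bytes.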

There are also downsides to using this DLO operation that you need to be aware of:

"End-to-end integrity is not assured. The eventual consistency model means that although you have uploaded a segment object, it might not appear in the container list immediately. If you download the manifest before the object appears in the container, the object will not be part of the content returned in response to a GET request."

If you think there's been an error in transmission, perhaps it's because an HTTP request failed along the way. You can use retry strategies (using the backoff plugin) to retry failed requests.
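As an illustration of the general idea behind a retry strategy, here is a small library-independent sketch of retrying with exponential backoff. It is not the backoff plugin itself: `retryWithBackoff` and its parameters are hypothetical names, and it assumes the wrapped operation signals failure by throwing an exception.

```php
<?php
/**
 * Hypothetical helper: run $operation, retrying on exceptions with
 * exponentially growing pauses (1x, 2x, 4x ... the base delay).
 * The last exception is rethrown once $maxAttempts is exhausted.
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 100)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    $attempt = 0;
    while (true) {
        $attempt++;
        try {
            // The attempt number is passed in for logging or diagnostics.
            return $operation($attempt);
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // give up and surface the last failure
            }
            // Sleep before the next attempt; usleep() takes microseconds.
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}
```

A call such as `retryWithBackoff(function () use ($transfer) { return $transfer->upload(); });` would wrap the upload from the question, assuming that call throws when a request fails.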

You can also turn on HTTP logging to check every network transaction, which helps with debugging. Be careful, though: doing so as-is will echo each HTTP request body (>25MB) into STDOUT. You might want to use this instead:

use Guzzle\Plugin\Log\LogPlugin;
use Guzzle\Log\ClosureLogAdapter;

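// Write log lines to PHP's output stream.
$stream = fopen('php://output', 'w');

// The template limits each log entry to the URL, method, status, and timings,
// so request bodies are never printed.
$logSubscriber = new LogPlugin(new ClosureLogAdapter(function($m) use ($stream) {
    fwrite($stream, $m . PHP_EOL);
}), "# Request:\n{url} {method}\n\n# Response:\n{code} {phrase}\n\n# Connect time: {connect_time}\n\n# Total time: {total_time}", false);

// $client is the OpenCloud client you created your service from.
$client->addSubscriber($logSubscriber);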

As you can see, you're using a template to dictate what gets output. The full list of template variables is documented in Guzzle's MessageFormatter class.
