How to check whether a file downloaded with the getstore() function is incomplete or corrupted?

I wrote a quick script to download files using the LWP::Simple library and its getstore() function. It works rather well, but occasionally a downloaded file is incomplete. I do not know what causes this, but when I download the same file manually afterwards with wget on the command line, the file is fine.

My guess is that the corrupted files are caused by a dropped connection or something similar. Although I run my script on a dedicated line in a datacenter, the connection might still drop somewhere between my server and the remote server.

This is my code:

use LWP::Simple;

# Returns 1 if getstore() reports a 2xx status, 0 otherwise.
sub download {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    return is_success($status) ? 1 : 0;
}

What are the possible solutions to this problem? How can I check that the transfer went all right and that the file is complete and not corrupted?

Thank you for your valuable replies.

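One option is to ask the server for the file size with a HEAD request first, then compare it with the number of bytes actually received: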

use LWP::UserAgent;
use HTTP::Request::Common qw(HEAD GET);

my $ua = LWP::UserAgent->new;
$ua->timeout(3);
my $res = $ua->request(HEAD $url);        # HEAD: fetch only the headers of the file
my $length_full = $res->content_length;   # size announced by the server (may be undef)
...
$res = $ua->request(GET $url);
my $length_got = length $res->content;    # number of bytes actually received
if (defined $length_full && $length_got != $length_full) {
    print "File has not been downloaded completely!\n";
}
...
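If the file is written straight to disk, the same comparison can be made against the size of the saved file. The following is only a sketch under some assumptions: the helper names size_matches and download_checked are made up for illustration, the 30-second timeout is arbitrary, and the check is only possible when the server sends a Content-Length header.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Compare the Content-Length header of a response with the size of the saved file.
# Returns 1 if they match, 0 if they differ, undef if no Content-Length was sent.
sub size_matches {
    my ($res, $file) = @_;
    my $expected = $res->content_length;
    return undef unless defined $expected;
    my $actual = -s $file;    # undef if the file does not exist
    return (defined $actual && $actual == $expected) ? 1 : 0;
}

# Hypothetical wrapper: save $url to $file and accept it only if the status is
# exactly 200 and the size on disk agrees with Content-Length (when present).
sub download_checked {
    my ($url, $file) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url, ':content_file' => $file);
    return 0 unless $res->code == 200;
    my $ok = size_matches($res, $file);
    return defined $ok ? $ok : 1;    # no Content-Length: nothing to compare against
}
```

Saving with ':content_file' avoids holding the whole body in memory, and it needs only one request instead of a HEAD followed by a GET. When the server omits Content-Length, this sketch accepts the file unverified; whether that is acceptable depends on the use case.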

The is_success() sub returns true for any 2xx HTTP status code, so if you are getting, for example, "206 Partial Content", that will count as success.

You can simply check whether the status is exactly 200, and act accordingly.
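As a rough sketch of that idea, the download() sub from the question could be tightened so that only a plain 200 counts. The helper name status_is_complete is an assumption introduced here for illustration; it is not part of LWP::Simple.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore);

# Accept only "200 OK"; 206 Partial Content and other 2xx codes are rejected.
sub status_is_complete {
    my ($status) = @_;
    return (defined $status && $status == 200) ? 1 : 0;
}

# Same interface as the download() sub in the question, with the stricter check.
sub download {
    my ($url, $file) = @_;
    return status_is_complete(getstore($url, $file));
}
```

This only helps when the server actually reports a non-200 status for an incomplete transfer; a connection that drops after a 200 response has started would still pass this check.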

The $status values you can get are listed in the LWP::Simple documentation. If the servers return an error status every time you get a partial or corrupted download, just checking the return value would be enough.

Otherwise, you would need a more sophisticated strategy. If MD5 or SHA checksums are published for the files, you can verify those after the download. If not, you need to inspect the response headers to find out how much the server was planning to send, and compare that with how much you received.
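For the checksum case, a minimal sketch could look like the following. It assumes the publisher provides a SHA-256 digest as a hex string; the sub names file_sha256_hex and checksum_ok are hypothetical. Digest::SHA ships with Perl, and Digest::MD5 can be used in much the same way if only an MD5 sum is available.

```perl
use strict;
use warnings;
use Digest::SHA;

# Compute the SHA-256 digest of a file as a lowercase hex string.
sub file_sha256_hex {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $hex = Digest::SHA->new(256)->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Return 1 if the file's digest equals the published one (case-insensitive), else 0.
sub checksum_ok {
    my ($path, $expected_hex) = @_;
    return (file_sha256_hex($path) eq lc($expected_hex)) ? 1 : 0;
}
```

A typical use would be to call checksum_ok($file, $published_digest) right after getstore() and to retry the download when it returns 0. Unlike a size comparison, this also catches files that have the right length but damaged content.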
