使用LWP :: UserAgent下载文件的程序中的内存泄漏

Question

I am trying to revive a Perl script I was using a long time ago. 我试图恢复我很久以前使用的Perl脚本。 It is for downloading files from cloud storage to my local client. 它用于将文件从云存储下载到本地客户端。 I'm pretty sure it worked fine back then, but now I am having the issue that LWP::UserAgent downloads the file entirely into the memory before writing it to disk. 我很确定它当时工作正常，但现在我遇到的问题是LWP::UserAgent在将文件写入磁盘之前将文件完全下载到内存中。 Expected and former behaviour would be that it should write chunks of the received file to the target during download. 预期和以前的行为是它应该在下载期间将接收文件的块写入目标。

I'm am trying it currently on OSX with Perl 5.16.3 and 5.18 and also tried it on Windows but I do not know the Perl version any more. 我正在使用Perl 5.16.3和5.18在OSX上尝试它，并且还在Windows上尝试过但我不再了解Perl版本。 I am pretty confident that this is related to the Perl version, but I do not know which I used back then and I want to know what changed. 我非常有信心这与Perl版本有关，但我不知道我当时使用了哪些版本，而且我想知道改变了什么。

sub downloadFile {

    my $url           = shift;
    my $filename      = shift;
    my $temp_filename = shift;
    my $expected_size = shift;

    (   $download_size, $received_size, $avg_speed,   $avg_speed_s, $avg_speed_q,
        $speed_count,   $speed,         $byte_offset, $http_status
    ) = ( 0, 0, 0, 0, 0, 0, 0, 0, 0 );

    if ( -e $temp_filename and !$options{'no-resume'} ) {

        my @stat = stat($temp_filename);

        if ( $expected_size > $stat[7] ) {
            $byte_offset   = $stat[7];
            $received_size = $stat[7];
        }
    }

    open DOWNLOAD, ( $byte_offset > 0 ) ? ">>" : ">", $temp_filename
            or die "Unable to create download file: $!";
    binmode DOWNLOAD;

    $last_tick = time();

    my $host = "myhost";

    if ( $url =~ m/http:\/\/(.*?)\//gi ) {
        $host = $1;
    }

    $agent->credentials(
            $host . ":80",
            "Login Required",
            $config->{"account_name"},
            $config->{"account_password"} );

    my $response = $agent->get(
            $url,
            ':content_cb'     => \&didReceiveData,
            ':read_size_hint' => ( 2**14 ) );

    close DOWNLOAD;

    my @stat        = stat($temp_filename);
    my $actual_size = $stat[7];

    if ( ! $response->is_success() ) {

        printfvc( 0,
                "\rDownload failed: %s",
                'red',
                $response->status_line() );

        return 0;
    }
    elsif ( $actual_size != $expected_size ) {

        printfvc( 0,
                "\rDownloaded file does not have expected size (%s vs. %s)",
                'red',
                $actual_size, $expected_size );

        return 0;
    }
    else {

        rename $temp_filename, $filename;

        printfvc( 0,
                "\rDownload succeeded                                                           ",
                'green' );

        return 1;
    }
}

sub didReceiveData {

    my ( $data, $cb_response, $protocol ) = @_;

    #my($response, $ua, $h, $data) = @_;
    my $data_size = scalar( length($data) );
    $received_size += $data_size;
    $speed_count   += $data_size;

    my $now = time();

    if ( $last_tick < $now ) {
        $speed       = $speed_count;
        $speed_count = 0;
        $last_tick   = $now;
        $avg_speed_q++;
        $avg_speed_s += $speed;
        $avg_speed = $avg_speed_s / $avg_speed_q;
    }

    if ( $download_size > 0 and $http_status eq "200" or $http_status eq "206" ) {

        print DOWNLOAD $data;

        printf("-> %.1f %% (%s of %s, %s/s) %s      ",
                ( $received_size / $download_size ) * 100,
                fsize($received_size),
                fsize($download_size),
                fsize($speed),
                $avg_speed_q > 3
                ? fduration( ( $download_size - $received_size ) / $avg_speed ) . " remaining"
                : ""
        ) if ( $verbosity >= 0 );
    }
    else {
        printf("-> Initiating transfer...") if ( $verbosity >= 0 );
    }

    return 1;
}

output: 输出：

mun-m-sele:PutIO-Perl-folder-sync sele$ perl putiosync.pl 
Syncing folder 'Test' to '/Users/sele/Downloads/Test'...
1 files queued to download
5MB.zip
Fetching '5MB.zip' [1 of 1]

-> 0.3 % (16.0 kiB of 5.0 MiB, 16.0 kiB/s)       
-> 0.6 % (32.0 kiB of 5.0 MiB, 16.0 kiB/s)       
-> 0.9 % (48.0 kiB of 5.0 MiB, 16.0 kiB/s)       
 .
 . 
 .      
-> 99.1 % (5.0 MiB of 5.0 MiB, 16.0 kiB/s)       
-> 99.4 % (5.0 MiB of 5.0 MiB, 16.0 kiB/s)       
-> 99.7 % (5.0 MiB of 5.0 MiB, 16.0 kiB/s)       
Download succeeded

So output is as expected BUT still this output only appears after the file has been loaded into memory. 因此输出是预期的但仍然只有在文件加载到内存后才会出现此输出。

The content_cb is not called during the download (tested by simply putting a print("cb") to the top of didReceiveData 在下载期间不调用content_cb （通过简单地将print("cb")放在didReceiveData的顶部进行测试

update 更新

I found out that it works as expected on Windows Strawberry Perl 5.16.2. 我发现它在Windows Strawberry Perl 5.16.2上按预期工作。 I can provide you with package versions if you tell me which and how ;) 如果你告诉我哪些以及如何，我可以为你提供包装版本;）

Answer 1

Your own code contains a lot of irrelevances, like resume support, multiple server support, progress logging, site credentials, temporary download files, error handling, and average speed calculations. 您自己的代码包含许多不相关的内容，例如恢复支持，多服务器支持，进度日志记录，站点凭据，临时下载文件，错误处理和平均速度计算。 None of these are relevant to the core problem that you described, and that is why I asked you to create a Minimal, Complete, and Verifiable example . 这些都与您描述的核心问题无关，这就是我要求您创建一个Minimal，Complete和Verifiable示例的原因 。 I don't understand your refusal, or why you seem to be clinging to the idea that the error is in Perl and not in your own code 我不明白你的拒绝，或者为什么你似乎坚持错误是在Perl而不是你自己的代码中的想法

Without that, all I can do is demonstrate that the technique works well. 没有它，我所能做的就是证明该技术运作良好。 Here is the sort of thing that you should have generated as a demonstration of the problem. 这是你应该产生的一种事情，作为问题的演示。 It is very little different from your own code, and it works fine. 它与您自己的代码差别很小，而且运行正常。 It downloads an official ISO image of the Ubuntu desktop distribution which is about 1.4GB of information. 它下载了Ubuntu桌面发行版的官方ISO映像，大约有1.4GB的信息。 The process uses a steady 17MB of memory and finishes in 14 minutes. 该过程使用稳定的17MB内存，并在14分钟内完成。 The size of the resultant file exactly matches the Content-Length specified in the HTTP header 结果文件的大小与HTTP标头中指定的Content-Length完全匹配

Beyond this no one can help you further. 除此之外，没有人可以帮助你。 I encourage you to accept the help of experts when you have asked for it. 我鼓励您在提出要求时接受专家的帮助。 It's also worth noting that the problem will often be revealed by the process of creating an MCVE from your faulty program: you are very likely to delete a non-essential part of the code and find that the issue has disappeared 还值得注意的是，通过从错误程序创建MCVE的过程经常会发现问题：您很可能删除代码的非必要部分并发现问题已消失

use strict;
use warnings 'all';

use LWP;

use constant ISO_URL => 'http://releases.ubuntu.com/16.04/ubuntu-16.04-desktop-amd64.iso';

STDOUT->autoflush;

my $ua = LWP::UserAgent->new;

my $expected;
{
    my $res = $ua->head(ISO_URL);
    $expected = $res->header('Content-Length');
    printf "Expected file size is %.3fMB\n",  $expected / 1024**2;
}

my ($iso_file) = ISO_URL =~ m{([^/]+)\z};
open my $iso_fh, '>:raw', $iso_file or die $!;
my $total;
my $pc = 0;

{
    my $res = $ua->get(
        ISO_URL,
        ':content_cb'     => \&content_cb,
        ':read_size_hint' => 16 * 1024,
    );

    close $iso_fh or die $!;

    print $res->status_line, "\n";
    printf "Final file size is %.3fMB\n", (-s $iso_file) / 1024**2;
}

sub content_cb {

    my ( $data, $res ) = @_;

    die $res->status_line unless $res->is_success;

    print $iso_fh $data;

    $total += length $data;
    while ( $pc < 100 * $total / $expected ) {
        printf "%3d%%\n", $pc++;
    }
}

output 产量

Expected file size is 1417.047MB
  0%
  1%
  2%
  3%
  4%
  5%
  :
  :
 95%
 96%
 97%
 98%
 99%
200 OK
Final file size is 1417.047MB

Answer 2

Could it be that the problem is with the file I/O rather than LWP? 难道问题在于文件I / O而不是LWP吗？ I assume that data is not being flushed to the file till you close the file. 我假设在关闭文件之前，数据没有被刷新到文件中。

Below is an example code on how to make File handle flush data to hard disk: 下面是如何使File处理刷新数据到硬盘的示例代码：

{ my $ofh = select LOG;
  $| = 1;
  select $ofh;
}

Check out perldoc -q flush and this interesting article on buffering, " Suffering from Buffering? ". 查看perldoc -q flush和这篇关于缓冲的有趣文章，“ 遭受缓冲？ ”。

Answer 3

A major problem with your code is that $http_status is never assigned. 您的代码的一个主要问题是永远不会分配$http_status 。 It can be set only by the callback didReceiveData or after the entire download has completed when the get call exits 它只能通过回调didReceiveData设置，或者在get退出时完成整个下载后设置

But your callback tests whether $http_status eq "200" (which should be $cb_response->is_success ) before printing to the DOWNLOAD file handle, so nothing can ever be written 但是你的回调在打印到DOWNLOAD文件句柄之前测试$http_status eq "200" （应该是$cb_response->is_success ），所以什么都$cb_response->is_success来

I can believe that your code escalates memory because it endlessly prints -> Initiating transfer... to STDOUT, but nothing will ever be written to the temporary file because of the untested HTTP status. 我可以相信你的代码会升级内存，因为它无休止地打印-> Initiating transfer...到STDOUT，但由于未经测试的HTTP状态，所有内容都不会写入临时文件。 I am certain that you watched your process run and die with an Out of memory error and instantly blamed Perl without even trying to download a 1KB file. 我确信您看到您的进程运行并因内存不足而死亡，并立即指责Perl，甚至没有尝试下载1KB文件。 Your code has never worked, and your question and support of those who would help you is outrageous 你的代码从来没有用过，而那些对你有帮助的人的问题和支持是令人愤慨的

"I'm pretty sure it worked fine back then" isn't a great start, but when you then reject on that basis all applications of a solution or a request for information then you are being ridiculous “我非常确定它当时工作得很好”并不是一个好的开始，但当你在此基础上拒绝解决方案的所有应用或信息请求时你就是荒谬的

使用LWP :: UserAgent下载文件的程序中的内存泄漏

问题描述

3 个解决方案

解决方案1
2 2016-04-22 14:38:33

output 产量

解决方案2
1 2016-04-22 00:50:12

解决方案3
0 2016-04-23 12:11:32

使用LWP :: UserAgent下载文件的程序中的内存泄漏

问题描述

3 个解决方案

解决方案1 2 2016-04-22 14:38:33

output 产量

解决方案2 1 2016-04-22 00:50:12

解决方案3 0 2016-04-23 12:11:32

解决方案1
2 2016-04-22 14:38:33

解决方案2
1 2016-04-22 00:50:12

解决方案3
0 2016-04-23 12:11:32