
Handling GET errors in WWW::Mechanize

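I'm using a script that uses WWW::Mechanize to scrape data from a website, and everything works fine except for the website itself. Sometimes the site is temporarily unresponsive, and for a given my $url = 'http://www.somesite.com/more/url/text' the call $mech->get($url) fails with: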

Error GETing http://www.somesite.com/more/url/text: Can't connect to www.somesite.com:443 at ./trackSomesite.pl line 34.

The error happens occasionally, with no pattern I can recognize, and from my experience with this site it is caused by server instability.

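I want to know explicitly that this particular error occurred, as opposed to some other error (such as Too many requests). My question is: how can I make my script handle this error without dying?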

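Either wrap the $mech->get(...) request in an eval block, or create the object with autocheck => 0 and then check the status code ($mech->status) and/or the status line ($mech->response->status_line) to decide what to do.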
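For the eval route, a minimal sketch might look like this. It leaves autocheck at its default (on), so a failed get() dies with a message of the form quoted in the question ("Error GETing <url>: Can't connect to ..."), and the helper turns that die into a value the caller can inspect. The helper name try_get and the shape of the returned hash are hypothetical; it works with a WWW::Mechanize object or anything else whose get() dies on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: call $mech->get($url) with autocheck left on and
# convert the resulting die into a hash the caller can inspect.
# $mech is assumed to be a WWW::Mechanize object (or any object with get()).
sub try_get {
    my ($mech, $url) = @_;

    # The trailing 1 makes eval return true only if get() did not die.
    my $ok = eval { $mech->get($url); 1 };
    return { ok => 1 } if $ok;

    my $error = $@ || 'unknown error';

    # With autocheck on, a failed GET dies with text such as
    # "Error GETing <url>: Can't connect to host:443 at ..."
    my $kind = $error =~ /Can't connect/ ? 'connect' : 'other';
    return { ok => 0, kind => $kind, error => $error };
}
```

Calling try_get(WWW::Mechanize->new, $url) and retrying while the returned kind is 'connect' gives the same behaviour as the autocheck => 0 loop, at the cost of relying on the wording of the die message rather than on the response object.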

這是一個例子:

#!/usr/bin/env perl

use strict;
use warnings;

use WWW::Mechanize;

use constant RETRY_MAX => 5;

my $url = 'http://www.xxsomesite.com/more/url/text'; # Cannot connect

# autocheck => 0: a failed get() returns the response instead of dying
my $mech = WWW::Mechanize->new( autocheck => 0 );

my $content = fetch($url);
print $content if defined $content;

# Fetch $url, retrying connection failures with a growing delay.
# Returns the page content, or undef on failure.
sub fetch {
    my ($url) = @_;

    for my $retry (0 .. RETRY_MAX - 1) {
        my $message = "Attempting to fetch [ $url ]";
        $message .= $retry ? " - retry $retry\n" : "\n";
        warn $message;

        my $response = $mech->get($url);    # an HTTP::Response object
        return $response->content() if $response->is_success();

        # HTTP::Response calls this code(); $mech->status returns the same value
        my $status = $response->code;
        warn "status = $status\n";

        if ($response->status_line =~ /Can't connect/) {
            # Wait 1, 2, 3, ... seconds without modifying the loop variable
            my $delay = $retry + 1;
            warn "cannot connect...will retry after $delay seconds\n";
            sleep $delay;
        } elsif ($status == 429) {
            warn "too many requests...ignoring\n";
            return undef;
        } else {
            warn "something else...\n";
            return undef;
        }
    }

    warn "giving up...\n";
    return undef;
}

Output:

Attempting to fetch [ http://www.xxsomesite.com/more/url/text ]
status = 500
cannot connect...will retry after 1 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 1
status = 500
cannot connect...will retry after 2 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 2
status = 500
cannot connect...will retry after 3 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 3
status = 500
cannot connect...will retry after 4 seconds
Attempting to fetch [ http://www.xxsomesite.com/more/url/text ] - retry 4
status = 500
cannot connect...will retry after 5 seconds
giving up...
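The status code alone cannot separate a connection failure from a genuine 500 sent by the server, since LWP reports both as 500 (as the output shows). LWP::UserAgent, which WWW::Mechanize builds on, marks responses it generates itself with a Client-Warning: Internal response header, so a stricter check can combine that header with the status line. A minimal sketch, assuming the header behaves as described; the function name classify_failure and its return labels are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical classifier for a failed response. The arguments are values
# taken from an HTTP::Response: code, status_line and the Client-Warning header.
sub classify_failure {
    my ($code, $status_line, $client_warning) = @_;
    $client_warning //= '';

    # Assumption: LWP sets "Client-Warning: Internal response" on responses it
    # synthesizes locally, i.e. when no answer came back from the server.
    my $internal = $client_warning eq 'Internal response';

    return 'connect'   if $code == 500 && $internal && $status_line =~ /Can't connect/;
    return 'local'     if $internal;       # some other client-side failure
    return 'throttled' if $code == 429;    # Too Many Requests from the server
    return 'server'    if $code >= 500;    # error actually returned by the server
    return 'other';
}
```

Inside fetch() it would be called as classify_failure($response->code, $response->status_line, scalar $response->header('Client-Warning')). The scalar matters: header() returns a list in list context, and a missing header would otherwise shift the arguments.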


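Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.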

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM