Saving partial content when WWW::Mechanize GET times out

I'm using the following Perl code to get data from https://www.otcmarkets.com/research/stock-screener/api?sortField=symbol&sortOrder=asc&page=0&pageSize=20000 :

use warnings;
use WWW::Mechanize::GZip;

my $TempFilename = "D:\\temp\\test.txt";

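my $mech = WWW::Mechanize::GZip->new(
    ssl_opts => {
        verify_hostname => 0,
    },
);

$mech->get("https://www.otcmarkets.com/research/stock-screener/api?sortField=symbol&sortOrder=asc&page=0&pageSize=20000");
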
open(OUT, ">", $TempFilename);
binmode(OUT, ":utf8");
print OUT $mech->content;

Unfortunately, the request always times out, and my temporary file always contains:

read timeout at C:/Strawberry/perl/vendor/lib/Net/HTTP/Methods.pm line 268.

However, if I point a web browser to the same URL, I get a bunch of JSON data that looks like this, which is what I am seeking:

"{\"count\":17114,\"pages\":1,\"stocks\":[{\"securityId\":194057,\"reportDate\":\"Jan 26, 2022 12:00:00 AM\",\"symbol\":\"AAAIF\",\"securityName\":\"ALTERNATIVE INVESTMENT TR\",\"market\":\"Pink...

My question is whether there is any way I can modify my script so that it saves the same data that my web browser is able to display instead of the timeout message to my file.


Change the user agent. The default is a string of the form libwww-perl/#.###, and some sites are sensitive to that. You can also use WWW::Mechanize directly and set an explicit timeout (in seconds), like this:

use strict;
use warnings;
use WWW::Mechanize;

my $TempFilename = "c:\\temp\\test.txt";
my $url = "https://www.otcmarkets.com/research/stock-screener/api?sortField=symbol&sortOrder=asc&page=0&pageSize=20000";

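my $mech = WWW::Mechanize->new(
    agent    => "Mozilla/5.0",
    timeout  => 15,
    # ssl_opts => { verify_hostname => 0 },
);

# Fetch the URL (assumed step: the listing as posted omits the request itself).
$mech->get($url);

open(OUT, ">", $TempFilename) or die "Cannot open $TempFilename: $!";
binmode(OUT, ":utf8");
print OUT $mech->content;
close(OUT);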
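
If the request can still fail, it may help to check the result before writing anything, so that an error message does not end up in the output file. The following is only a sketch and not part of the original answer: the helper name fetch_to_file and the choice of autocheck => 0 are assumptions made for illustration. With autocheck disabled, a failed request does not die. LWP reports client-side failures such as a read timeout as a synthetic 500 response whose body is the error text, which would explain why the file in the question contained the timeout message.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not from the original post): fetch $url and write the
# response body to $file only if the request succeeded.
# Returns 1 on success and 0 on failure.
sub fetch_to_file {
    my ($url, $file) = @_;

    my $mech = WWW::Mechanize->new(
        agent     => "Mozilla/5.0",   # same settings as in the answer above
        timeout   => 15,
        autocheck => 0,               # assumption: inspect failures instead of dying
    );

    $mech->get($url);

    unless ($mech->success) {
        # status_line is the numeric code plus message, e.g. for a timeout
        warn "GET $url failed: ", $mech->res->status_line, "\n";
        return 0;
    }

    open(my $out, ">", $file) or die "Cannot open $file: $!";
    binmode($out, ":utf8");
    print {$out} $mech->content;
    close($out) or die "Cannot close $file: $!";
    return 1;
}

# Example call, using the names from the listing above:
# fetch_to_file($url, $TempFilename) or die "Download failed\n";
```

Because the file is only opened after the success check, a timeout leaves any previously saved file untouched instead of overwriting it with the error text.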
