
How can I log in and download a file with Perl's WWW::Mechanize?

I'm trying to use Perl's WWW::Mechanize to download a file. I have to log in to the website first and then, once the login form has been validated, download the file.

The thing is, after hours of trying, I still haven't managed to do what I want. In the end, the script saves a file that is not a zip file but an HTML file with nothing interesting in it.

Here is the script I wrote:

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use Crypt::SSLeay;

my $login = "MyMail";
my $password = "MyLogin";
my $url = 'http://www.lemonde.fr/journalelectronique/donnees/protege/20101002/Le_Monde_20101002.zip';

my $bot = WWW::Mechanize->new();
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);

# Requesting the protected URL first brings up the login form.
my $response = $bot->get($url);

$bot->form_name("formulaire");
$bot->field('login', $login);
$bot->field('password', $password);
$bot->submit();

# Request the file again now that the login form has been submitted.
$response = $bot->get($url);
my $filename = $response->filename;

open(my $fout, '>', $filename) or die("Could not create file: $!");
binmode($fout);    # the expected payload is a zip archive, so write raw bytes
print {$fout} $bot->response->content();
close($fout);

Could you help me find the mistakes I've made?

There are some hidden input fields which I assume are filled in when you navigate to the download using a browser rather than using a URL directly.
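One way to see which hidden fields the login form carries is to parse the page with HTML::Form and list them. The following is only a minimal sketch: the markup, the base URL, and the hidden field names (`url`, `token`) are made up for illustration, and `hidden_fields` is a hypothetical helper. Only the form name `formulaire` and the `login`/`password` fields come from the question.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical helper: return name => value pairs for every hidden
# input of the form called $form_name (empty list if no such form).
sub hidden_fields {
    my ($html, $base_uri, $form_name) = @_;
    my ($form) = grep { ($_->attr('name') // '') eq $form_name }
                 HTML::Form->parse($html, $base_uri);
    return unless $form;
    return map  { $_->name => $_->value }
           grep { $_->type eq 'hidden' && defined $_->name }
           $form->inputs;
}

# Made-up markup standing in for the real login page (assumption).
my $html = <<'HTML';
<form name="formulaire" method="post" action="/login">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="hidden" name="url" value="/journal">
  <input type="hidden" name="token" value="abc123">
</form>
HTML

my %hidden = hidden_fields($html, 'http://www.example.com/', 'formulaire');
print "$_ = $hidden{$_}\n" for sort keys %hidden;
```

Against the live page, the same helper could be given `$bot->content` and `$bot->base` after the first `get`, and the values it finds compared with what the browser actually submits.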

In addition, they are setting some cookies via JavaScript and those would not be picked up by Mechanize. However, there is a plugin WWW::Mechanize::Plugin::JavaScript which might be able to help you with that (I have no experience with it).
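If the JavaScript-set cookies turn out to have predictable names and values, an alternative to running JavaScript is to add them to the cookie jar by hand. This is a sketch under that assumption: the cookie name `js_session` and its value are invented, and the real ones would have to be read from the browser.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

my $jar = HTTP::Cookies->new;

# Hypothetical cookie; name and value are placeholders (assumption).
$jar->set_cookie(
    0,                  # version (0 = Netscape-style cookie)
    'js_session',       # key
    'placeholder',      # value
    '/',                # path
    'www.lemonde.fr',   # domain
    undef,              # port
    1,                  # path_spec
    0,                  # secure
    3600,               # max age in seconds
    0,                  # discard
);

# Check offline that the jar would attach the cookie to a request.
my $req = HTTP::Request->new(
    GET => 'http://www.lemonde.fr/journalelectronique/');
$jar->add_cookie_header($req);
print $req->header('Cookie'), "\n";
```

With WWW::Mechanize the same `set_cookie` call can be made on `$bot->cookie_jar` before submitting the form, so that later requests carry the cookie.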

Use LiveHTTPHeaders to see what gets submitted by the browser and replicate that (assuming you are not violating their TOS).

The problem you mention is well known in Mechanize. The simplest solution is to use the Raspo library.


 