
How can I login and download a file with Perl's WWW::Mechanize?

I'm trying to use Perl's WWW::Mechanize to download a file. I have to log in to the website first and then, after the form has been validated, download the file.

The thing is, after hours of trying, I haven't managed to do what I want. In the end, the script saves a file which is not a zip file but an HTML file with nothing interesting in it.

Here is the script I've written:

use strict;
use warnings;

use WWW::Mechanize;
use HTTP::Cookies;    # needed for the explicit cookie jar below
use Crypt::SSLeay;

my $login    = "MyMail";
my $password = "MyLogin";
my $url = 'http://www.lemonde.fr/journalelectronique/donnees/protege/20101002/Le_Monde_20101002.zip';

my $bot = WWW::Mechanize->new();

# Keep cookies on disk so the authenticated session is reused.
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);

# First request: the site sends back its login page instead of the zip.
my $response = $bot->get($url);

# Fill in and submit the login form.
$bot->form_name("formulaire");
$bot->field('login',    $login);
$bot->field('password', $password);
$bot->submit();

# Request the archive again, now that we should be authenticated.
$response = $bot->get($url);
my $filename = $response->filename;

open( my $fout, '>', $filename )
    or die("Could not create file: $!");
binmode($fout);    # the archive is binary data
print {$fout} $response->content();
close($fout);
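
One thing worth checking before going further (a sketch, not part of the original script; it only uses the $bot and $response variables defined above) is whether the second GET actually returned an archive. If the login did not stick, the server answers with an HTML page again, which matches the symptom described above:

# Sketch: inspect the response before saving it to disk.
my $type = $response->header('Content-Type') || '';
if ( $type =~ m{^application/(?:zip|octet-stream)} ) {
    print "Got a binary archive, saving it.\n";
}
else {
    warn "Got '$type' instead of a zip - the login probably failed.\n";
    print $bot->content();    # dump the page to see the login form or error message
}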

Could you help me find what mistakes I've made?

There are some hidden input fields which I assume are filled in when you navigate to the download using a browser rather than requesting the URL directly.
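
To confirm that, one option (a quick sketch; it reuses the form name "formulaire" from the question and assumes $bot already holds the login page) is to dump every input Mechanize sees on the form, hidden fields included, and compare the list with what the browser actually sends:

# Sketch: list all inputs of the login form, including hidden ones.
$bot->form_name("formulaire");
for my $input ( $bot->current_form->inputs ) {
    printf "%-8s %s = %s\n",
        $input->type || '',
        ( defined $input->name  ? $input->name  : '(unnamed)' ),
        ( defined $input->value ? $input->value : '' );
}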

In addition, they are setting some cookies via JavaScript, and those would not be picked up by Mechanize. However, there is a plugin, WWW::Mechanize::Plugin::JavaScript, which might be able to help you with that (I have no experience with it).
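
If the plugin does not work out, one workaround (a sketch; the cookie name, value and domain below are placeholders, not the site's real values) is to copy the JavaScript-set cookie out of a normal browser session and add it to the jar by hand:

# Sketch: inject a cookie the site normally sets via JavaScript.
# Name, value and domain are hypothetical - take the real ones from a browser.
$bot->cookie_jar->set_cookie(
    0,                     # cookie-spec version
    'js_session',          # cookie name (placeholder)
    'value-from-browser',  # cookie value (placeholder)
    '/',                   # path
    '.lemonde.fr',         # domain
    undef,                 # port
    0,                     # path_spec
    0,                     # secure
    86400,                 # max-age in seconds
    0,                     # discard
);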

Use LiveHTTPHeaders to see what gets submitted by the browser and replicate that (assuming you are not violating their TOS).
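
Once the real request is known, it can be replayed directly instead of relying on the parsed form; everything below (the login URL, the field names, the Referer value) is a placeholder for whatever LiveHTTPHeaders actually reports:

# Sketch: replay the login POST the way the browser sends it.
$bot->add_header( Referer => 'http://www.lemonde.fr/' );
$bot->post(
    'http://www.lemonde.fr/login-url-from-livehttpheaders',   # placeholder
    {
        login        => $login,
        password     => $password,
        hidden_field => 'value-captured-in-livehttpheaders',  # placeholder
    },
);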

The problem you mention is well known in Mechanize. The simplest solution is to use the Raspo library.
