Download web page using WWW::Mechanize::Firefox

I'm trying to scrape a website using WWW::Mechanize::Firefox, but whenever I try to get the data, the saved page shows only JavaScript code and the data that I need is not there. If I inspect the element in Mozilla Firefox, the data that I need is there.

Here's my current code:

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

$mech->get('link_goes_here');
$mech->allow( javascript => 0 );
$mech->content_encoding();
$mech->save_content('source.html');

OK. So you have a page that builds its content using JavaScript. Presumably, you chose WWW::Mechanize::Firefox over WWW::Mechanize because it supports rendering pages that are built using JavaScript.

And yet, after creating your Mechanize object, you explicitly turn off the JavaScript support.

$mech->allow( javascript => 0 );

I can't test this theory because you haven't told us which URL you are using, but I bet you'll get a better result if you change that line to:

$mech->allow( javascript => 1 );
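
For reference, here is a minimal sketch of how the whole script might look with that change applied. It is untested, for the same reason given above. Two things in it are my assumptions rather than anything from the question: moving the allow() call ahead of get() so that JavaScript is already enabled when the page loads, and the polling loop, which uses a made-up CSS selector ('#data') as a stand-in for whatever element holds the data you are after.

```perl
#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

# Enable JavaScript *before* loading the page, so the page's scripts
# are allowed to run and build the content.
$mech->allow( javascript => 1 );

$mech->get('link_goes_here');

# ASSUMPTION: '#data' is a hypothetical selector. Replace it with one
# that matches the element you can see when inspecting the page.
my $selector = '#data';

# Scripts may fill in the page a little after the load finishes, so
# poll for the element for up to ten seconds before giving up.
my $found = 0;
for my $attempt ( 1 .. 10 ) {
    # any => 1 asks selector() not to die when nothing matches.
    my @nodes = $mech->selector( $selector, any => 1 );
    if (@nodes) {
        $found = 1;
        last;
    }
    sleep 1;
}

warn "Element '$selector' never appeared; saving the page anyway\n"
    unless $found;

$mech->save_content('source.html');
```

If the data turns up without any waiting, the polling loop can simply be dropped. The line that matters is allow( javascript => 1 ).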
