
How do I paginate using WWW::Mechanize?

I am using Perl 5.16.3 on a 64-bit Windows machine.

When I use Mechanize with the following URL

http://www.utsavfashion.in/indowestern

everything works fine.

However, when I try the pagination option, i.e. change the last part of the URL to indowestern#pg=2, I cannot get the second page's content. I still get the content of the first page.

Please see the code snippet below:

use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://www.utsavfashion.in/indowestern#pg=2";

my $m = WWW::Mechanize->new();
$m->get($url);
print "$url\n";

my $c = $m->content;

print "$c\n";

Thanks in advance for the advice!

Web browsers don't even send #pg=2 to the web server. I don't know whether WWW::Mechanize sends it, but it shouldn't. So it's no surprise that you get the same page for

http://www.utsavfashion.in/indowestern

and for

http://www.utsavfashion.in/indowestern#pg=2
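To make this concrete, here is a minimal sketch using the URI module (which is installed alongside WWW::Mechanize). The helper name split_url is made up for illustration. It separates the path and query, which is what HTTP puts on the request line, from the fragment, which stays on the client.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return the request target (path plus query) and
# the fragment of a URL. The fragment is an empty string if there is none.
sub split_url {
    my ($url) = @_;
    my $uri = URI->new($url);
    return ( $uri->path_query, $uri->fragment // '' );
}

for my $url (
    'http://www.utsavfashion.in/indowestern',
    'http://www.utsavfashion.in/indowestern#pg=2',
) {
    my ($target, $fragment) = split_url($url);
    print "request target: $target   fragment: '$fragment'\n";
}
```

Both URLs yield the request target /indowestern, so the server has nothing to distinguish them by.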

The difference is not in what's fetched, it's in what's rendered.

When you use your web browser to render it, the JavaScript in the page checks the anchor (the part after the #) and updates the content accordingly.

When you use your print $content to render it, it looks quite different. You don't get nicely formatted text or the effects of JavaScript.

There are options for processing JavaScript in downloaded content. But if you're data mining, it would probably be more efficient and more reliable to replicate what the JavaScript does instead, since it presumably just makes another web request to get the data if it's not found in the downloaded document.
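As a rough sketch of what replicating the JavaScript could look like: the code below assumes, purely for illustration, that the site accepts the page number as a query parameter named pg. That is a guess, not something confirmed for this site. The real request should be read from the browser's network inspector and substituted in. The helper names page_url and fetch_page are likewise made up.

```perl
use strict;
use warnings;
use URI;

# ASSUMPTION: the page number is passed as a query parameter "pg".
# Replace this with whatever request the page's JavaScript really makes.
sub page_url {
    my ($base, $page) = @_;
    my $uri = URI->new($base);
    $uri->fragment(undef);            # drop any #pg=N, the server never sees it
    $uri->query_form( pg => $page );  # hypothetical server-side parameter
    return $uri->as_string;
}

# Fetch one page and return its raw content.
sub fetch_page {
    my ($base, $page) = @_;
    require WWW::Mechanize;           # loaded only when a fetch is requested
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( page_url($base, $page) );
    return $mech->content;
}

# Only touch the network when a page number is given on the command line.
if (@ARGV) {
    print fetch_page( 'http://www.utsavfashion.in/indowestern', $ARGV[0] ), "\n";
}
```

Run with a page number as the only argument, this prints the raw content for that page. Whether that content differs from page 1 depends entirely on the assumed parameter being the one the server actually honours.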
