How do I paginate using WWW::Mechanize?

I am using Perl 5.16.3 on a 64-bit Windows machine.

When I use Mechanize with the following URL

http://www.utsavfashion.in/indowestern

everything works fine.

However, when I try the pagination option, i.e. changing the last part of the URL to

indowestern#pg=2

I cannot get the second page content. I still get the content from the first page.

Please see code snippet below:

use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://www.utsavfashion.in/indowestern#pg=2";

my $m = WWW::Mechanize->new();
$m->get($url);
print "$url\n";

my $c = $m->content;

print "$c\n";

Thanks in advance for the advice!

Web browsers don't even send #pg=2 to the web server. The part of a URL after the # (the fragment) is handled entirely on the client side, and WWW::Mechanize, which is built on LWP, doesn't send it either. So it's no surprise that you get the same page for

http://www.utsavfashion.in/indowestern

and for

http://www.utsavfashion.in/indowestern#pg=2

The difference is not in what's fetched, it's in what's rendered.
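To see the split for yourself, here is a minimal sketch using the URI module (which LWP, and therefore WWW::Mechanize, depends on). It only parses the URL from the question and makes no request:

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('http://www.utsavfashion.in/indowestern#pg=2');

# The fragment is the part after '#'; it stays on the client.
print "fragment:   ", $uri->fragment, "\n";

# path_query is the path (plus any query string) that goes into the HTTP request line.
print "path_query: ", $uri->path_query, "\n";
```

The fragment comes out as pg=2 and the request path as /indowestern, so both URLs produce the same request.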

When you use your web browser to render it, the JavaScript in the page reads the fragment (the part after the #) and updates the content accordingly.

When you print $m->content instead, the result looks quite different. You don't get nicely formatted text or the effects of JavaScript.

There are options for processing JavaScript in downloaded content. But if you're data mining, it would probably be more efficient and reliable to replicate what the JavaScript does instead, since it presumably just does another web request to get the data if it's not found in the downloaded document.
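As a rough sketch of that second approach: open the page in a browser, watch the network tab of the developer tools while clicking to page 2, and note the URL the JavaScript requests. The code below assumes, purely for illustration, that the page number is passed as a query parameter named p. That name is a guess and not taken from the site, and page_url and fetch_page are hypothetical helpers; substitute whatever request you actually observe.

```perl
use strict;
use warnings;
use URI;

# Build the URL for a given page number.
# ASSUMPTION: the "p" query parameter is hypothetical; use the real
# request you see in the browser's network tab instead.
sub page_url {
    my ($base, $page) = @_;
    my $uri = URI->new($base);
    $uri->fragment(undef);           # drop "#pg=2"; it is never sent anyway
    $uri->query_form(p => $page);    # hypothetical parameter name
    return $uri->as_string;
}

# Fetch one page and return its content.
sub fetch_page {
    my ($base, $page) = @_;
    require WWW::Mechanize;          # loaded only when a fetch is actually made
    my $mech = WWW::Mechanize->new();
    $mech->get(page_url($base, $page));
    return $mech->content;
}
```

Calling fetch_page('http://www.utsavfashion.in/indowestern', 2) would then request whatever URL page_url builds. If the real request turns out to be a POST or returns JSON rather than HTML, the same idea applies but the helpers need adjusting.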


 