I am using Perl 5.16.3 on a 64-bit Windows machine.
When I use Mechanize with the following URL
http://www.utsavfashion.in/indowestern
everything works fine.
However, when I try to use the pagination option, i.e. change the end of the URL to indowestern#pg=2, I cannot get the second page's content. I still get the content of the first page.
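Please see the code snippet below:
use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://www.utsavfashion.in/indowestern#pg=2";
my $m = WWW::Mechanize->new();
$m->get($url);
print "$url\n";
# content() returns the body of the response that was actually fetched
my $c = $m->content;
print "$c\n";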
Thanks in advance for the advice!
Web browsers don't even send #pg=2 to the web server. I don't know if WWW::Mechanize does or not, but it shouldn't. So it's no surprise that you get the same page for
http://www.utsavfashion.in/indowestern
and for
http://www.utsavfashion.in/indowestern#pg=2
The difference is not in what's fetched, it's in what's rendered.
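To make the point about the fragment concrete, here is a minimal sketch that splits a URL the way a client does before sending a request. The helper name split_fragment is made up for illustration, and it uses a plain split on the first '#' rather than a URL-parsing module.

```perl
use strict;
use warnings;

# Hypothetical helper: separate the part of a URL that goes to the server
# from the fragment, which starts at the first '#' and stays on the client.
sub split_fragment {
    my ($url) = @_;
    my ($request_part, $fragment) = split /#/, $url, 2;
    return ($request_part, $fragment);
}

my ($request_part, $fragment) =
    split_fragment("http://www.utsavfashion.in/indowestern#pg=2");

print "Sent to server: $request_part\n";
print "Kept by client: $fragment\n";
```

Both URLs in the question reduce to the same request part, which is why the server answers with the same document.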
When you use your web browser to render it, the JavaScript in the page checks the anchor and updates the content accordingly.
When you use print to render it, it looks quite different. You don't get nicely formatted text or the effects of JavaScript.
There are options for processing JavaScript in downloaded content. But if you're data mining, it would probably be more efficient and reliable to replicate what the JavaScript does instead, since it presumably just does another web request to get the data if it's not found in the downloaded document.
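Replicating the JavaScript could look roughly like the sketch below. It rests on an assumption that has not been verified against the site: that the script turns the fragment pg=2 into a query string on the same path. The real request may use a different URL or parameters, so check the browser's network tab first and adjust. The helper name fragment_to_query_url is made up for illustration.

```perl
use strict;
use warnings;

# HYPOTHETICAL mapping: assumes the page's JavaScript requests the same
# path with the fragment contents as a query string. Inspect the real
# request in the browser's developer tools and change this to match.
sub fragment_to_query_url {
    my ($url) = @_;
    my ($base, $fragment) = split /#/, $url, 2;
    # No fragment: nothing to translate, fetch the URL as-is.
    return $base unless defined $fragment && length $fragment;
    # Append with '&' if the base already has a query string, else '?'.
    my $sep = $base =~ /\?/ ? '&' : '?';
    return $base . $sep . $fragment;
}

my $page_url =
    fragment_to_query_url("http://www.utsavfashion.in/indowestern#pg=2");
print "$page_url\n";

# With WWW::Mechanize installed, the fetch itself would then be:
#   my $m = WWW::Mechanize->new();
#   $m->get($page_url);
#   print $m->content, "\n";
```

If the site instead loads page data from a separate endpoint, only the body of fragment_to_query_url needs to change; the fetch stays the same.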