Parsing Site with Perl LWP::UserAgent — Cookies Required

On a certain project in Perl, I've written several "parsers" that visit websites with LWP::UserAgent. However, I'm having a problem with one website: it behaves exactly as if I had visited it in my browser with cookies turned off, so instead of giving me the page I want, it returns a page saying that I must turn on cookies. The entire code of my script is below. Any ideas? Thanks in advance.

(Note that I looked at the question "Cookies in perl lwp", which seems to address the same problem, but unfortunately I was unable to get a working script based on its suggestion.)

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
my $useragent = LWP::UserAgent->new;
$useragent->cookie_jar(HTTP::Cookies->new);
my $request = HTTP::Request->new(GET => "http://www.the-site-im-trying-to-parse.com");
my $response = $useragent->request($request);
print "Content-type: text/html\n\n";
print $response->as_string;

Have you considered using the WWW::Mechanize module? It collects cookies automatically by default, and it's a bit easier to use since it includes plenty of useful methods.
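As a rough illustration of that suggestion, here is a minimal sketch, assuming WWW::Mechanize is installed. The `autocheck => 0` setting and the idea of passing the URL on the command line are choices made for this example only, not something from the question.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 (an assumption for this sketch) makes a failed request
# return a response object instead of dying.
my $mech = WWW::Mechanize->new(autocheck => 0);

# Mechanize sets up an in-memory cookie jar by default.
print "Cookie jar class: ", ref($mech->cookie_jar), "\n";

# Pass the target URL as the first argument; nothing is fetched otherwise.
if (my $url = shift @ARGV) {
    my $response = $mech->get($url);
    if ($mech->success) {
        print $mech->content;
    } else {
        warn "Request failed: ", $response->status_line, "\n";
    }
}
```

Cookies set by one response are sent back on later requests made through the same `$mech` object, so a site that checks for cookies across a redirect or a second page load has a chance to see them.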

All you are doing is downloading the HTML data over HTTP, so there is no browser interaction until you decide to view the result in one. That being said, the HTTP server has no direct way of knowing whether your request comes from a client that has cookies enabled. So turning cookies on in your script won't by itself change what the server sends back.

The WWW::Mechanize module is useful for easily traversing web sites, but it won't fix the problem you are facing.

More realistically, what is going on is that some client-side JavaScript code isn't working correctly once you download the file and display it in your browser. This could be any number of things, such as the downloaded copy breaking the cross-domain policy that the JavaScript code relies on. Without the URL you are accessing, it is impossible to say.

Try setting cookie_jar to temporary storage (by passing it an empty hashref):

$useragent->cookie_jar( {} );
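To see what a cookie jar actually does between two requests, here is a small offline sketch. It assumes that an empty hashref results in a default in-memory HTTP::Cookies object, so it builds one directly. The URL, the `session=abc123` cookie, and the fabricated response are made up for illustration; no network traffic occurs.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# Hypothetical URL, used only for illustration.
my $url = 'http://www.example.com/';

# In-memory jar: nothing is written to disk.
my $jar = HTTP::Cookies->new;

# Fabricate a response that sets a cookie, as a server might.
my $response = HTTP::Response->new(200, 'OK');
$response->header('Set-Cookie' => 'session=abc123; path=/');
$response->request(HTTP::Request->new(GET => $url));

# Store the cookie from the response in the jar.
$jar->extract_cookies($response);

# Build a follow-up request and let the jar attach matching cookies.
my $next = HTTP::Request->new(GET => $url . 'page');
$jar->add_cookie_header($next);

print "Cookie header: ", ($next->header('Cookie') // '(none)'), "\n";
```

LWP::UserAgent performs these two steps itself when a cookie jar is attached, which is why a cookie set by one response only shows up on requests made later through the same user agent object.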
