How can I handle JavaScript in a Perl web crawler?

I would like to crawl a website. The problem is that it is full of JavaScript elements, such as buttons which, when pressed, do not change the URL but do change the data on the page.

Usually I use LWP / Mechanize etc. to crawl sites, but neither supports JavaScript. Any ideas?

The WWW::Scripter module has a JavaScript plugin that may be useful. I can't say I've used it myself, however.
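A minimal sketch of what that might look like, assuming the WWW::Scripter::Plugin::JavaScript plugin is installed; the URL and element id below are placeholders:

    use strict;
    use warnings;
    use WWW::Scripter;

    # A scriptable WWW::Mechanize subclass; load the JavaScript plugin
    # (requires WWW::Scripter::Plugin::JavaScript).
    my $w = WWW::Scripter->new;
    $w->use_plugin('JavaScript');

    # Fetch the page; its scripts run in the plugin's JS engine.
    $w->get('http://example.com/');    # placeholder URL

    # Run JavaScript in the page's context, e.g. trigger the same
    # handler the button would call.
    $w->eval('document.getElementById("load-more").click()');    # hypothetical id

    # The document content after the scripts have run.
    print $w->content;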

Another option might be Selenium and the WWW::Selenium module.
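A rough sketch of driving a real browser through a Selenium RC server with WWW::Selenium; the host, URL, and locator are placeholders:

    use strict;
    use warnings;
    use WWW::Selenium;

    # Talk to a running Selenium RC server, which drives a real browser.
    my $sel = WWW::Selenium->new(
        host        => 'localhost',            # placeholder: Selenium server host
        port        => 4444,
        browser     => '*firefox',
        browser_url => 'http://example.com/',  # placeholder URL
    );

    $sel->start;
    $sel->open('/page-with-js');               # placeholder path

    # Click the JavaScript-driven button and give the page a moment
    # to update itself.
    $sel->click('id=load-more');               # hypothetical locator
    sleep 2;

    # The HTML as currently rendered by the browser.
    print $sel->get_html_source;

    $sel->stop;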

WWW::Mechanize::Firefox might be of use. That way you can have Firefox handle the complex JavaScript and then extract the resulting HTML.
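A minimal sketch, assuming a running Firefox instance with the MozRepl extension enabled; the URL and CSS selector are placeholders:

    use strict;
    use warnings;
    use WWW::Mechanize::Firefox;

    # Connects to a running Firefox (needs the MozRepl extension).
    my $mech = WWW::Mechanize::Firefox->new;

    $mech->get('http://example.com/');    # placeholder URL

    # Let Firefox run the page's JavaScript: click a button by CSS
    # selector without waiting for a full page load.
    $mech->click({ selector => '#load-more', synchronize => 0 });    # hypothetical selector
    sleep 2;    # crude wait for the in-page update

    # The DOM serialized back out of the browser.
    print $mech->content;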

I would suggest HtmlUnit and its Perl wrapper, WWW::HtmlUnit.
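A small sketch of what that could look like. WWW::HtmlUnit exposes the Java HtmlUnit API through Inline::Java, so the method names below are HtmlUnit's; the URL and element id are placeholders:

    use strict;
    use warnings;
    use WWW::HtmlUnit;

    # A headless browser with its own JavaScript engine, running on the
    # JVM via Inline::Java.
    my $client = WWW::HtmlUnit->new;

    my $page = $client->getPage('http://example.com/');    # placeholder URL

    # Find the JavaScript-driven button and click it; HtmlUnit returns
    # the page object representing the state after the scripts ran.
    my $button = $page->getElementById('load-more');       # hypothetical id
    $page = $button->click;

    # Serialize the updated DOM.
    print $page->asXml;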
