
How can I access the contents of a JavaScript-driven web page with Perl?

I am trying to write a small Perl app that fetches League of Legends summoner names from LolKing.

The HTML code has lines like

<tr data-summonername="MatLife TriHard" class="lb_row_rank_4">

so I was just going with something like

use strict;
use warnings;

use LWP::Simple;
use HTML::Parser;

my $find_links = HTML::Parser->new(
  start_h => [
    sub {
      my ($tag, $attr) = @_;
      if ($tag eq 'tr' and exists $attr->{'data-summonername'}) {
        print "$attr->{'data-summonername'}\n";
      }
    },
    "tag, attr"
  ]
);

my $html = get('http://www.lolking.net/leaderboards/#/na/1') or die 'nope';

$find_links->parse($html);

but this gives me nothing. Even with attr=class, it gives me nothing. For some reason I can't fetch the tr element's class.

Using $attr->{data-summonername} without the single quotes gave me some errors, because of the hyphen I suppose. If I fetch $attr->{href} it works just fine.
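The errors with the unquoted key come from how Perl treats hash subscripts: only a simple identifier is quoted automatically, so a key containing a hyphen is read as an expression (roughly `data - summonername`) instead of a single string, and `use strict` rejects it. A minimal sketch, where `$attr` is a made-up stand-in for the attribute hash the parser passes to the handler:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the attribute hash passed to the start handler
my $attr = {
    'data-summonername' => 'MatLife TriHard',
    class               => 'lb_row_rank_4',
};

# A plain identifier is auto-quoted inside {}, so no quotes are needed
my $class = $attr->{class};

# A key containing a hyphen must be quoted; left unquoted it is treated as
# an expression rather than a string and does not compile under strict
my $name = $attr->{'data-summonername'};

print "$name ($class)\n";
```

So the single quotes in the original handler are required and are not the cause of the empty output.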

Can someone help me out?

The problem is that the HTML for that page is mostly built by your browser using JavaScript after the page has been downloaded. LWP::Simple::get retrieves only the skeleton HTML and the JavaScript code. You can see that if you print $html instead of parsing it.
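To confirm that the parser logic itself is not at fault, the same handler can be run against strings that do and do not contain the rows. This is only a sketch: both HTML samples are made up (one imitating the page after the browser has built it, one imitating a bare skeleton), and `extract_names` is a hypothetical helper, not part of the question.

```perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper: collect every data-summonername found on a <tr> start tag
sub extract_names {
    my ($html) = @_;
    my @names;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                push @names, $attr->{'data-summonername'}
                    if $tag eq 'tr' and exists $attr->{'data-summonername'};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # signal end of input so buffered text is flushed
    return @names;
}

# Made-up sample of what the browser holds after running the JavaScript
my $rendered = <<'HTML';
<table>
<tr data-summonername="Doublelift" class="lb_row_rank_1"></tr>
<tr data-summonername="MatLife TriHard" class="lb_row_rank_4"></tr>
</table>
HTML

# Made-up sample of the kind of skeleton a plain HTTP fetch returns
my $skeleton = '<html><body><div id="app"></div><script src="app.js"></script></body></html>';

my @from_rendered = extract_names($rendered);
my @from_skeleton = extract_names($skeleton);

print "rendered: @from_rendered\n";
print 'skeleton: ', scalar(@from_skeleton), " names\n";
```

The handler finds the names when the rows are present and nothing when they are not, which matches the behaviour described in the question.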

The usual solution is to use WWW::Mechanize::Firefox, which gets an installed Firefox to download and build the page, which you can then query. It's a lot more complex than a simple get though: you have to install Firefox if you don't already have it, as well as the Mozilla MozRepl add-on, which enables remote control. Even then you may still have problems if you access the contents of the page before the browser has finished building it, so it's not for the faint of heart.
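One common way to cope with a page that is still being built is to poll for the element you need, with a timeout, instead of reading the content straight away. The sketch below shows only the polling logic: `wait_for` is a hypothetical helper, and the condition is a stand-in counter. In a real script the callback would check the live page for the `lb_row_rank` rows using whatever query method the browser driver provides.

```perl
use strict;
use warnings;

use Time::HiRes qw(sleep time);

# Hypothetical helper: call $check repeatedly until it returns a true value
# or $timeout seconds have passed. Returns that value, or nothing on timeout.
sub wait_for {
    my ($check, $timeout, $interval) = @_;
    my $deadline = time() + $timeout;
    while (1) {
        my $result = $check->();
        return $result if $result;
        return if time() >= $deadline;
        sleep $interval;
    }
}

# Stand-in condition: becomes true on the third call, imitating rows that
# appear only after the page's JavaScript has run
my $calls = 0;
my $ready = wait_for(sub { ++$calls >= 3 }, 5, 0.1);

print $ready ? "ready after $calls checks\n" : "timed out\n";
```

A fixed `sleep` before reading the page is simpler, but polling returns as soon as the content is there and fails cleanly when it never arrives.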


Update

For your interest, here is a solution using WWW::Mechanize::Firefox.

use strict;
use warnings;

use WWW::Mechanize::Firefox;
use HTML::TreeBuilder::XPath;

my $url = 'http://www.lolking.net/leaderboards/#/na/1';

# Have Firefox (with MozRepl enabled) load the page so its JavaScript runs
my $mech = WWW::Mechanize::Firefox->new;
my $resp = $mech->get($url);
die $resp->status_line unless $resp->is_success;

# Parse the returned HTML into a tree that can be queried with XPath
my $tree = HTML::TreeBuilder::XPath->new_from_content($resp->content);

# Each leaderboard row has a class like "lb_row_rank_4"; the digits are the rank
for my $node ( $tree->findnodes('//tr[starts-with(@class, "lb_row_rank")]') ) {
  printf "Rank %2d: %s\n",
      $node->attr('class') =~ /(\d+)/,
      $node->attr('data-summonername');
}

Output

Rank  1: Doublelift
Rank  2: F5 Veritas
Rank  3: Life Love Live 
Rank  4: MatLife TriHard
Rank  5: TDK Kyle
Rank  6: Liquid FeniX
Rank  7: Liquid Inori TV
Rank  8: dawoofsclaw
Rank  9: who is he
Rank 10: Ohhhq
