
AJAX click on button using Perl WWW::Mechanize

I am carrying out a project for a client who needs to scrape the contents of a table on a particular page. I have modified his existing code to run in a loop, as there are now multiple pages to extract content from. One of the pages I'm trying to scrape: https://marriage.ag.gov.au/marriagecelebrants/civil

You can see there are 162 pages, and the site appears to use AJAX to load the next batch of content. The existing code would click based on the input's name attribute:

ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04

So far, all my code does is essentially refresh the page and scrape the same content 162 times.

This is the current snippet:

use strict;
use warnings;
use WWW::Mechanize;
use Data::Dumper;
use HTML::TableExtract;
use Spreadsheet::WriteExcel;

#header();
# create max page array to handle civil and other page.
# number indicates how many times to click through
# first item in array is https://marriage.ag.gov.au/marriagecelebrants/civil
# second item is         https://marriage.ag.gov.au/marriagecelebrants/other
my @max_page_array = qw(
    162
    11
);

# create URL array for the 2 pages to scrape
my @url_array = qw(
    https://marriage.ag.gov.au/marriagecelebrants/civil
    https://marriage.ag.gov.au/marriagecelebrants/other
);
# get size of array
my $url_array_size = scalar @url_array;

# declare vars
my $n = 0;
my $i = 0;
# loop through the URLs
while ( $i < $url_array_size ) {
    $n = 0;
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( $url_array[$i] );

    # open the output file once, truncating anything left from a previous run
    open( my $raw, '>', "output-dev-$i.txt" )
        or die "Cannot open output-dev-$i.txt: $!";
    while ( $n < $max_page_array[$i] ) {
        my $c  = $mech->content;
        my $te = HTML::TableExtract->new( br_translate => 1, keep_html => 0 );
        $te->parse($c);
        foreach my $ts ( $te->tables ) {
            foreach my $row ( $ts->rows ) {
                # empty cells come back as undef; write one CSV line per row
                print {$raw} join( ',', map { $_ // '' } @$row ), "\n";
            }
        }

        # this was the existing code
        #$mech->click( "ctl00\$MainContent\$gridCelebrants\$ctl00\$ctl02\$ctl00\$ctl04" );

        # tried multiple variations based on the documentation and got nowhere
        $mech->click_button( name => 'ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04' );
        $n++;
    }
    close $raw;
    $i++;
}    # while loop - url array size

My question is: when you click next, how can I get my Perl script to load the next page and scrape the next set of data?

WWW::Mechanize does not support JavaScript, as per its FAQ. The FAQ lists alternatives that do; see also WWW::Mechanize::PhantomJS.
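
As a rough illustration of that suggestion, here is a minimal sketch using WWW::Mechanize::PhantomJS. It assumes the module and a phantomjs binary are installed, and that the "next" control can be found by the name attribute quoted in the question. The helper names xpath_for_name and fetch_pages are made up for this example, and the fixed two-second sleep is only a crude stand-in for properly waiting on the AJAX update. It has not been run against the site.

```perl
use strict;
use warnings;

# Build an XPath expression matching any element with the given name attribute.
sub xpath_for_name {
    my ($name) = @_;
    return qq{//*[\@name="$name"]};
}

# Load $url in a headless browser and return the HTML of $pages successive
# pages, clicking the control named $next_name between them.
sub fetch_pages {
    my ( $url, $next_name, $pages ) = @_;

    # Loaded at run time so the script still compiles where the module
    # is not installed.
    require WWW::Mechanize::PhantomJS;
    my $mech = WWW::Mechanize::PhantomJS->new();
    $mech->get($url);

    my @html;
    for my $page ( 1 .. $pages ) {
        push @html, $mech->content;    # current DOM, after JavaScript has run
        last if $page == $pages;
        $mech->click( { xpath => xpath_for_name($next_name) } );
        sleep 2;    # assumption: crude wait for the AJAX update to finish
    }
    return @html;
}

# Usage: perl scrape.pl URL NUMBER_OF_PAGES
if (@ARGV) {
    my ( $url, $pages ) = @ARGV;
    my $next = 'ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04';
    my @html = fetch_pages( $url, $next, $pages );
    printf "fetched %d pages\n", scalar @html;
}
```

Each string returned by fetch_pages could then be passed to HTML::TableExtract exactly as in the question's inner loop.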
