
How to use WWW::Mechanize to access pages split by a drop-down list

I have a list of genes to download from the following link. The problem is that the results are split across 60 pages, which are selected through a drop-down list.

http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996

How can I make WWW::Mechanize access all the genes from all the pages?

This is the current code I have:

use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996";

my $mech = WWW::Mechanize->new();
$mech->agent_alias("Windows IE 6");

$mech->get($url);
# This only fetches the first page of results.

The page drop-down is implemented in JavaScript. You can't drive it with Mechanize, because Mechanize doesn't execute JavaScript. See the WWW::Mechanize FAQ.

This is easy: the page number is part of the URL (this example fetches page 11):

my $page_number = 11;
$mech->get( "http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search%2Finitializesearch&keywords=MIMAT0014996&thr=0.41&kegg=&page=" . $page_number );
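
The URL in the answer above can also be assembled from its parts, which makes it easier to change the keyword or threshold as well as the page. This is a minimal sketch in core Perl; page_url and encode_value are hypothetical helper names, and the parameter names (r, keywords, thr, kegg, page) are simply copied from that URL rather than taken from any documentation of the site.

```perl
use strict;
use warnings;

# Endpoint copied from the answer's URL (assumption: it stays the same for every page).
my $BASE = 'http://diana.cslab.ece.ntua.gr/micro-CDS/index.php';

# Percent-encode one query value, keeping RFC 3986 unreserved characters.
sub encode_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Hypothetical helper: build the results URL for one page.
# Pairs are kept in an array so the parameter order matches the original URL.
sub page_url {
    my (%args) = @_;
    my @pairs = (
        [ r        => 'search/initializesearch' ],
        [ keywords => $args{keywords} ],
        [ thr      => $args{thr} ],
        [ kegg     => '' ],
        [ page     => $args{page} ],
    );
    return $BASE . '?' . join( '&', map { $_->[0] . '=' . encode_value( $_->[1] ) } @pairs );
}

# Prints the same URL as the page-11 example in the answer.
print page_url( keywords => 'MIMAT0014996', thr => '0.41', page => 11 ), "\n";
```

Looping over pages is then just a matter of calling page_url with page => 1 through page => 60.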

my $pages = 60;
for my $i (1 .. $pages) {
    my $url = "http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search%2Finitializesearch&keywords=MIMAT0014996&thr=0.41&kegg=&page=$i";
    $mech->get($url);
    # Process $mech->content here: each get() replaces the previously loaded page.
}

This should do it. You just need to iterate through the 60 pages, modifying the URL each time.
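
Extending that loop, the sketch below also keeps each page's HTML and tolerates failed requests instead of dying. It assumes the same URL pattern with only the page parameter changing; mech_fetcher and fetch_all_pages are hypothetical names, and the one-second pause is an arbitrary courtesy delay. Passing the fetcher in as a code ref keeps the loop itself testable without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a code ref that fetches one results page
# with WWW::Mechanize and returns its HTML, or undef if the request failed.
# WWW::Mechanize is loaded at run time, so this file compiles without it.
sub mech_fetcher {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 0 );    # check success() instead of dying
    $mech->agent_alias('Windows IE 6');
    return sub {
        my ($page) = @_;
        # URL pattern copied from the answer above; only "page" changes.
        my $url = 'http://diana.cslab.ece.ntua.gr/micro-CDS/index.php'
                . '?r=search%2Finitializesearch&keywords=MIMAT0014996'
                . "&thr=0.41&kegg=&page=$page";
        $mech->get($url);
        sleep 1;    # arbitrary pause between requests, to go easy on the server
        return $mech->success ? $mech->content : undef;
    };
}

# Hypothetical helper: calls $fetch for pages 1..$last_page and returns a
# hash ref of page number => HTML. Failed pages are reported and skipped.
sub fetch_all_pages {
    my ( $fetch, $last_page ) = @_;
    my %html_for;
    for my $page ( 1 .. $last_page ) {
        my $html = $fetch->($page);
        if ( defined $html ) {
            $html_for{$page} = $html;
        }
        else {
            warn "page $page could not be fetched\n";
        }
    }
    return \%html_for;
}
```

For a real run, something like my $html_for = fetch_all_pages(mech_fetcher(), 60); followed by parsing each $html_for->{$n} with whatever extraction code already works for a single page.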
