How to download HTML with PHP/JavaScript-encoded content using wget or Perl

Question

I have a URL that I want to download and parse:

http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996

The problem is that when I download it with Unix wget like this:

$ wget [the above url]

it gives me content that differs from what I see in the browser (namely, the list of genes is missing).

What's the right way to do this programmatically?

Answer 1

I've just tested this with PHP, and it pulls the page with the gene list just fine:

<?php
echo file_get_contents('http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996');
?>

Do you have access to PHP?

Answer 2

#!/usr/bin/perl

use WWW::Mechanize;
use strict;
use warnings;

my $url = "http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996";

my $mech = WWW::Mechanize->new();
$mech->agent_alias("Windows IE 6");

$mech->get($url);
# The HTML source is now available via $mech->content();

To process the HTML, I strongly recommend using HTML::TreeBuilder::XPath (or another HTML parsing module).
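A note on the wget attempt from the question: one possible cause, and this is an assumption because the question does not show the exact command, is that the URL was passed unquoted. The shell treats each `&` as a command separator, so wget would only see the URL up to `r=search/results_mature` and the `mir` and `kwd` parameters would never reach the server. The sketch below illustrates this without touching the network; `first_arg` is a hypothetical helper that stands in for wget and simply prints the first argument it receives.

```shell
#!/bin/sh
# Hypothetical stand-in for wget: print the first argument received.
first_arg() { printf '%s\n' "$1"; }

url='http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996'

# Unquoted: eval re-parses the text as if it had been typed at the prompt,
# so the command line is split at every '&'.
unquoted=$(eval "first_arg $url")

# Quoted: the whole URL arrives as a single argument.
quoted=$(first_arg "$url")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

# The real download would then be (not run here):
# wget -O results.html "$url"
```

If quoting alone does not help, `wget --user-agent='...' -O results.html "$url"` sets a browser-like User-Agent header, similar to what `agent_alias` does in the Perl answer.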

Answer 1
1 2013-04-18 05:21:10

Answer 2
1 (accepted) 2013-04-18 05:21:29
