
How to download HTML encoded with PHP/JavaScript content using WGET or Perl

I have a URL which I want to download and parse:

http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996

The problem arises when I download it with Unix wget like this:

$ wget [the above url]

It gave me content that differs from what I saw in the browser (namely, the list of genes was missing).

What's the right way to do this programmatically?

I've just tested this using PHP, and it pulls the page with the gene list just fine:

<?php
echo file_get_contents('http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996');
?>

Do you have access to PHP?

#!/usr/bin/perl

use WWW::Mechanize;
use strict;
use warnings;

my $url = "http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996";

my $mech = WWW::Mechanize->new();
$mech->agent_alias("Windows IE 6");

$mech->get($url);
# The page's HTML is now available via $mech->content();

To process the HTML, I strongly recommend using HTML::TreeBuilder::XPath (or another HTML parsing module).
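
One more thing worth checking for the wget command in the question: the URL contains `&`, which the shell treats as a control operator when it is not quoted. If the URL was pasted unquoted, wget would only have received the part up to `r=search/results_mature`, and that could explain the missing gene list. This is an assumption about how the command was typed, not something confirmed in the question. The sketch below uses a hypothetical helper, `show_args`, in place of wget so the effect of quoting can be seen without touching the network; the real wget call is left as a comment, and the output file name `results.html` is just an example.

```shell
#!/bin/sh
# Hypothetical stand-in for wget: prints each argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Single quotes keep ? and & literal, so the query string stays intact.
url='http://diana.cslab.ece.ntua.gr/micro-CDS/index.php?r=search/results_mature&mir=hsa-miR-3131&kwd=MIMAT0014996'

# Quoted expansion: the whole URL arrives as a single argument.
show_args "$url"

# The actual download would then look like this (not run here):
# wget -O results.html "$url"
```

If quoting alone does not help, wget's `--user-agent` option can be tried in the same spirit as the `agent_alias` call in the Perl answer.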
