How do I download an image file from a website using WWW::Mechanize?

I am trying to download an image from a server. This is what I have tried so far:

use warnings;
use strict; 
use WWW::Mechanize;

my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";

my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
    $mech->submit_form(
        form_number => 1,
        fields => {
        'notice' => $sequence,
        },
    );


$mech->find_image( alt_regex => qr/.+sopma2.gif/ );
open (FH, ">soi.gif");
binmode (FH);
print FH $mech;

The image tag looks like this:

<img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif">

I already have the link to the image parsed from the website, but I want to download the image itself. How can I do it?

The find_image method of WWW::Mechanize returns a WWW::Mechanize::Image object. That object only contains metadata about the image, such as its URL, name and alt text, not the content of the image itself. You need to download the image file first.

Luckily, you can use your $mech for that. The $image has a URI method that returns the URL of that image file as a URI object. Your $mech can get that image. The result comes back as an HTTP::Response object.

my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ );
my $res = $mech->get($image->URI);

if ($res->is_success) {
  open (my $fh, '>', 'soi.gif') or die $!;
  binmode $fh;
  print $fh $res->decoded_content;
  # no need to close lexical filehandle
}

Et voilà, there's your image file.
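To make the point about the image object concrete, here is a minimal sketch that builds a WWW::Mechanize::Image by hand instead of fetching a page. The base URL is an assumption standing in for the results page; the src value is the one from the tag in the question. It only shows that the object carries metadata (relative URL, absolute URL, alt text) and no image bytes.

```perl
use strict;
use warnings;
use WWW::Mechanize::Image;

# Hand-built image object, roughly what find_image would return for the tag
# in the question. The base URL is an assumption for illustration.
my $image = WWW::Mechanize::Image->new({
    url  => '/tmp/e3a3c2b34201.sopma2.gif',
    base => 'https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl',
    tag  => 'img',
    alt  => undef,    # the tag has no alt attribute
});

# Only metadata is available; the image content still has to be fetched.
print 'relative: ', $image->url, "\n";
print 'absolute: ', $image->url_abs, "\n";
print 'alt text: ', ( defined $image->alt ? $image->alt : '(none)' ), "\n";
```

The absolute URL is the value you would hand to $mech->get to download the file.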

You can use $mech->get(...) to store the content of a URL in a local file.

# url_regex is used because the image tag has no alt attribute to match against
if ( my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ ) ) {
  $mech->get( $image->url, ':content_file' => 'soi.gif' );
}

How do I save an image with WWW::Mechanize?

man WWW::Mechanize

$mech->find_image()
Finds an image in the current page. It returns a WWW::Mechanize::Image object which describes the image. If it fails to find an image it returns undef.
...
$mech->get( $uri )
Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri can be a well-formed URL string, a URI object, or a WWW::Mechanize::Link object. [...]
"get()" is a well-behaved overloaded version of the method in LWP::UserAgent. This lets you do things like
$mech->get( $uri, ':content_file' => $tempfile );
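As a rough, offline illustration of the :content_file option quoted above, the sketch below serves a page and a fake image from a temporary directory through file:// URLs. The directory layout, the file names and the nine placeholder bytes are all assumptions made for the example; against the real site you would use the page returned by submit_form instead.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Stand-in "website": a temp directory holding one page and one fake image.
my $dir  = tempdir( CLEANUP => 1 );
my $page = File::Spec->catfile( $dir, 'result.html' );
my $src  = File::Spec->catfile( $dir, 'e3a3c2b34201.sopma2.gif' );
my $dest = File::Spec->catfile( $dir, 'soi.gif' );

open my $img_fh, '>:raw', $src or die "Cannot write $src: $!";
print {$img_fh} "GIF89a\x00\x01\xff";    # placeholder bytes, not a real image
close $img_fh or die "Cannot close $src: $!";

open my $html_fh, '>', $page or die "Cannot write $page: $!";
print {$html_fh} '<html><body><img align="TOP" src="e3a3c2b34201.sopma2.gif"></body></html>';
close $html_fh or die "Cannot close $page: $!";

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new($page)->as_string );

my $image = $mech->find_image( url_regex => qr/sopma2\.gif\z/ )
    or die "No matching image on the page";

# :content_file makes the user agent write the response body straight to disk.
my $res = $mech->get( $image->url_abs, ':content_file' => $dest );
die 'Download failed: ' . $res->status_line unless $res->is_success;

print 'saved ', ( -s $dest ), " bytes to $dest\n";
```

Writing directly to a file avoids holding the image in memory and sidesteps the manual open/binmode/print steps.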

The problem is that you are searching for an image whose alt text contains the string sopma2.gif. That image doesn't have an alt text, so your program doesn't find it.
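The difference can be checked without touching the network. This is a small sketch that assumes a local HTML file containing only the image tag from the question; the temporary file and its name are made up for the test, and the two regexes are illustrative rather than the only ones that would work.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Write a one-image page to a temp directory (a stand-in for the results page).
my $dir  = tempdir( CLEANUP => 1 );
my $page = File::Spec->catfile( $dir, 'result.html' );
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} '<html><body><img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif"></body></html>';
close $fh or die "Cannot close $page: $!";

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new($page)->as_string );

# The tag has no alt attribute, so there is nothing for alt_regex to match.
my $by_alt = $mech->find_image( alt_regex => qr/sopma2\.gif/ );

# The src attribute does contain the name, so url_regex finds the image.
my $by_url = $mech->find_image( url_regex => qr/sopma2\.gif\z/ );

print 'alt_regex: ', ( $by_alt ? 'found' : 'not found' ), "\n";
print 'url_regex: ', ( $by_url ? $by_url->url : 'not found' ), "\n";
```

Checking the return value of find_image before calling methods on it also avoids a "Can't call method on an undefined value" error when nothing matches.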

The program below fetches the GIF file that you want. I'm using url_regex => qr/sopma2/i to find sopma2 in the URL instead. That succeeds and returns a WWW::Mechanize::Image object. Then all that is necessary is to fetch that object's absolute URL and use get with a :content_file parameter to save the data to a disk file.

use strict;
use warnings;
use 5.010;

use WWW::Mechanize;

STDOUT->autoflush;

my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";

my $mech = WWW::Mechanize->new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');

say $mech->res->status_line;
say $mech->title;

$mech->submit_form(
    form_number => 1,
    fields => {
      notice => $sequence,
    },
);

say $mech->res->status_line;
say $mech->title;

my $image = $mech->find_image( url_regex => qr/sopma2/i );
my ($file) = $image->url =~ m|([^/]+\z)|;
$mech->get($image->url_abs, ':content_file' => $file);
say "$file saved";

Output:

200 OK
NPS@ : SOPMA secondary structure prediction
200 OK
NPS@ SOPMA secondary structure prediction results
373025433891.sopma2.gif saved
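The program above takes the local file name from the last part of the image URL with a regex. One possible alternative, sketched here under the assumption that the absolute URL has the shape seen in the output, is to let the URI module split the path. The fallback name soi.gif is borrowed from the question and is only a placeholder.

```perl
use strict;
use warnings;
use URI;

# Example URL modelled on the output above.
my $uri = URI->new('https://npsa-prabi.ibcp.fr/tmp/373025433891.sopma2.gif');

# path_segments returns the pieces of the path; the last one is the file name.
my $file = ( $uri->path_segments )[-1];

# Fall back to a fixed name if the URL ends in a slash and the last segment is empty.
$file = 'soi.gif' unless defined $file && length $file;

print "$file\n";
```

In the full program you would pass $image->url_abs to URI->new instead of the literal string.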

Use LWP::Simple together with WWW::Mechanize.

use WWW::Mechanize;
use LWP::Simple;
my $sequence = "MIPTLAA......";

my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
    $mech->submit_form(
        form_number => 1,
        fields => {
        'notice' => $sequence,
        },
    );

my $cont = $mech->content;
# Pull the image path out of the raw HTML (fragile: depends on the exact markup)
my ($img) = $cont =~ m/SRC=(.+sopma2\.gif)/;
my $url = "https://npsa-prabi.ibcp.fr/$img";
getstore($url, "soi.gif");

$img holds the URL path of the image.

Then save the image using the getstore function from LWP::Simple.

This is not a good idea; see @simbabque's answer. But it gives you the result you need.
