How do I download an image file from a website using WWW::Mechanize?
I am trying to download an image from the server. This is what I have tried so far:
use warnings;
use strict;
use WWW::Mechanize;
my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";
my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
$mech->submit_form(
form_number => 1,
fields => {
'notice' => $sequence,
},
);
$mech->find_image( alt_regex => qr/.+sopma2.gif/ );
open (FH, ">soi.gif");
binmode (FH);
print FH $mech;
The image tag looks like this:
<img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif">
I already have the link to the image parsed from the website, but I want to download the image itself. How can I do that?
The find_image method of WWW::Mechanize returns a WWW::Mechanize::Image object. That object only contains the URI, filename and alt tag info about the image, not the content of the image itself. You need to download the image file first.
Luckily, you can use your $mech for that. The $image has a URI method that returns the URL of that image file as a URI object. Your $mech can get that image, and the result comes back as an HTTP::Response.
my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ );
my $res = $mech->get($image->URI);
if ($res->is_success) {
    open (my $fh, '>', 'soi.gif') or die $!;
    binmode $fh;
    print $fh $res->decoded_content;
    # no need to close lexical filehandle
}
Et voilà, there's your image file.
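To see why printing the result of find_image never writes image data, it may help to look at what a WWW::Mechanize::Image object holds. This is only a sketch: the object is built by hand, and the url and base values are assumptions copied from the tag and page address in the question, not something fetched from the server.

```perl
use strict;
use warnings;
use WWW::Mechanize::Image;

# Hand-built object; url and base are assumed values taken from the question.
my $image = WWW::Mechanize::Image->new({
    url  => '/tmp/e3a3c2b34201.sopma2.gif',
    base => 'https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl',
    tag  => 'img',
    alt  => undef,    # the tag in the question has no alt attribute
});

my $relative = $image->url;                        # src exactly as written in the page
my $absolute = $image->url_abs;                    # src resolved against the base URL
my $alt      = $image->alt // '(no alt text)';

print "src: $relative\n";
print "abs: $absolute\n";
print "alt: $alt\n";
```

Nothing in there is image content, so the bytes still have to be requested with a separate get.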
You can use $mech->get(...) to store the content of a URL in a local file.
if ( my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ ) ) {
    # match on the URL: the image in question has no alt text
    $mech->get( $image->url_abs, ':content_file' => 'soi.gif' );
}
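If you want to try the ':content_file' approach without hitting the real server, something like the following should work offline. It is a sketch under assumptions: the page and the "image" are throwaway files in a temporary directory served through file:// URLs, and the names page.html and demo.sopma2.gif are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Throwaway "site": one HTML page plus one fake image (all names are made up).
my $dir = tempdir( CLEANUP => 1 );

my $bytes = 'GIF89a' . join '', map { chr } 0 .. 255;    # stand-in for image data
open my $img_fh, '>:raw', "$dir/demo.sopma2.gif" or die "Cannot write image: $!";
print {$img_fh} $bytes;
close $img_fh or die "Cannot close image: $!";

open my $html_fh, '>', "$dir/page.html" or die "Cannot write page: $!";
print {$html_fh} '<html><body><img align="TOP" src="demo.sopma2.gif"></body></html>';
close $html_fh or die "Cannot close page: $!";

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs("$dir/page.html") );

# Find the image by URL, then let get() stream the response body to disk.
my $saved = "$dir/soi.gif";
if ( my $image = $mech->find_image( url_regex => qr/sopma2\.gif$/ ) ) {
    $mech->get( $image->url_abs, ':content_file' => $saved );
}

print 'saved ', ( -s $saved ), " bytes\n";
```

Because the body goes straight to the file, there is no need to open a filehandle or call binmode yourself.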
How do I save an image with WWW::Mechanize?
man WWW::Mechanize
$mech->find_image()
Finds an image in the current page. It returns a WWW::Mechanize::Image object which describes the image. If it fails to find an image it returns undef.
......
$mech->get( $uri )
Given a URL/URI, fetches it. Returns an HTTP::Response object. $uri can be a well-formed URL string, a URI object, or a WWW::Mechanize::Link object.
[...]
"get()" is a well-behaved overloaded version of the method in LWP::UserAgent. This lets you do things like
$mech->get( $uri, ':content_file' => $tempfile );
The problem is that you are searching for an image whose alt text contains the string sopma2.gif. That image doesn't have any alt text, so your program doesn't find it.
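You can check this difference without the real server. The snippet below is only an offline sketch: it writes a local page containing the img tag from the question (page.html is a made-up name) and runs both searches against it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Local stand-in page holding the <img> tag quoted in the question.
my $dir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$dir/page.html" or die "Cannot write page: $!";
print {$fh} '<html><body><img align="TOP" src="/tmp/e3a3c2b34201.sopma2.gif"></body></html>';
close $fh or die "Cannot close page: $!";

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs("$dir/page.html") );

# Same page, two search criteria.
my $by_alt = $mech->find_image( alt_regex => qr/sopma2\.gif/ );
my $by_url = $mech->find_image( url_regex => qr/sopma2\.gif$/ );

print 'alt_regex: ', ( defined $by_alt ? $by_alt->url : 'no match' ), "\n";
print 'url_regex: ', ( defined $by_url ? $by_url->url : 'no match' ), "\n";
```

The alt_regex search has nothing to match against because the tag carries no alt attribute, while the url_regex search looks at the src value.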
The program below fetches the GIF file that you want. I'm using url_regex => qr/sopma2/i to look for sopma2 in the URL instead. That succeeds and returns a WWW::Mechanize::Image object. Then all that is necessary is to fetch that object's absolute URL and use get with a :content_file parameter to save the data to a disk file.
use strict;
use warnings;
use 5.010;
use WWW::Mechanize;
STDOUT->autoflush;
my $sequence = "MIPTLAAEPRKPARPPLPVRRESREEPVDAVIVGTGAGGAPLLARLAQAGLKVVALEAGNHWDPAADFATDEREQNKLFWFDERLSAGADPLAFGRNNSGIGVGGSTLHYTAYVPRPQPDDFRLYSDFGVGEDWPIGYGDLEPYFDELECFLGVSGPSPYPWGPARTPYPLAPMPLNAAAQLMARGCAALGLRTSPAANAVLSAPYFQSGVGWRSPCTNRGFCQAGCTTGGKAGMDVTFIPLALAHGAEVRSGAFVTRIETDRAGRVTGVVYVREGREERQRCRTLFLAAGAIETPRLLLLNGLANQSGEVGRNFMAHPGLQLWGQFSEATRPFKGVPGSLISEDTHRPKDADFAGGYLLQSIGVMPVTYATQTARGGGLWGEKLQSHMHGYNHTAGINILGECLPYAHNYLELSDEPDQRGLPKPRIHFSNGKNERRLRDHAEALMRRIWEAAGAQAVWTFERNAHTIGTCRMGADPKRAVVDPEGRAFDVPNLYIIDNSVFPSALSVNPALTIMALSLRTADRFIERTQRGEY";
my $mech = WWW::Mechanize->new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
say $mech->res->status_line;
say $mech->title;
$mech->submit_form(
form_number => 1,
fields => {
notice => $sequence,
},
);
say $mech->res->status_line;
say $mech->title;
my $image = $mech->find_image( url_regex => qr/sopma2/i );
my ($file) = $image->url =~ m|([^/]+\z)|;
$mech->get($image->url_abs, ':content_file' => $file);
say "$file saved";
200 OK
NPS@ : SOPMA secondary structure prediction
200 OK
NPS@ SOPMA secondary structure prediction results
373025433891.sopma2.gif saved
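The m|([^/]+\z)| match above is what turns the image URL into a local file name. Here is a small standalone sketch of that step; the helper name filename_from_url and the fallback name image.gif are assumptions added for illustration, not part of the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: keep everything after the last slash of a URL.
sub filename_from_url {
    my ($url) = @_;
    my ($file) = $url =~ m|([^/]+)\z|;
    return $file // 'image.gif';    # assumed fallback when the URL ends in a slash
}

my $from_path = filename_from_url('/tmp/373025433891.sopma2.gif');
my $from_dir  = filename_from_url('https://example.org/images/');

print "$from_path\n";
print "$from_dir\n";
```

Note that a query string, if present, would end up in the name as well, so strip it first if the server adds one.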
Use LWP::Simple together with WWW::Mechanize.
use WWW::Mechanize;
use LWP::Simple;
my $sequence = "MIPTLAA......";
my $mech = WWW::Mechanize -> new;
$mech->get('https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html');
$mech->submit_form(
form_number => 1,
fields => {
'notice' => $sequence,
},
);
my $cont = $mech->content;
($img) = $cont =~m/SRC=(.+sopma2\.gif)/g;
$urL = "https://npsa-prabi.ibcp.fr/$img";
getstore($urL,"soi.gif");
$img stores the URL of the image. Then save the image using the getstore function from LWP::Simple.
This is not a good idea; see @simbabque's answer. But it gives you the result you need.
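If you do go the getstore route, it is worth checking its return value, since getstore returns the HTTP status code rather than dying on failure. A minimal offline sketch, assuming a throwaway local file (source.gif is a made-up name) served through a file:// URL in place of the real image:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LWP::Simple qw(getstore is_success);
use URI::file;

# Throwaway source file standing in for the remote image.
my $dir = tempdir( CLEANUP => 1 );
open my $fh, '>:raw', "$dir/source.gif" or die "Cannot write source: $!";
print {$fh} 'GIF89a-not-a-real-image';
close $fh or die "Cannot close source: $!";

my $url    = URI::file->new_abs("$dir/source.gif")->as_string;
my $target = "$dir/soi.gif";

# getstore returns the response status code; check it before trusting the file.
my $status = getstore( $url, $target );
if ( is_success($status) ) {
    print "saved $target (status $status)\n";
}
else {
    warn "download failed with status $status\n";
}
```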