How can I use WWW::Mechanize to get the links that match a regular expression?

Question

I'm trying to use a regular expression to pick out certain links, but I can't get it to work. I can retrieve all the links on the page, but many of them are ones I don't want.

What I want is to grab only the links that match this pattern: http://valeptr.com/scripts/runner.php?IM=

Here is the script I have so far:

use warnings;
use strict;
use WWW::Mechanize;
use WWW::Mechanize::Sleepy;

my $Explorador =

    WWW::Mechanize::Sleepy->new(

       agent =>
             'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',

       sleep => '5..20'
    );

#Proceed to access the URL to find all the links in emails
$Explorador->get("file:/home/alejandro/Escritorio/hehe.php.html");

#If you want debug DOM Document.
#print $Explorador->content();

my @links = $Explorador->links;

foreach my $link (@links) {

   # Retrieve the link URL like:
   # http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375.
   my $href = $link->url;

   foreach my $s ($href) { # The regular expression goes here

       my @links = $s =~ qr{
                               (
                               [^B]*
                               )
                               $
                           }x;
       foreach (@links) {
           print "\n",$_;
       }
   }
}

PS: I suspect this kind of regular expression question has come up many times, but I could not find it. If so, apologies for posting the same thing again.

Problem: the page contains a large number of links, and I need to pick out only those that match the pattern http://valeptr.com/scripts/runner.php?IM= so on line 19 of my script I need to apply a regular expression. The statement my @links = $Explorador->links; returns every link on the page, but I only want the links that match the pattern above. Regards,

Answer 1

Why not let WWW::Mechanize do the work for you? It can filter the links for you with a regex that you supply:

my @wanted_links = $Explorador->find_all_links ( 
                                     url_regex => qr{scripts/runner\.php\?IM=}
                                );

No for loops!
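For a fuller picture, here is a minimal sketch of the same call in a self-contained script. The sample HTML, the temporary file, and the names $mech, @wanted and @urls are assumptions made for illustration, not part of the original script. The regex here is also anchored at the start of the URL, which is slightly stricter than the pattern above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use WWW::Mechanize;

# Hypothetical sample page standing in for the asker's saved HTML file.
my ($fh, $path) = tempfile(SUFFIX => '.html');
print {$fh} <<'HTML';
<html><body>
<a href="http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375.">wanted</a>
<a href="http://valeptr.com/other/page.html">unwanted</a>
<a href="http://valeptr.com/scripts/runner.php?IM=abc123">wanted too</a>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new;
$mech->get("file://$path");

# find_all_links returns WWW::Mechanize::Link objects whose URL matches.
my @wanted = $mech->find_all_links(
    url_regex => qr{^http://valeptr\.com/scripts/runner\.php\?IM=},
);

# Extract the plain URL strings from the link objects.
my @urls = map { $_->url } @wanted;
print "$_\n" for @urls;
```

Note that find_all_links returns link objects, so ->url is still needed to get the string. If the page used relative hrefs, url_abs_regex would be the criterion to match against the resolved absolute URL instead.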

Answer 2

As your reference link seems to be fixed, you could consider using substr instead of a regex:

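my $ref_link = q!http://valeptr.com/scripts/runner.php?IM=!;
my @save;    # collects the URLs that start with $ref_link
foreach my $link ( $Explorador->links ) {
    my $href = $link->url;
    # Keep the link only if its URL begins with the fixed prefix
    if ( substr($href, 0, length($ref_link)) eq $ref_link ) {
        push @save, $href;
    }
}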
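As a rough illustration of the same prefix test without a Mechanize object, the sketch below filters a list of plain URL strings in two ways: with index, and with an anchored regex whose metacharacters are quoted by \Q...\E. The URLs in @hrefs and the names $prefix, @by_index and @by_regex are made up for this example.

```perl
use strict;
use warnings;

my $prefix = 'http://valeptr.com/scripts/runner.php?IM=';

# Hypothetical URLs standing in for the values returned by $link->url.
my @hrefs = (
    'http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375.',
    'http://valeptr.com/index.html',
    'http://example.com/?next=http://valeptr.com/scripts/runner.php?IM=1',
);

# Literal prefix test: index returns 0 only when $prefix starts the string.
my @by_index = grep { index($_, $prefix) == 0 } @hrefs;

# Equivalent anchored regex; \Q...\E quotes the '.' and '?' metacharacters.
my @by_regex = grep { /^\Q$prefix\E/ } @hrefs;

print "$_\n" for @by_index;
```

Both filters keep only the first URL. The third one contains the prefix but does not start with it, which is the case an unanchored regex would also let through.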

2 answers

Answer 1 (6 votes, accepted): 2010-07-08 06:46:18

Answer 2 (0 votes): 2010-07-08 08:47:58
