Please help me define a Perl regular expression

Question

I'm new to everything. Please help. I'm trying to crawl every

<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>

in a webpage. I want to capture the /v/name/idlike123123ksajdfk part, given that the

<div class="name"><a href="/v/

part is fixed. So I wrote this regular expression (it may make you laugh):

~m#<div class="name"><a href="(/v/.*?)">#

It would be very helpful if you could correct my code.

Answer 1

Using a robust HTML parser (see http://htmlparsing.com/ for why):

use strictures;
use Web::Query qw();
my $w = Web::Query->new_from_html(<<'HTML');
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
HTML

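# Select <a> elements directly inside div.name whose href starts with "/v/",
# then collect their href values.
my @v_links = $w->find('div.name > a[href^="/v/"]')->attr('href');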

Answer 2

There are plenty of Perl modules that extract links from HTML. WWW::Mechanize, Mojo::DOM, HTML::LinkExtor, and HTML::SimpleLinkExtor can do it.
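
As a rough illustration of one of these modules, here is a minimal sketch using HTML::LinkExtor with a callback. It assumes the page is already in a string; the second and third sample lines are made up for the example. Note that HTML::LinkExtor only reports tags and their link attributes, so this sketch filters on the "/v/" prefix and cannot check that the link sits inside a div with class "name".

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Sample input: the first line is from the question, the others are hypothetical.
my $html = <<'HTML';
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/other/abc456">other</a></div>
<div class="menu"><a href="/about">about</a></div>
HTML

my @links;

# The callback receives the tag name followed by attribute name/value pairs.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    # Keep only links whose href starts with the fixed "/v/" prefix.
    push @links, $attr{href} if $attr{href} =~ m{^/v/};
});

$parser->parse($html);
$parser->eof;

print "$_\n" for @links;
```

If the surrounding div matters, a selector-based module such as Mojo::DOM or Web::Query is probably the better fit.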

Answer 3

Web scraping with Mojolicious is probably the simplest way to do it in Perl nowadays:

http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Web_scraping
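
For the markup in the question, a minimal sketch with Mojo::DOM (the DOM parser that ships with Mojolicious) might look like the following. It assumes the HTML is already in a string rather than fetched from a site, and the second and third sample lines are made up for the example.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Sample input: the first line is from the question, the others are hypothetical.
my $html = <<'HTML';
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/other/abc456">other</a></div>
<div class="menu"><a href="/v/skip/me">skipped</a></div>
HTML

my $dom = Mojo::DOM->new($html);

# CSS selector: <a> directly inside div.name, href beginning with "/v/".
# map(attr => 'href') calls ->attr('href') on every matched element.
my @links = $dom->find('div.name > a[href^="/v/"]')
                ->map(attr => 'href')
                ->each;

print "$_\n" for @links;
```

The selector does the filtering, so the link inside the "menu" div is ignored even though its href also starts with "/v/".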

Answer 4

You should not use a regex to parse HTML, as there are many libraries for that.

Daxim's answer is a good example.

However, if you want to use a regex anyway and your text is assigned to $_, then

my @list = m{<div class="name"><a href="(/v/.*?)">}g;

will get you a list of all findings.
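
As a complete script, the same idea could be sketched like this, using only core Perl. The variable name $html and the second and third sample lines are assumptions for the example. The capture uses [^"]* instead of .*? as a small precaution, so the captured value can never contain a double quote.

```perl
use strict;
use warnings;

# Sample input: the first line is from the question, the others are hypothetical.
my $html = <<'HTML';
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/other/abc456">other</a></div>
<div class="other"><a href="/v/skip/me">skipped</a></div>
HTML

# With /g in list context, the match returns every captured group.
# [^"]* stops at the closing quote of the href attribute.
my @links = $html =~ m{<div class="name"><a href="(/v/[^"]*)">}g;

print "$_\n" for @links;
```

This only works while the markup keeps exactly this shape; an extra attribute or different whitespace in the tags would make the pattern miss the link, which is the usual argument for a real parser.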

4 answers

solution1
6 ACCEPTED 2012-05-18 11:47:07

solution2
1 2012-05-18 19:27:17

solution3
1 2012-06-12 19:09:03

solution4
0 2012-05-18 11:47:40
