Using Perl to strip everything from a string except HTML anchor links

Question

Using Perl, how can I use a regex to take a string that contains arbitrary HTML, including one HTML link with anchor text, like this:

  <a href="http://example.com" target="_blank">Whatever Example</a>

and keep ONLY that link, getting rid of everything else? It should work no matter what other attributes appear in the <a tag alongside href, such as title= or style=, and it should keep the anchor text ("Whatever Example") and the closing </a>.

Answer 1

You can take advantage of a stream parser such as HTML::TokeParser::Simple:

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

my $html = <<EO_HTML;

Using Perl, how can I use a regex to take a string that has random HTML in it
with one HTML link with anchor, like this:

   <a href="http://example.com" target="_blank">Whatever <i>Interesting</i> Example</a>

       and it leave ONLY that and get rid of everything else? No matter what
   was inside the href attribute with the <a, like title=, or style=, or
   whatever. and it leave the anchor: "Whatever Example" and the </a>?
EO_HTML

my $parser = HTML::TokeParser::Simple->new(string => $html);

while (my $tag = $parser->get_tag('a')) {
    print $tag->as_is, $parser->get_text('/a'), "</a>\n";
}

Output:

$ ./whatever.pl
<a href="http://example.com" target="_blank">Whatever Interesting Example</a>
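Note that get_text returns only the text, so the nested <i> tags are dropped in the output above. If the inner markup should be preserved, one possible variation is to walk the tokens and copy everything between the opening and closing anchor tags verbatim. This is a minimal sketch, not part of the original answer; the sample string and the $in_anchor and $out variables are made up for illustration.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

# Hypothetical sample input, for illustration only.
my $html = 'Junk <b>before</b> '
         . '<a href="http://example.com" target="_blank">Whatever <i>Interesting</i> Example</a>'
         . ' junk after';

my $parser = HTML::TokeParser::Simple->new(string => $html);

my $in_anchor = 0;    # true while we are between <a ...> and </a>
my $out       = '';

while (my $token = $parser->get_token) {
    # Start copying at the opening anchor tag.
    $in_anchor = 1 if $token->is_start_tag('a');

    # as_is returns the token exactly as it appeared in the source.
    $out .= $token->as_is if $in_anchor;

    # Stop copying after the closing anchor tag.
    if ($in_anchor && $token->is_end_tag('a')) {
        $in_anchor = 0;
        $out .= "\n";
    }
}

print $out;
```

With this input the script should print the anchor with its <i>...</i> intact and nothing else.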

Answer 2

If you need a simple regex solution, a naive approach might be:

my @anchors = $text =~ m@(<a\b[^>]*>.*?</a>)@gsi;  # \b so <abbr>, <address>, etc. do not match

However, as @dan1111 has mentioned, regular expressions are not the right tool for parsing HTML, for various reasons.

If you need a reliable solution, look for an HTML parser module.
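For quick experiments, the regex one-liner can be wrapped in a small script. The sketch below uses a made-up sample string and writes the pattern with a \b after <a so that tags such as <abbr> are not mistaken for anchors. It is still only the naive approach and not a substitute for a parser.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical sample input, for illustration only.
my $text = <<'EO_HTML';
<p>Some <abbr title="x">random</abbr> HTML with
<a href="http://example.com" target="_blank" title="t">Whatever Example</a>
and more <b>markup</b> after it.</p>
EO_HTML

# /g collects every match, /s lets . cross newlines, /i ignores tag case.
# \b stops <abbr>, <address>, etc. from matching as <a ...>.
my @anchors = $text =~ m@(<a\b[^>]*>.*?</a>)@gsi;

print "$_\n" for @anchors;
```

This pattern still breaks on input such as an attribute value containing a literal > or anchor-like text inside an HTML comment, which is part of why a parser module is the safer choice.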

Answer 1: score 2, accepted, 2015-05-15 10:25:58
Answer 2: score 1, 2015-05-15 08:38:16
