(Regex) Retrieve a whole phrase containing a specific word between 2 symbols

Question

My question looks like some other questions on Stack Overflow, but I did not find exactly what I was looking for.

I need to retrieve a whole phrase that contains a specific word. This phrase is also between ">" and "<".

For example:

text:
 "<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla...."

What I need is:
 blu blu GOLD blu

I'm trying to do that in Perl. What I have so far is:

$specific_word = 'GOLD';
while ($var=~/[>]?(?<phrase>(.*?)\Q$specific_word\E(.*?))</ig) {
   script.....
}

What I get with this regex, given the example above, is: <div>bla bla bla</div><div>blu blu GOLD blu

How do I find the first ">" before my specific word, rather than the first ">" of the entire text?

Answer 1

HTML::TreeBuilder is a better way to parse HTML in Perl.

But to answer the question, you probably want to match /[^>]*${specific_word}[^<]*/g, which says that no > may appear to the left of the word and no < to the right of it within the matched phrase.
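
A minimal sketch of that pattern applied to the example string from the question. The sub name find_phrases is made up for illustration, and \Q...\E is added here (it appears in the question's code, not in the pattern above) so that a word containing regex metacharacters would be matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of text that contains $word,
# with no ">" before the word and no "<" after it.
sub find_phrases {
    my ($text, $word) = @_;
    # [^>]* cannot cross a ">", so the match starts right after the
    # nearest ">" before the word; [^<]* stops before the next "<".
    # In list context, m//g returns all matches.
    return $text =~ /[^>]*\Q$word\E[^<]*/gi;
}

my $text = '<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla';
print "$_\n" for find_phrases($text, 'GOLD');
```

Note that the pattern knows nothing about HTML: it would also match the word inside a tag such as <div class="GOLD">, which is one reason a parser is the safer choice.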

Answer 2

An HTML parser has rightly been mentioned. You can find "GOLD" in the second div of your string by using Mojo::DOM in the following way:

use strict;
use warnings;
use Mojo::DOM;

my $html = '<div>bla bla bla</div><div>blu blu GOLD blu</div>';
my $dom  = Mojo::DOM->new($html);

# find('div') selects every <div> via a CSS selector
for my $e ( $dom->find('div')->each ) {
    print $e->text, "\n" if $e->text =~ /\bGOLD\b/;
}

Output:

blu blu GOLD blu
