
(regex) Retrieve the whole phrase containing a specific word between 2 symbols

My question looks like some other questions on Stack Overflow, but I did not find exactly what I was looking for.

I need to retrieve a whole phrase that contains a specific word. This phrase is also between ">" and "<".

For example:

text:
 "<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla...."

What I need is:
 blu blu GOLD blu

I'm trying to do that in Perl. What I have so far is:

$specific_word = 'GOLD';
while ($var=~/[>]?(?<phrase>(.*?)\Q$specific_word\E(.*?))</ig) {
   script.....
}

What I get with this regex, given the example above, is: <div>bla bla bla</div><div>blu blu GOLD blu

How do I find the first ">" before my specific word, rather than the first ">" of the entire text?

HTML::TreeBuilder is a better way to parse HTML in Perl.
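As a rough illustration of the parser route, here is a minimal sketch, assuming the HTML::TreeBuilder module from CPAN is installed. The sample string and the variable names are illustrative only; the word boundary check with \b is an added assumption, not part of the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<div>bla bla bla</div><div>blu blu GOLD blu</div>';
my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down returns every element matching the criteria (here: all <div> tags);
# as_text gives the text content of each element.
my @hits = grep { /\bGOLD\b/ }
           map  { $_->as_text }
           $tree->look_down( _tag => 'div' );

print "$_\n" for @hits;

$tree->delete;    # release the parse tree once it is no longer needed
```

With the sample string above, this should print only the text of the second div.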

But to answer the question, you probably want to match /[^>]*${specific_word}[^<]*/g , which basically says that the phrase may not contain a > to the left of the word or a < to the right of it.
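To see why this works, here is a small sketch that applies the pattern to the sample text from the question. The \Q...\E quoting is carried over from the question's own regex, and the array name @phrases is just illustrative.

```perl
use strict;
use warnings;

my $text = '<div>bla bla bla</div><div>blu blu GOLD blu</div><form> bla bla....';
my $specific_word = 'GOLD';

# [^>]* cannot cross a ">" to the left of the word,
# [^<]* cannot cross a "<" to the right of it.
# In list context with /g, the match returns every matched phrase.
my @phrases = $text =~ /[^>]*\Q$specific_word\E[^<]*/g;

print "$_\n" for @phrases;
```

Because the negated character classes stop at the nearest delimiter on each side, the match can no longer start at the beginning of the whole text, which is what the lazy .*? in the original attempt allowed.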

An HTML parser has been rightly mentioned. You can find "GOLD" in the second div of your string by using Mojo::DOM in the following way:

use strict;
use warnings;
use Mojo::DOM;

my $html = '<div>bla bla bla</div><div>blu blu GOLD blu</div>';
my $dom  = Mojo::DOM->new($html);

# find() takes a CSS selector and returns a collection of matching elements
for my $e ( $dom->find('div')->each ) {
    print $e->text, "\n" if $e->text =~ /\bGOLD\b/;
}

Output:

blu blu GOLD blu

