Perl RegEx: get a substring of the text between two tags

Question

I have a question related to regex. I have an element such as $str1 = <strong>average_speed_answer_good_high</strong>. What I am trying to do is get the string before "_good_high" (in this case "average_speed_answer") into a variable $sub_str1, and "good_high" into a variable $sub_str2.

Here "_good_high" is the only constant part of the string; the rest can change. There could even be some characters after "_good_high" and before "</strong>". Can I get some tips on how to do this?

So far, I have been able to do something like:

if ( $str1 =~ m{(<strong>)(.*?)(</strong>)} ) {
    $sub_str1 = $2; #which gives average_speed_answer_good_high
}

I have tried some combinations like:

(<strong>)(?=_good_high)(</strong>) 
(<strong>)(?<=_good_high)(</strong>) 
(<strong>)((?<=_good_high)\w+)(</strong>) #tried $2 and $3
(<strong>)(?<=_good_high)\w+(</strong>) 
(<strong>)((?<=(_good_high))\w+)(</strong>)#tried $2, $3 and $4

but they all leave $sub_str1 blank.

I would appreciate any help or tips.

Answer 1

You need to specify _good_high before the closing </strong> tag.

if ( $str1 =~ m{(<strong>)(.*?)_good_high.*?(</strong>)} ) {
    $sub_str1 = $2; 
}

or

if ( $str1 =~ m{<strong>(.*?)_good_high.*?</strong>} ) {
    $sub_str1 = $1; 
}
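The question also asks for "good_high" in $sub_str2. A minimal sketch of how both parts could be captured in one match, assuming the value is always wrapped in <strong> tags as in the question, and assuming any characters that follow "good_high" should stay with the second part:

```perl
use strict;
use warnings;

my $str1 = '<strong>average_speed_answer_good_high</strong>';

my ( $sub_str1, $sub_str2 );
# Group 1: shortest text before the first "_good_high".
# Group 2: "good_high" plus whatever precedes the closing tag.
if ( $str1 =~ m{<strong>(.*?)_(good_high.*?)</strong>} ) {
    ( $sub_str1, $sub_str2 ) = ( $1, $2 );
    print "$sub_str1\n";    # average_speed_answer
    print "$sub_str2\n";    # good_high
}
```

The underscore between the two groups is matched but not captured, so neither variable carries it.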

Answer 2

How about:

($sub_str1) = $str1 =~ m{<strong>(.*?)_good_high</strong>};
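The one-liner above requires "_good_high" to sit directly before the closing tag. A small sketch of the same list-context idiom that also tolerates trailing characters; the second and third inputs are made-up examples, and [^<]* is an assumption about what may follow the marker:

```perl
use strict;
use warnings;

# Hypothetical inputs: the second has extra characters after _good_high,
# the third does not contain the marker at all.
my @inputs = (
    '<strong>average_speed_answer_good_high</strong>',
    '<strong>max_load_good_high_v2</strong>',
    '<strong>no_marker_here</strong>',
);

my @results;
for my $str1 (@inputs) {
    # In list context the match returns its captures, or an empty list on failure.
    my ($sub_str1) = $str1 =~ m{<strong>(.*?)_good_high[^<]*</strong>};
    push @results, $sub_str1;
    print defined $sub_str1 ? "$sub_str1\n" : "(no match)\n";
}
```

When the match fails, $sub_str1 is left undefined, so checking it with defined is a safe way to detect that case.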

Answer 3

Don't get too hung up on regexes and capture groups. They're not the only tool in your box.

For example:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str1 = '<strong>average_speed_answer_good_high</strong>';
if ( my ($sub_str1) = $str1 =~ m{<strong>(.*?)</strong>} ) {
    print "Substr: $sub_str1\n";
    my @split_str = split ( /_/, $sub_str1 );
    print Dumper \@split_str; 
    print "Extracted: ",join ( "_", (split ( /_/, $sub_str1 ))[0..2] ),"\n";
}

We extract the substring as before, but then we split it on _ :

$VAR1 = [
          'average',
          'speed',
          'answer',
          'good',
          'high'
        ];

And then stick it together again, keeping elements 0 to 2, to get your answer.
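The slice 0..2 only works when the prefix has exactly three fields. A sketch of a variant that peels off the last two fields instead, assuming the suffix is always exactly "good" and "high" with nothing after it, and that the string has at least two fields:

```perl
use strict;
use warnings;

my $sub_str = 'average_speed_answer_good_high';

my @parts = split /_/, $sub_str;
# Assumption: the suffix is always exactly two fields ("good", "high").
my @suffix = splice @parts, -2;      # removes and returns the last two fields
my $sub_str1 = join '_', @parts;     # average_speed_answer
my $sub_str2 = join '_', @suffix;    # good_high

print "$sub_str1\n$sub_str2\n";
```

This keeps working when the variable part has more or fewer underscores, but it would not fit the case where extra characters follow "_good_high".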

Answer 4

Your problems seem to result from your understanding of how ( ), ?, .* and .*? work.

In your second set of examples there is no variable part, only grouping, sometimes without capturing.

pre(.*)post captures everything between pre and post in $1
pre(?:a|b|c)post groups the alternatives without capturing
a(.*?)b matches non-greedily (and captures): it matches x instead of xby in axbyb
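The three behaviours listed above can be checked with a short script. This is only an illustrative sketch; the variable names and the 'prebpost' test string are made up for the demonstration:

```perl
use strict;
use warnings;

my $text = 'axbyb';

my ($greedy) = $text =~ /a(.*)b/;     # .* grabs as much as it can: "xby"
my ($lazy)   = $text =~ /a(.*?)b/;    # .*? stops at the first "b": "x"

# (?:...) groups without capturing, so only the outer parentheses fill $1.
my ($alt) = 'prebpost' =~ /pre((?:a|b|c))post/;

print "greedy: $greedy\n";    # greedy: xby
print "lazy:   $lazy\n";      # lazy:   x
print "alt:    $alt\n";       # alt:    b
```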

Answer 5

I think the best way is as follows. Just look for all text, except angle brackets, that is preceded by a <strong> tag (there is no need to search for the end tag) and followed by _good_high. That is the wanted substring.

use strict;
use warnings;

my $s = <<END;
<html>
  <body>
    <strong>average_speed_answer_good_high</strong>
  </body>
</html>
END

if ( my ($text) = $s =~ /<strong>([^<>]+)_good_high/ ) {
    print $text, "\n";
}

output

average_speed_answer
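The lookaround attempts in the question came back empty because nothing in them consumed the variable text. A sketch of how lookarounds could be combined with a capturing part, assuming the same single-line input as in the question; the $rest variable is introduced here only for illustration:

```perl
use strict;
use warnings;

my $s = '<strong>average_speed_answer_good_high</strong>';

# (?=_good_high) asserts that "_good_high" comes next without consuming it,
# so the capture stops just before the marker.
my ($text) = $s =~ /<strong>([^<>]+?)(?=_good_high)/;

# (?<=_) asserts that an underscore sits just before the capture.
my ($rest) = $s =~ /(?<=_)(good_high[^<>]*)/;

print "$text\n";    # average_speed_answer
print "$rest\n";    # good_high
```

A lookaround only tests a position, so the text you want still has to be matched by an ordinary capturing group next to it.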
