Perl RegEx to get substring of word found between two tags
I have a question related to regex. I have an element such as
$str1 = <strong>average_speed_answer_good_high</strong>
What I am trying to do is get the string before "_good_high" (which in this case is "average_speed_answer") into a variable $sub_str1, and "good_high" into a variable $sub_str2.
Here "_good_high" is the only constant part of the string; the rest can change. There could even be some characters after "_good_high" and before "</strong>". Can I get some tips on how to do this?
So far, I have been able to do something like:
if ( $str1 =~ m{(<strong>)(.*?)(</strong>)} ) {
$sub_str1 = $2; #which gives average_speed_answer_good_high
}
I have tried some combinations like:
(<strong>)(?=_good_high)(</strong>)
(<strong>)(?<=_good_high)(</strong>)
(<strong>)((?<=_good_high)\w+)(</strong>) #tried $2 and $3
(<strong>)(?<=_good_high)\w+(</strong>)
(<strong>)((?<=(_good_high))\w+)(</strong>)#tried $2, $3 and $4
but they all leave $sub_str1 blank.
I would appreciate any help or tips.
You need to specify _good_high before the closing strong tag.
if ( $str1 =~ m{(<strong>)(.*?)_good_high.*?(</strong>)} ) {
$sub_str1 = $2;
}
or
if ( $str1 =~ m{<strong>(.*?)_good_high.*?</strong>} ) {
$sub_str1 = $1;
}
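The question also asks for "good_high" in $sub_str2. As a minimal sketch, the pattern above can take a second capture group. The "_v2" suffix in the sample string is a hypothetical stand-in for the extra characters that may follow "_good_high".

```perl
use strict;
use warnings;

# Sample input; "_v2" is an invented suffix used only for illustration.
my $str1 = '<strong>average_speed_answer_good_high_v2</strong>';

my ( $sub_str1, $sub_str2 );
if ( $str1 =~ m{<strong>(.*?)_(good_high).*?</strong>} ) {
    # $1 is the text before "_good_high", $2 is the constant part itself.
    ( $sub_str1, $sub_str2 ) = ( $1, $2 );
    print "sub_str1: $sub_str1\n";
    print "sub_str2: $sub_str2\n";
}
```

Because "good_high" is constant, the second group is mostly a convenience; assigning the literal string would work just as well.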
How about:
($sub_str1) = $str1 =~ m{<strong>(.*?)_good_high</strong>};
Don't get too hung up on regexes and capture groups. They're not the only tool in your box.
For example:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $str1 = '<strong>average_speed_answer_good_high</strong>';
if ( my ($sub_str1) = $str1 =~ m{<strong>(.*?)</strong>} ) {
print "Substr: $sub_str1\n";
my @split_str = split ( /_/, $sub_str1 );
print Dumper \@split_str;
print "Extracted: ",join ( "_", (split ( /_/, $sub_str1 ))[0..2] ),"\n";
}
We extract the substring as before, but then we split it using _:
$VAR1 = [
'average',
'speed',
'answer',
'good',
'high'
];
And then stick it together again, keeping elements 0 to 2 to get your answer.
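The fixed range 0 to 2 only works while the prefix has exactly three fields. A possible variation, sketched here under the assumption that the last two fields are always "good" and "high" with nothing after them, counts from the end of the list instead:

```perl
use strict;
use warnings;

my $str1 = '<strong>average_speed_answer_good_high</strong>';

my ( $prefix, $suffix );
if ( my ($inner) = $str1 =~ m{<strong>(.*?)</strong>} ) {
    my @parts = split /_/, $inner;

    # Assumption: the final two fields are "good" and "high".
    $suffix = join '_', @parts[ -2, -1 ];

    # Everything before those two fields is the variable prefix.
    $prefix = join '_', @parts[ 0 .. $#parts - 2 ];

    print "Prefix: $prefix\n";
    print "Suffix: $suffix\n";
}
```

If extra characters can follow "_good_high", this assumption no longer holds and a regex that anchors on "_good_high" is the safer choice.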
Your problems seem to result from your understanding of how (, ), ?, and .* work.
In your second set of examples, there is no variable part, only grouping, sometimes without capturing.
pre(.*)post captures everything between pre and post in $1
pre(?:a|b|c)post groups the alternatives without capturing
a(.*?)b matches non-greedily (and captures): it matches x instead of xby in axbyb
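To see these three behaviours side by side, here is a small sketch. The test strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'axbyb';

# Greedy: .* takes as much as it can, so the match runs to the last "b".
my ($greedy) = $text =~ /a(.*)b/;

# Non-greedy: .*? stops at the first "b" that lets the match succeed.
my ($lazy) = $text =~ /a(.*?)b/;

# (?:...) groups the alternatives but creates no capture group,
# so the only capture here is the (\w+) that follows it.
my ($after) = 'pre_b_post' =~ /pre_(?:a|b|c)_(\w+)/;

print "greedy: $greedy\n";    # xby
print "lazy:   $lazy\n";      # x
print "after:  $after\n";     # post
```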
I think the best way is as follows. Just look for all text except angle brackets that is preceded by a <strong> tag (there's no need to search for the end tag) and followed by _good_high. That is the wanted substring.
use strict;
use warnings;
my $s = <<END;
<html>
<body>
<strong>average_speed_answer_good_high</strong>
</body>
</html>
END
if ( my ($text) = $s =~ /<strong>([^<>]+)_good_high/ ) {
print $text, "\n";
}
average_speed_answer
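The lookaround attempts in the question came back empty because a lookaround only asserts what is next to the current position and consumes nothing, so something else still has to match the prefix. One way a lookahead could be made to work, as a sketch rather than a recommendation over the pattern above:

```perl
use strict;
use warnings;

my $s = '<strong>average_speed_answer_good_high</strong>';

# (?=_good_high) checks that "_good_high" comes next without consuming it;
# the lazy capture group in front of it matches the actual prefix.
my ($text) = $s =~ /<strong>([^<>]+?)(?=_good_high)/;

print "$text\n" if defined $text;
```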