Perl Regex, get strings between two strings
I am new to Perl and am trying to use a regex to get the piece of a string that sits between two tags I know will be present in it. I have already tried various answers from Stack Overflow, but none of them seem to work for me.
Here's my example...
The required data is in the $info variable, from which I want to extract the useful part:
my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";
The useful data in the above string is Boston, MA.
I removed the newlines from the string with:
$info =~ s/\n//g;
Now $info holds this string:
"random text i do not want|BIRTH PLACE=Boston, MA|more unwanted random text"
I thought doing this would help me capture the required data easily.
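Because "\n" inside a double-quoted Perl string is a real newline character, the substitution has to use \n in the pattern; \\n would instead look for a literal backslash followed by the letter n. A small sketch on the question's own string (the variable names $unchanged and $stripped are just for illustration):

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# \\n in a pattern means a literal backslash followed by "n"; the string
# contains none, so this substitution leaves the copy unchanged.
(my $unchanged = $info) =~ s/\\n//g;

# \n matches the actual newline characters produced by "\n" in the literal.
(my $stripped = $info) =~ s/\n//g;

print "$stripped\n";
```

The (my $copy = $original) =~ s/.../.../ idiom modifies a copy and leaves $info intact, which makes it easy to compare the two patterns side by side.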
Please help me get the required data. I am sure that the data will always be preceded by |BIRTH PLACE= and followed by |. Everything before and after that is unwanted text.
If a question like this has already been answered, please point me to it as well. Thanks.
Rather than replacing everything around it, you can search the original string (newlines intact) for /\|BIRTH PLACE=([^\|]+)\n\|/, where [^\|]+ is one or more of anything that is not a pipe.
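A minimal sketch applying that pattern to the question's original $info, with the newlines still in place. The variable name $birth_place is just for illustration; matching in list context returns the captured group directly.

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# [^|]+ grabs one or more non-pipe characters; the trailing \n\| requires
# the newline-plus-pipe that follows the value, so the engine backs off
# the newline it first swallowed and leaves it out of the capture.
my ($birth_place) = $info =~ /\|BIRTH PLACE=([^|]+)\n\|/;

print defined $birth_place ? "$birth_place\n" : "not found\n";
```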
$info =~ m{\|BIRTH PLACE=(.*?)\|} or die "There is no data in \$info?!";
my $birth_place = $1;
That should do the trick.
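The non-greedy pattern above relies on the newlines having been stripped first: without the /s modifier, . does not match "\n", so (.*?) cannot reach the next pipe in the original string. A sketch comparing both versions of the question's data (the variable names are illustrative):

```perl
use strict;
use warnings;

# The question's string after s/\n//g, and the original with newlines.
my $stripped = "random text i do not want|BIRTH PLACE=Boston, MA|more unwanted random text";
my $original = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# (.*?) is non-greedy: it stops at the first "|" after the tag.
my ($from_stripped) = $stripped =~ m{\|BIRTH PLACE=(.*?)\|};

# Without /s, "." does not match "\n", so the same pattern fails here.
my ($from_original) = $original =~ m{\|BIRTH PLACE=(.*?)\|};

print "stripped: ", ($from_stripped // 'no match'), "\n";
print "original: ", ($from_original // 'no match'), "\n";
```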
You know, actually, those newlines might have helped you. I would have gone for an initial regular expression of:
/^\|BIRTH PLACE=(.*)$/m
This uses the multiline modifier (m) so that ^ matches at the beginning of a line and $ at the end of one, instead of only at the beginning and end of the string.
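A small sketch of what /m changes, run against the question's original string with its newlines intact (the variable names are illustrative):

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# With /m, ^ and $ match at each line boundary, so the capture runs from
# just after "|BIRTH PLACE=" to the end of that line.
my ($with_m) = $info =~ /^\|BIRTH PLACE=(.*)$/m;

# Without /m, ^ only matches at the start of the string, so this fails.
my ($without_m) = $info =~ /^\|BIRTH PLACE=(.*)$/;

print "with /m:    ", ($with_m // 'no match'), "\n";
print "without /m: ", ($without_m // 'no match'), "\n";
```

Even the greedy (.*) stays on one line here, because . does not match "\n" unless /s is also given.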
Heck, you can even get really crazy and match:
/(?<=^\|BIRTH PLACE=).+$/m
This matches only the information you want, using a lookbehind ((?<= ... )) to assert that it is preceded by the birth place tag without including the tag in the match.
Why process the string twice when you can do it once?
So, in Perl:
if ($info =~ m/(?<=^\|BIRTH PLACE=).+$/m) {
    print "Born in $&.\n";
} else {
    print "From parts unknown.\n";
}
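The same lookbehind, wrapped in a hypothetical helper birth_place (not part of the original answer) and run against one string that has the tag and one that does not, so both branches are exercised. Since the pattern has no capture group, the value is read from $&, which holds the whole matched text.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the lookbehind pattern from the answer.
sub birth_place {
    my ($text) = @_;
    # The lookbehind checks for "|BIRTH PLACE=" at the start of a line
    # without including it in the match, so $& holds only the value.
    return $text =~ m/(?<=^\|BIRTH PLACE=).+$/m ? $& : undef;
}

my $info    = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";
my $missing = "random text i do not want\n|more unwanted random text";

for my $text ($info, $missing) {
    my $place = birth_place($text);
    print defined $place ? "Born in $place.\n" : "From parts unknown.\n";
}
```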
You have presumably read this data from a file into a single string, which is a bad start; reading the file a line at a time is simpler. Your program should look like this:
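use strict;
use warnings;
use autodie;    # open() dies with a useful message on failure

open my $fh, '<', 'myfile';

my $pob;
while (<$fh>) {
    # "." does not match the trailing newline, so $1 is just the value
    if (/BIRTH PLACE=(.+)/) {
        $pob = $1;
        last;    # stop reading once the value is found
    }
}
close $fh;

print defined $pob ? "$pob\n" : "BIRTH PLACE not found\n";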
Output:
Boston, MA
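To try the line-by-line loop without creating myfile, Perl can open a filehandle on a scalar reference. This sketch assumes the file holds the question's three lines; anchoring on the leading pipe is an extra tweak, not part of the answer above.

```perl
use strict;
use warnings;

# Stand-in for 'myfile': an in-memory filehandle over the question's data.
# The file content is an assumption based on the question's $info.
my $data = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my $pob;
while (my $line = <$fh>) {
    # Anchor on the leading pipe so "BIRTH PLACE=" elsewhere in a line is ignored.
    if ($line =~ /^\|BIRTH PLACE=(.+)/) {
        $pob = $1;
        last;
    }
}
close $fh;

print defined $pob ? "$pob\n" : "not found\n";
```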