Perl Regex, get strings between two strings

I am new to Perl and am trying to use a regex to get the piece of a string between two tags that I know will be present in that string. I have already tried various answers from Stack Overflow, but none of them seems to work for me. Here's my example.

The required data is in the $info variable, from which I want to extract the useful part:

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

The useful data in the above string is Boston, MA. I removed the newlines from the string with $info =~ s/\n//g; so $info now holds the string "random text i do not want|BIRTH PLACE=Boston, MA|more unwanted random text". I thought doing this would help me capture the required data easily.

Please help me get the required data. I am sure that the data will always be preceded by |BIRTH PLACE= and followed by |. Everything before and after that is unwanted text. If a question like this has already been answered, please point me to it as well. Thanks.

Instead of replacing everything around it, you could also search for /\|BIRTH PLACE=([^\|]+)\n\|/ , where [^\|]+ means one or more of anything that is not a pipe.
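A minimal sketch of that pattern, assuming the sample string from the question with its newlines still in place (the variable name $place is only for illustration):

```perl
use strict;
use warnings;

# Sample input from the question, newlines left untouched.
my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# [^\|]+ grabs everything that is not a pipe; the trailing \n\| then
# forces the capture to end just before the newline and the next pipe.
my ($place) = $info =~ /\|BIRTH PLACE=([^\|]+)\n\|/;

print defined $place ? "$place\n" : "no match\n";
```

In list context the match returns its capture groups, so $place is undef when the tag is missing.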

$info =~ m{\|BIRTH PLACE=(.*?)\|} or die "There is no data in \$info?!";
my $birth_place = $1;

That should do the trick.
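One caveat worth checking: without the /s modifier, . does not match a newline, so this lazy pattern relies on the newlines having been stripped first, as the question does. A small sketch, assuming the question's sample string (the names $flat and $matched_raw are made up for this example):

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# On the raw string the lazy .*? cannot cross "\n", so it never reaches a pipe.
my $matched_raw = $info =~ m{\|BIRTH PLACE=(.*?)\|} ? 1 : 0;

# After stripping newlines, the same pattern finds the closing pipe.
(my $flat = $info) =~ s/\n//g;
my ($birth_place) = $flat =~ m{\|BIRTH PLACE=(.*?)\|};

print "raw string matched: $matched_raw\n";
print "birth place: $birth_place\n";
```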

You know, actually, those newlines might have helped you. I would have gone for an initial regular expression of:

/^\|BIRTH PLACE=(.*)$/m

This uses the multiline modifier ( m ) to match ^ at the beginning of a line and $ at the end of it, instead of only at the beginning and end of the string. Heck, you can even get really crazy and match:

/(?<=^\|BIRTH PLACE=).+$/m

This matches only the information you want, using a lookbehind ( (?<= ... ) ) to assert that it is the birth place information.

Why curse the string twice when you can do it once?

So, in Perl:

if ($info =~ m/(?<=^\|BIRTH PLACE=).+$/m) {
    print "Born in $&.\n";
} else {
    print "From parts unknown\n";
}
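For comparison, the simpler capture-group pattern from the start of this answer can be sketched the same way. This assumes the question's sample string with its newlines intact, and reads the result from a capture rather than from $&:

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# /m lets ^ and $ match at line boundaries inside the string,
# so (.*) captures only the rest of the BIRTH PLACE line.
my ($birth_place) = $info =~ /^\|BIRTH PLACE=(.*)$/m;

if (defined $birth_place) {
    print "Born in $birth_place.\n";
} else {
    print "From parts unknown\n";
}
```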

You have presumably read this data from a file, which is a bad start. Your program should look like this:

use strict;
use warnings;

use autodie;

open my $fh, '<', 'myfile';

my $pob;
while (<$fh>) {
  if (/BIRTH PLACE=(.+)/) {
    $pob = $1;
    last;
  }
}

print $pob;

output

Boston, MA
