
Extract nth occurrence with Perl Regex

I am trying to find the best way to parse a line that looks like this:


Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah

I just want to extract the stuff between the 6th and 7th vertical bar (|).
I tried something like:

if ($line =~ /^(.*\|){6}(\w*)\|/ ) {  
    print $2;  
}

The problem is that the first part seems to be matching the longest sequence possible because of .*, so perhaps there is something different I should be using. Between the vertical bars, there are alphanumeric characters, spaces and punctuation.

Should I be matching the shortest sequence between them?

You can use .*? instead, which modifies the * to prefer fewer repetitions over more.
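To make the difference concrete, here is a minimal sketch that runs both quantifiers against the sample line from the question. The variable names are only illustrative, and the repeated group is written as non-capturing (?:...) so that the list assignment returns just the field; that detail is a convenience for this example, not part of the original pattern.

```perl
use strict;
use warnings;

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Greedy: every .* grabs as much as it can, so the six repetitions
# slide to the right and the capture lands on a later field.
my ($greedy) = $line =~ /^(?:.*\|){6}(\w*)\|/;

# Non-greedy: every .*? stops at the nearest bar, so the six
# repetitions consume exactly the first six bars.
my ($lazy) = $line =~ /^(?:.*?\|){6}(\w*)\|/;

print "greedy: $greedy\n";   # greedy: blah
print "lazy:   $lazy\n";     # lazy:   and
```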

This could still match in the wrong place if the field you want has non-word characters; to prevent this you can either explicitly say anything-but-| ( ([^|]*\|){6} ) or disable backtracking for that part ( ((?>.*?\|)){6} ).
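A small sketch of that failure mode, using a made-up line whose seventh field contains a space (the line and the variable names are assumptions for illustration only). With .*? alone the engine backtracks and returns the wrong field; the negated class and the atomic group refuse to match instead. The last pattern, which also captures the field with [^|]*, is one possible way to accept non-word characters and is not taken from the answer itself.

```perl
use strict;
use warnings;

# Hypothetical input: the 7th field ("two words") is not made of \w only.
my $line = 'a|b|c|d|e|f|two words|h|i';

# Non-greedy only: (\w*)\| fails on "two words|", so the sixth
# repetition backtracks, swallows an extra bar, and the next field matches.
my ($lazy) = $line =~ /^(?:.*?\|){6}(\w*)\|/;

# Anything-but-|: each repetition can cover exactly one field.
my ($negated) = $line =~ /^(?:[^|]*\|){6}(\w*)\|/;

# Atomic group: no backtracking into a repetition once it has matched.
my ($atomic) = $line =~ /^(?:(?>.*?\|)){6}(\w*)\|/;

# Capturing with [^|]* as well accepts spaces and punctuation.
my ($field) = $line =~ /^(?:[^|]*\|){6}([^|]*)\|/;

print 'lazy:    ', $lazy    // 'no match', "\n";   # lazy:    h
print 'negated: ', $negated // 'no match', "\n";   # negated: no match
print 'atomic:  ', $atomic  // 'no match', "\n";   # atomic:  no match
print 'field:   ', $field   // 'no match', "\n";   # field:   two words
```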

Or you could just use split:

if ( my $seventh = ( split /\|/, $line, 8 )[6] ) {
    print $seventh;
}

(The 8 is optional and tells split not to bother trying any more after reaching the 7th |.)
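As a rough illustration of what the limit does, the sketch below splits the sample line from the question with a limit of 8 and inspects the result. It also tests the field with defined rather than for truth, since a plain truth test would skip a field that is empty or "0"; that change is a suggestion added here, not something the snippet above does.

```perl
use strict;
use warnings;

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# With a limit of 8, split stops after the 7th bar and leaves the
# rest of the line unsplit in the last element.
my @limited = split /\|/, $line, 8;

print scalar(@limited), "\n";   # 8
print "$limited[7]\n";          # blah|blah|blah

# defined() accepts fields that are empty or "0"; it is only false
# when the line has fewer than seven fields.
my $seventh = $limited[6];
print "$seventh\n" if defined $seventh;   # and
```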

Use split. Something like my @fields = split /\|/, $str should work. Then you just index the field you're interested in (empty fields are preserved too, except trailing ones, which split drops unless you pass a negative limit). The | must be escaped because it is a regexp operator.
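A short sketch of this approach on the sample line from the question, with a second, made-up string added to show how split treats empty fields at the end of a line. The variable names and the 'a|b||' input are assumptions for illustration.

```perl
use strict;
use warnings;

my $str = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Split on a literal bar; the backslash stops | from meaning alternation.
my @fields = split /\|/, $str;

print scalar(@fields), "\n";   # 10
print "[$fields[5]]\n";        # []   (the empty field is kept)
print "$fields[6]\n";          # and

# Trailing empty fields are dropped by default...
my @dropped = split /\|/, 'a|b||';
print scalar(@dropped), "\n";  # 2

# ...and kept when the limit is negative.
my @kept = split /\|/, 'a|b||', -1;
print scalar(@kept), "\n";     # 4
```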
