
perl regex to capture into variable only an exact match within a string

I need help with this regex to capture only the exact match within a string and put it in a variable.

I only want to extract these values (fixed list; no other numbers):

004010H222A1 or 
004010H223A2 or 
004010H220A1 or 
004010H279A1 or 
004010H279A1 or 
004010H217 

from the given string.

Example:

$str = "this is the code 004010H222A1 the rest is irrelevant";
$str = "the random number is 004010H223A2 ** anything else is irrelevant";
$str = "the last lottery number 004010H220A1 ~~ the rest is irrelevant";
$str = "yet another random sentence 004010H279A1 the rest is irrelevant";
$str = "any sentence before what i want 004010H279A1 the rest is irrelevant";
$str = "last winning number 004010H217~~~";


if ($str =~ /\b(004010H[2][1|2|7][0|2|3|7|9])(A[1|2])?\b/){
print "found exact match\n";
##put result into a variable
##example:
## $exact_match = <found eg 004010H222A1>; 
##print $exact_match;
}

How can I capture the exact match of what I want into a variable and then display it? Maybe I just can't see the forest for the trees. Thank you in advance for your help.

With a given list of patterns:

my @fixed = qw(004010H222A1 004010H223A2 004010H220A1 
    004010H279A1 004010H279A1 004010H217);

my $str = "this is the code 004010H222A1 the rest is irrelevant";

my @found = grep { $str =~ /$_/ } @fixed;

This collects into @found every pattern from the list that matches in the string. Note that you may need word boundaries (/\b$_\b/), though not if the patterns stand out from the surrounding text as clearly as shown. If a pattern itself contains any non-word characters, then you'd need to build your own sub-pattern for the "boundary."
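A minimal runnable sketch of the grep approach, assuming the codes should only match as whole words. The \Q...\E escaping (quotemeta) is an addition not present in the answer; it only matters if a list entry ever contains regex metacharacters.

```perl
use strict;
use warnings;

# The fixed list from the question (the duplicate 004010H279A1 kept as given).
my @fixed = qw(004010H222A1 004010H223A2 004010H220A1
               004010H279A1 004010H279A1 004010H217);

my $str = "last winning number 004010H217~~~";

# \Q...\E quotes any metacharacters in the entry;
# \b keeps a code from matching inside a longer run of word characters.
my @found = grep { $str =~ /\b\Q$_\E\b/ } @fixed;

print "found: @found\n";
```

Here "~" is a non-word character, so the \b after 004010H217 is satisfied and @found holds that single code.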

If you are certain there is only one of them in the string, or you need only the first one:

my ($found) = grep { $str =~ /$_/ } @fixed;

Or construct a pattern with alternation first:

my $re = join '|', map { quotemeta } @fixed;

my ($found) = $str =~ /($re)/;  # consider word boundaries around the group: /\b($re)\b/

This may be more efficient since it starts the regex engine only once, but on the other hand, with only a few options (or a single one?), we do incur all the overhead of forming the alternation.
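A sketch of the alternation approach applied to all six example strings from the question. The parentheses form a capture group, which is what hands back the matched text, and they also make the \b boundaries apply to the whole alternation rather than only to its first and last alternatives.

```perl
use strict;
use warnings;

my @fixed = qw(004010H222A1 004010H223A2 004010H220A1
               004010H279A1 004010H279A1 004010H217);

# One alternation; quotemeta escapes any metacharacters in the entries.
my $re = join '|', map { quotemeta } @fixed;

my @strings = (
    "this is the code 004010H222A1 the rest is irrelevant",
    "the random number is 004010H223A2 ** anything else is irrelevant",
    "the last lottery number 004010H220A1 ~~ the rest is irrelevant",
    "yet another random sentence 004010H279A1 the rest is irrelevant",
    "any sentence before what i want 004010H279A1 the rest is irrelevant",
    "last winning number 004010H217~~~",
);

my @matches;
for my $str (@strings) {
    # List assignment puts $1 into $exact_match (undef when nothing matches).
    my ($exact_match) = $str =~ /\b($re)\b/;
    if (defined $exact_match) {
        print "found exact match: $exact_match\n";
        push @matches, $exact_match;
    }
}
```

In scalar context (my $found = ...) the match would only return true or false, which is why the list assignment is used here.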

Depending on the details, you may want to sort by length first, either longest or shortest first:

my $re = join '|', map { quotemeta } sort { length $a <=> length $b } @fixed;  # shortest first; swap $a and $b for longest first
...

See this post for a discussion of the reasoning behind these options.
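To illustrate why the order can matter: at a given position, Perl tries the alternatives from left to right and takes the first one that matches. The two-entry list below is hypothetical; none of the question's codes is a prefix of another, so with that list the order makes no difference.

```perl
use strict;
use warnings;

# Hypothetical list in which one entry is a prefix of the other.
my @patterns = qw(004010H217 004010H217A1);
my $str      = "code 004010H217A1 here";

my $short_first = join '|', map { quotemeta } sort { length $a <=> length $b } @patterns;
my $long_first  = join '|', map { quotemeta } sort { length $b <=> length $a } @patterns;

# No word boundaries here, so the first alternative that matches wins.
my ($s) = $str =~ /($short_first)/;
my ($l) = $str =~ /($long_first)/;

print "shortest first: $s\n";   # 004010H217
print "longest first:  $l\n";   # 004010H217A1
```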


If you have more possibilities, then with the exact pattern shown in the question, the pattern is: digits, followed by letters or digits, terminated by a character that is neither a letter nor a digit.

my $pattern = qr/([0-9]+[a-zA-Z0-9]+)[^a-zA-Z0-9]/;

my ($found) = $str =~ /$pattern/;

The above matches if the code is immediately followed by any character that is not a letter or digit (like ~), not only a space. It also allows lowercase letters; drop a-z if they cannot be there. You can restrict this further if it is certain that the code has leading zeros.
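The pattern above requires a character after the code, so it finds nothing when the code ends the string. The sketch below shows this, together with a hypothetical variant (not taken from the answer) that replaces the trailing class with a negative lookahead (?![a-zA-Z0-9]), which also succeeds at the end of the string.

```perl
use strict;
use warnings;

# The answer's pattern: needs a trailing non-alphanumeric character.
my $pattern = qr/([0-9]+[a-zA-Z0-9]+)[^a-zA-Z0-9]/;

# Hypothetical variant: a lookahead asserts "no letter or digit next"
# without consuming a character, so end of string is also accepted.
my $pattern_eos = qr/([0-9]+[a-zA-Z0-9]+)(?![a-zA-Z0-9])/;

my $str = "last winning number 004010H217";   # nothing after the code

my ($found)     = $str =~ $pattern;
my ($found_eos) = $str =~ $pattern_eos;

print defined $found ? "pattern:     $found\n" : "pattern:     no match\n";
print "pattern_eos: $found_eos\n";
```

With the question's own string ("004010H217~~~") both patterns return 004010H217.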

Just to put my two cents in:

\b004010H2[127][02379](?:A[12])?\b
# \b - match a word boundary
# match 004010H2 literally
# [127] one of 1,2 or 7
# [02379] one of 0, 2, 3, 7 or 9
# (?:....)? is a non capturing group and optional in this case

Hint: Obviously, this matches your numbers, but it also matches other combinations such as 004010H210A2. It all depends on your input strings. If you only have these six alternatives, you're probably on the safer side with simple string functions.
See a demo on regex101.com.
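A sketch combining both suggestions: the regex above with a capture group around the whole code, and a plain index() lookup over the fixed list as one possible reading of "simple string functions" (an assumption; the answer does not name a specific function).

```perl
use strict;
use warnings;

my $str = "the random number is 004010H223A2 ** anything else is irrelevant";

# The answer's regex, with one capture group around the whole code.
my ($exact_match) = $str =~ /\b(004010H2[127][02379](?:A[12])?)\b/;
print "regex: $exact_match\n" if defined $exact_match;

# String-function alternative: index() returns -1 when the substring is absent.
# It does not check boundaries, so 004010H217 would also be found inside a
# longer code such as 004010H217A1. (Duplicate list entry dropped.)
my @fixed = qw(004010H222A1 004010H223A2 004010H220A1 004010H279A1 004010H217);
my ($by_index) = grep { index($str, $_) >= 0 } @fixed;
print "index: $by_index\n" if defined $by_index;
```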
