
perl regex to capture into variable only an exact match within a string

I need help with this regex: I want to capture only the exact match within a string and put it in a variable.

I only want to extract these values (fixed list; no other numbers):

004010H222A1 or 
004010H223A2 or 
004010H220A1 or 
004010H279A1 or 
004010H279A1 or 
004010H217 

from the string given

example:

$str = "this is the code 004010H222A1 the rest is irrelevant";
$str = "the random number is 004010H223A2 ** anything else is irrelevant";
$str = "the last lottery number 004010H220A1 ~~ the rest is irrelevant";
$str = "yet another random sentence 004010H279A1 the rest is irrelevant";
$str = "any sentence before what i want 004010H279A1 the rest is irrelevant";
$str = "last winning number 004010H217~~~";


if ($str =~ /\b(004010H[2][1|2|7][0|2|3|7|9])(A[1|2])?\b/){
print "found exact match\n";
##put result into a variable
##example:
## $exact_match = <found eg 004010H222A1>; 
##print $exact_match;
}

How can I capture the exact match into a variable and then display it? Maybe I just can't see the forest for the trees. Thank you in advance for your help.

With a given list of patterns

my @fixed = qw(004010H222A1 004010H223A2 004010H220A1 
    004010H279A1 004010H279A1 004010H217);

my $str = "this is the code 004010H222A1 the rest is irrelevant";

my @found = grep { $str =~ /$_/ } @fixed;

This returns every pattern from the list that matches in the string. Note that you may need word boundaries ( /\b$_\b/ ), although not if the patterns are as distinct from the surrounding text as shown. If a pattern itself begins or ends with a non-word character, \b won't work there and you'd need to build your own sub-pattern for the "boundary."
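As a minimal sketch of the word-boundary variant: the sample string below is made up for illustration, and \Q...\E is added as a precaution so that any regex metacharacters in a list item are taken literally (the codes shown here contain none).

```perl
use strict;
use warnings;

my @fixed = qw(004010H222A1 004010H223A2 004010H220A1 004010H279A1 004010H217);

# Hypothetical input: two real codes, plus one glued to a leading "X"
my $str = "codes 004010H222A1 and 004010H217~~~ but not X004010H220A1";

# \Q...\E quotes the list item; \b on each side rejects a code
# that is embedded in a longer run of word characters.
my @found = grep { $str =~ /\b\Q$_\E\b/ } @fixed;

print "$_\n" for @found;
```

The result follows the order of @fixed, not the order in which the codes appear in the string.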

If you are certain there is only one of them in the string or need only the first one

my ($found) = grep { $str =~ /$_/ } @fixed;

or by constructing the pattern with alternation first

my $re = join '|', map { quotemeta } @fixed;

my ($found) = $str =~ /($re)/;  # consider using word boundaries: /\b($re)\b/

This may be more efficient since it starts the regex engine only once per string, but on the other hand, with only a few options (or a single one?), building the alternation is overhead that may not pay off.
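To get the matched text itself rather than a true/false result, the match needs a capture group and has to be evaluated in list context. A small sketch, assuming the codes are surrounded by non-word characters so that \b applies; the names $alternation, @lines and @matches are only for illustration:

```perl
use strict;
use warnings;

my @fixed = qw(004010H222A1 004010H223A2 004010H220A1 004010H279A1 004010H217);

# Build the alternation once; the parentheses both group it
# (so \b applies to every alternative) and capture the match.
my $alternation = join '|', map { quotemeta } @fixed;
my $re = qr/\b($alternation)\b/;

my @lines = (
    "the random number is 004010H223A2 ** anything else is irrelevant",
    "last winning number 004010H217~~~",
    "nothing to see here",
);

my @matches;
for my $line (@lines) {
    # List assignment: $exact_match receives the first capture group,
    # and the condition is false when the match fails.
    if ( my ($exact_match) = $line =~ $re ) {
        print "found exact match: $exact_match\n";
        push @matches, $exact_match;
    }
}
```

Without the grouping, /\b$re\b/ would attach the first \b only to the first alternative and the last \b only to the last one.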

Depending on details you may want to sort by length first, either by longest or shortest

my $re = join '|', map { quotemeta } sort { length $a <=> length $b } @fixed;
...

See this post for discussion of reasoning behind these options.
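To illustrate why the order can matter: in an alternation the first alternative that matches at a given position wins, so a shorter code that is a prefix of a longer one can shadow it when no boundaries are used. The list below is hypothetical (004010H217A1 is not one of the values in the question) and only serves to show the effect:

```perl
use strict;
use warnings;

# Hypothetical list: the second code extends the first one
my @codes = qw(004010H217 004010H217A1);
my $str   = "last winning number 004010H217A1~~~";

my $as_listed     = join '|', map { quotemeta } @codes;
my $longest_first = join '|', map { quotemeta }
                    sort { length($b) <=> length($a) } @codes;

my ($short) = $str =~ /($as_listed)/;      # stops at the shorter code
my ($long)  = $str =~ /($longest_first)/;  # tries the longer code first

print "as listed:     $short\n";
print "longest first: $long\n";
```

With word boundaries around the grouped alternation the shorter code would not match here anyway, since it is followed by a word character.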


If you have more possibilities, with the exact pattern shown in the question, the pattern is: digits followed by letters-or-digits, terminated by non-letter-digits.

my $pattern = qr/([0-9]+[a-zA-Z0-9]+)[^a-zA-Z0-9]/;

my ($found) = $str =~ /$pattern/;

The above matches if the code is immediately followed by any non-alphanumeric character (like ~ ), not only a space. It also allows lowercase letters; drop a-z if they cannot be there. You can further restrict this if it is certain that it has leading zeros.
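A short sketch of how this generic pattern behaves. One thing to be aware of: because it consumes a terminating character, it finds nothing when the code is the very last thing in the string. The lookahead variant ($pattern_eos) is my own assumption about how one might handle that case, not part of the answer above:

```perl
use strict;
use warnings;

my $pattern = qr/([0-9]+[a-zA-Z0-9]+)[^a-zA-Z0-9]/;

# Assumed variant: the negative lookahead asserts that no letter or
# digit follows, without consuming a character, so it also succeeds
# at the end of the string.
my $pattern_eos = qr/([0-9]+[a-zA-Z0-9]+)(?![a-zA-Z0-9])/;

my $mid = "last winning number 004010H217~~~";
my $end = "last winning number 004010H217";

my ($found_mid)     = $mid =~ $pattern;      # code followed by "~"
my ($found_end)     = $end =~ $pattern;      # no terminator: no match
my ($found_end_eos) = $end =~ $pattern_eos;  # lookahead variant

print "mid:     ", $found_mid     // 'no match', "\n";
print "end:     ", $found_end     // 'no match', "\n";
print "end_eos: ", $found_end_eos // 'no match', "\n";
```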

Just to put my two cents in:

\b004010H2[127][02379](?:A[12])?\b
# \b - match a word boundary
# match 004010H2 literally
# [127] one of 1,2 or 7
# followed by one of 0,2,3,7 or 9
# (?:....)? is a non capturing group and optional in this case

Hint: Obviously, this matches your numbers, but it also matches other combinations such as 004010H210A2. It totally depends on your input strings. If you only have these six alternatives, you're probably on the safer side with simple string functions.
See a demo on regex101.com.
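As a rough comparison of the two approaches, the sketch below adds a capture group to the pattern above and sets it against a plain index() lookup over the fixed list. The variable names and the second sample string are assumptions for illustration:

```perl
use strict;
use warnings;

my @fixed = qw(004010H222A1 004010H223A2 004010H220A1 004010H279A1 004010H217);
my $loose = qr/\b(004010H2[127][02379](?:A[12])?)\b/;

my $good = "this is the code 004010H222A1 the rest is irrelevant";
my $bad  = "this is the code 004010H210A2 the rest is irrelevant";

# The character-class pattern accepts both strings
my ($loose_good) = $good =~ $loose;
my ($loose_bad)  = $bad  =~ $loose;   # matches, although not in the list

# index() returns -1 when the substring is absent, so only codes
# from the fixed list can ever be reported
my ($exact_good) = grep { index($good, $_) >= 0 } @fixed;
my ($exact_bad)  = grep { index($bad,  $_) >= 0 } @fixed;

print "regex: ", $loose_good // 'none', " / ", $loose_bad // 'none', "\n";
print "index: ", $exact_good // 'none', " / ", $exact_bad // 'none', "\n";
```

Note that index() does no boundary checking, so it would also find a code embedded in a longer token.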
