
2-step regular expression matching with a variable in Perl

I am looking to do a 2-step regular expression look-up in Perl. I have text that looks like this:

here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words

I also have a hash with the following values:

%my_hash = ( 9337 => 'AA', 2214 => 'BB', 8790 => 'CC' );

Here's what I need to do:

  1. Find a number
  2. Look up the text code for the number using my_hash
  3. Check if the text code appears within 50 characters of the identified number, and if true print the result

So the output I'm looking for is:

Found 9337, matches 'AA'
Found 2214, matches 'BB'
Found 1190, no matches
Found 8790, no matches

Here's what I have so far:

while ( $text =~ /(\d+)(.{1,50})/g ) {
  $num = $1;
  $text_after_num = $2;
  $search_for = $my_hash{$num};
  if ( $text_after_num =~ /($search_for)/ ) {
    print "Found $num, matches $search_for\n";
  }
  else {
    print "Found $num, no matches\n";
  }
}

This sort of works, except that the only correct match is 9337; the code doesn't match 2214. I think the reason is that the match on 9337 also consumes the 50 characters after the number for the second-step match, so when the regex engine starts again it resumes from a point after 2214. Is there an easy way to fix this? I think the \G assertion can help me here, but I don't quite see how.

Any suggestions or help would be great.

You have a problem with greediness. The .{1,50} will consume as much as it can. Your regex should be /(\d+)(.+?)(?=($|\d))/

To explain, the question mark after .+ makes the quantifier non-greedy: it stops as soon as the pattern that follows it can match, so that following pattern takes precedence. The (?=...) is a lookahead that says "check whether the next character is a digit; if so, succeed, but do not consume it." Because the digit is not consumed, it is still available to be picked up by (\d+) at the start of the next match.
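As a rough sketch of how that pattern behaves on the sample text from the question (the helper name report_by_segment is made up for illustration, and the pattern is copied from the answer as written), note that each captured segment ends at the next number, so the search is bounded by the following digit rather than by 50 characters:

```perl
use strict;
use warnings;

# Sample text and lookup table from the question.
my $text = 'here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words';
my %my_hash = ( 9337 => 'AA', 2214 => 'BB', 8790 => 'CC' );

# Hypothetical helper: returns one report line per number found in $text.
sub report_by_segment {
    my ($text, $codes) = @_;
    my @lines;
    # (.+?) is non-greedy; the lookahead stops it before the next digit
    # or at the end of the string, without consuming that digit.
    while ( $text =~ /(\d+)(.+?)(?=($|\d))/g ) {
        my ($num, $segment) = ($1, $2);
        my $code = $codes->{$num};    # undef when the number is not a key
        if ( defined $code && index($segment, $code) >= 0 ) {
            push @lines, "Found $num, matches '$code'";
        }
        else {
            push @lines, "Found $num, no matches";
        }
    }
    return @lines;
}

print "$_\n" for report_by_segment($text, \%my_hash);
```

On the sample text this reports a match for 9337 only: BB appears after 1190, so it falls outside the segment that follows 2214.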

[EDIT] I added an optional end value to the lookahead so that it wouldn't die on the last match.
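One possible variation, not part of the answer above, is to put the capture group inside the lookahead. The window after the number is then captured without being consumed, and the next search resumes right after the number. A minimal sketch, assuming a 50-character window and a hypothetical helper named report_by_window:

```perl
use strict;
use warnings;

# Sample text and lookup table from the question.
my $text = 'here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words';
my %my_hash = ( 9337 => 'AA', 2214 => 'BB', 8790 => 'CC' );

# Hypothetical helper: $width is the number of characters to inspect
# after each number (50 in the question).
sub report_by_window {
    my ($text, $codes, $width) = @_;
    my @lines;
    # Only (\d+) is consumed; the lookahead captures up to $width
    # following characters into $2 but leaves them for later matches.
    while ( $text =~ /(\d+)(?=(.{0,$width}))/g ) {
        my ($num, $window) = ($1, $2);
        my $code = $codes->{$num};    # undef when the number is not a key
        if ( defined $code && index($window, $code) >= 0 ) {
            push @lines, "Found $num, matches '$code'";
        }
        else {
            push @lines, "Found $num, no matches";
        }
    }
    return @lines;
}

print "$_\n" for report_by_window($text, \%my_hash, 50);
```

Using index instead of a second regex is a design choice here: it treats the code as a literal string, so a code containing regex metacharacters would not need escaping.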

Just use:

/\b\d+\b/g

Why match everything if you don't need to? You can use other functions to determine where the number is, and a lookahead to test for the code:

/(?=9337.{1,50}AA)/

This will fail if AA starts more than 50 characters after the end of 9337. Of course, you will have to interpolate your variables to match your hash's keys and values; this is just an example for your first key/value pair.
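To sketch what interpolating the variables might look like: the loop below finds each number with /\b(\d+)\b/g, uses pos and substr to take the text that follows it, and quotes the code with \Q...\E so it is matched literally. The helper name report_with_pos is hypothetical, and treating numbers that are missing from the hash as "no matches" is an assumption based on the expected output in the question.

```perl
use strict;
use warnings;

# Sample text and lookup table from the question.
my $text = 'here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words';
my %my_hash = ( 9337 => 'AA', 2214 => 'BB', 8790 => 'CC' );

# Hypothetical helper: returns one report line per number found in $text.
sub report_with_pos {
    my ($text, $codes) = @_;
    my @lines;
    while ( $text =~ /\b(\d+)\b/g ) {
        my $num  = $1;
        my $code = $codes->{$num};              # undef when not a key
        # pos($text) is the offset just past the number that was matched.
        my $rest = substr($text, pos($text));
        # Same shape as the .{1,50}AA example, anchored at the start of $rest.
        if ( defined $code && $rest =~ /\A.{1,50}\Q$code\E/ ) {
            push @lines, "Found $num, matches '$code'";
        }
        else {
            push @lines, "Found $num, no matches";
        }
    }
    return @lines;
}

print "$_\n" for report_with_pos($text, \%my_hash);
```

The second match runs against the copy in $rest, so it does not disturb the position that the /g loop keeps on $text.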

