Perl regex: match n digits, but only if they are not all the same

Using a Perl regex, I need to match a series of eight digits, for example, 12345678, but only if they are not all the same. 00000000 and 99999999 are typical patterns that should not match. I'm trying to weed out obviously invalid values from existing database records.

I've got this:

my ($match) = /(\d{8})/;

But I can't quite get the backref arranged right.

How about:

^(\d)(?!\1{7})\d{7}$

This matches an eight-digit number unless all eight digits are the same.

Sample code:

use strict;
use warnings;
use feature 'say';    # say is not enabled by default

my $re = qr/^(\d)(?!\1{7})\d{7}$/;
while (<DATA>) {
    chomp;
    say( /$re/ ? "OK : $_" : "KO : $_" );
}

__DATA__
12345678
12345123
123456
11111111

Output:

OK : 12345678
OK : 12345123
KO : 123456
KO : 11111111

Explanation:

The regular expression:

(?-imsx:^(\d)(?!\1{7})\d{7}$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d                       digits (0-9)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    \1{7}                    what was matched by capture \1 (7 times)
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  \d{7}                    digits (0-9) (7 times)
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
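
The same lookahead idea extends to any n by deriving the repeat count from n. The following is only a sketch: the helper name not_all_same_re and the test values are assumptions for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): build the pattern for any n >= 2.
sub not_all_same_re {
    my ($n) = @_;
    my $rest = $n - 1;    # number of digits after the captured first digit
    # First digit is captured; the lookahead rejects $rest more copies of it.
    return qr/^(\d)(?!\1{$rest})\d{$rest}$/;
}

my $re8     = not_all_same_re(8);
my @results = map { /$re8/ ? "OK : $_" : "KO : $_" }
              qw(12345678 00001111 00000000 99999999 1234567);
print "$_\n" for @results;
```

With n = 8 this behaves like the fixed pattern above: 00001111 is accepted because its digits are not all the same, while 00000000, 99999999, and the seven-digit value are rejected.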

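I'd do this with two regular expressions: one to match what you are looking for, and one to filter out what you are not. Note that both solutions below reject any number containing a repeated digit, which is stricter than rejecting only numbers whose digits are all the same.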

Inspired by HamZa's answer though, I've also provided a single regex solution.

use strict;
use warnings;

while (my $num = <DATA>) {
    chomp $num;

    # Single Regex Solution - Inspired by HamZa's code
    if ($num =~ /^.*(\d).*\1.*$(*SKIP)(*FAIL)|^\d{8}$/) {
        print "Yes - ";
    } else {
        print "No  - ";
    }

    # Two Regex Solution
    if ($num =~ /^\d{8}$/ && $num !~ /(\d).*\1/) {
        print "Yes - ";
    } else {
        print "No  - ";
    }

    print "$num\n";
}

__DATA__
12345678
12345674
00001111
00000000
99999999
87654321
87654351
123456789

And the results?

Yes - Yes - 12345678
No  - No  - 12345674
No  - No  - 00001111
No  - No  - 00000000
No  - No  - 99999999
Yes - Yes - 87654321
No  - No  - 87654351
No  - No  - 123456789
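
The two-regex approach can also be applied to the looser condition stated in the question, rejecting a value only when all eight digits are the same. This is a minimal sketch under that assumption; the helper name is_plausible is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: exactly eight digits, but not one digit repeated eight times.
sub is_plausible {
    my ($num) = @_;
    return ( $num =~ /^\d{8}$/ && $num !~ /^(\d)\1{7}$/ ) ? 1 : 0;
}

for my $num (qw(12345678 12345674 00001111 00000000 99999999 123456789)) {
    printf "%-3s - %s\n", ( is_plausible($num) ? "Yes" : "No" ), $num;
}
```

Under this looser rule 12345674 and 00001111 are accepted, and only 00000000, 99999999, and the nine-digit value are rejected.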

This answer is based on the title of the question: match n digits, but only if they are not all the same.

So I've come up with the following expression:

(\d)\1+\b(*SKIP)(*FAIL)|\d+

What does this mean?

(\d)                # Match a digit and put it in group 1
\1+                 # Match what was matched in group 1 and repeat it one or more times
\b                  # Word boundary, we could use (?!\d) to be more specific
(*SKIP)(*FAIL)      # Skip & fail, we use this to exclude what we just have matched
|                   # Or
\d+                 # Match a digit one or more times

The advantage of this regex is that you don't need to edit it each time you want to change n. Of course, if you want to match only n digits, you can replace the last alternative \d+ with \d{n}\b.
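
A sketch of that n-digit variant, used to pull candidates out of free text. The helper name n_digit_re and the sample string are assumptions for illustration. The leading \b on the second alternative is an addition not in the expression above; it keeps the pattern from matching the tail of a longer digit run.

```perl
use strict;
use warnings;

# Hypothetical helper: SKIP/FAIL pattern restricted to runs of exactly $n digits.
sub n_digit_re {
    my ($n) = @_;
    # Left alternative: a run of one repeated digit is consumed, then discarded.
    # Right alternative: a standalone run of $n digits (leading \b is an addition).
    return qr/(\d)\1+\b(*SKIP)(*FAIL)|\b\d{$n}\b/;
}

my $re   = n_digit_re(8);
my $text = "ids: 12345678 00000000 00001111 99999999 123456789";

my @hits;
while ( $text =~ /$re/g ) {
    push @hits, $&;    # $& holds the whole match; $1 is unset for the right alternative
}
print "$_\n" for @hits;
```

The loop collects $& rather than relying on list-context captures, because the only capture group belongs to the alternative that always fails.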


my $number = "99999999";
# Capture the first digit, then require seven more copies of it with \1{7}.
# A match therefore means the digits are all the same.
print "ok\n" if $number =~ /(\d)\1{7}/;
