
Perl: combine two regexps into one

A valid string must consist either of Cyrillic letters only or of Latin letters only.

I have a working solution that uses two regexps, but when I try to combine them into one, it fails:

#!/usr/bin/perl

use strict;
use warnings;
use utf8;
use v5.14;
use open ':std', ':encoding(UTF-8)';

my @data = (
    # rus - ok
    "абвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # space
    "а бвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # rus - latin
    "аtбвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # digit
    "аб2вгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # latin - ok
    "abcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
    # space
    "a bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
    # underscore
    "a_bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
    # digit
    "a2bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ"
);

foreach(@data) {
    if ($_ =~ /^[а-яё]+$/i or $_ =~ /^[a-z]+$/i) {
        print "ok\n";
    }
    else {
        print "bad\n";
    }
}

print "-------\n";
foreach(@data) {
    if ($_ =~ /^(:?[а-яё]+)|(:?[a-z]+)$/i) {
        print "ok\n";
    }
    else {
        print "bad\n";
    }
}

Output:

ok
bad
bad
bad
ok
bad
bad
bad
-------
ok
ok
ok
ok
ok
ok
ok
ok

For some reason, the combined regexp in the second loop always succeeds.

In your regex,

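  • :? - matches an optional literal colon, whereas you wanted a non-capturing group, (?:...). As written, (:?[а-яё]+) is a capturing group whose content may start with a colon.
  • ^(?:a+)|(?:b+)$ - matches either one or more a characters at the start of the string OR one or more b characters at the end of the string. Alternation has the lowest precedence, so ^ belongs only to the first branch and $ only to the second. Every Cyrillic test string starts with Cyrillic letters and every Latin test string ends with Latin letters, so all eight strings match.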
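A short script can make both problems visible. It is a sketch that assumes the same UTF-8 setup as the question; the sample strings are made up for illustration. It runs the question's combined pattern and reports which branch matched and what that branch captured:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# The pattern from the question. (:?...) is a *capturing* group whose
# content may start with a literal colon, and | splits the whole
# pattern into ^(:?[а-яё]+) OR (:?[a-z]+)$.
my $broken = qr/^(:?[а-яё]+)|(:?[a-z]+)$/i;

# Illustrative inputs, not the question's data set.
for my $s ("аб2вг", "a_bcd", ":abc", "2abc3") {
    if ($s =~ $broken) {
        # Only the branch that matched sets its capture group.
        my $how = defined $1 ? "branch 1 captured '$1'"
                             : "branch 2 captured '$2'";
        print "$s: $how\n";
    }
    else {
        print "$s: no match\n";
    }
}
```

Here аб2вг is accepted by branch 1 (only аб is consumed), a_bcd and :abc by branch 2 (the latter with the colon included in $2), and only 2abc3 is rejected, because it neither starts with a Cyrillic letter nor ends with a Latin one.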

You should use

/^(?:[а-яё]+|[a-z]+)$/i

See the regex demo. Details:

  • ^ - start of string
  • (?: - start of a non-capturing group
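    • [а-яё]+ - one or more Russian letters (ё lies outside the а-я range, so it is listed separately; /i adds the uppercase forms)
    • | - or
    • [a-z]+ - one or more ASCII letters
  • ) - end of the non-capturing group
  • $ - end of string (it also matches just before a final newline; use \z if that must be rejected).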
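To check the fixed pattern against the question's cases, the loop below uses shortened versions of the test strings (shortened only for brevity; this is an assumption, the full strings should behave the same) and collects the verdicts. It also compares $ with \z on a string that ends in a newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# The non-capturing group keeps the alternation inside the anchors,
# so ^ and $ apply to both branches.
my $valid  = qr/^(?:[а-яё]+|[a-z]+)$/i;
# Same pattern with \z: a trailing newline is not allowed.
my $strict = qr/^(?:[а-яё]+|[a-z]+)\z/i;

# Shortened versions of the question's test strings.
my @data = (
    "абвгдЁЖЗ",  # Cyrillic only
    "а бвг",     # space
    "аtбвг",     # Cyrillic mixed with Latin
    "abcXYZ",    # Latin only
    "a_bc",      # underscore
    "a2bc",      # digit
);

my @verdicts;
for my $s (@data) {
    push @verdicts, $s =~ $valid ? "ok" : "bad";
}
print join(" ", @verdicts), "\n";

# $ tolerates a final "\n"; \z does not.
for my $re ($valid, $strict) {
    my $verdict = "abc\n" =~ $re ? "ok" : "bad";
    print "trailing newline: $verdict\n";
}
```

The verdicts should read ok bad bad ok bad bad, the same pattern as the question's first loop, and the newline-terminated string is accepted by the $ version but rejected by the \z version.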

NOTE: Starting from Perl 5.22, you can use the n modifier to make capturing groups behave as non-capturing: /^([а-яё]+|[a-z]+)$/ni. That way there is no risk of mixing up ?: and :?.

In that case, require a recent enough Perl by adding use v5.22.0; to the script.
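A minimal sketch of the /n variant, assuming Perl 5.22 or newer; the sample strings are illustrative. It also shows that the plain group no longer sets $1:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use v5.22;    # the /n modifier needs Perl 5.22 or later
use open ':std', ':encoding(UTF-8)';

# With /n, plain (...) groups do not capture, so there is no ?: to mistype.
my $valid = qr/^([а-яё]+|[a-z]+)$/ni;

for my $s ("абвг", "abcd", "аtбв") {
    if ($s =~ $valid) {
        # $1 stays undefined because the group did not capture.
        printf "%s: ok (\$1 is %s)\n", $s, defined $1 ? "set" : "undef";
    }
    else {
        print "$s: bad\n";
    }
}
```

Here абвг and abcd should be reported as ok with $1 undefined, and аtбв as bad.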
