Regex Character Class Subtraction with PHP

Hi,

I'm trying to match UK postcodes, using the pattern from http://interim.cabinetoffice.gov.uk/media/291370/bs7666-v2-0-xsd-PostCodeType.htm :

/^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z-[CIKMOV]]{2}$/

I'm using this in PHP, but it doesn't match the valid postcode OL13 0EF . This postcode does match, however, when I remove the -[CIKMOV] character class subtraction.

I get the impression that I'm doing character class subtraction wrong in PHP. I'd be most grateful if anyone could correct my error.

Thanks in advance for your help.

Ross

Most regex flavours do not support character class subtraction. Instead, you could use a look-ahead assertion:

/^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9](?!.?[CIKMOV])[A-Z]{2}$/
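A quick way to check the look-ahead version is to run it against a few inputs. The sketch below uses Python's re module as a stand-in for PHP's preg_match (an assumption made for illustration; the pattern uses only syntax that both engines share). The name is_valid and every sample postcode other than OL13 0EF are hypothetical.

```python
import re

# The look-ahead (?!.?[CIKMOV]) rejects the input if either of the
# last two letters is one of C, I, K, M, O, V.
POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9](?!.?[CIKMOV])[A-Z]{2}$")

def is_valid(postcode: str) -> bool:
    """Return True if the postcode matches the look-ahead pattern."""
    return POSTCODE.match(postcode) is not None

print(is_valid("OL13 0EF"))  # True: neither E nor F is excluded
print(is_valid("OL13 0CF"))  # False: first trailing letter is C
print(is_valid("OL13 0EC"))  # False: second trailing letter is C
```

The optional .? is what lets a single look-ahead cover both trailing positions: with it empty the first letter is tested, with it consuming one character the second letter is tested.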

If class subtraction is not supported, you should be able to use negated classes to achieve subtraction.

Some examples are [^\D] = \d , [^[:^alpha:]] = [a-zA-Z]

Your problem could be solved like that, using a negated POSIX character class inside a negated character class, like [^a-z[:^alpha:]CIKMOV]

[^
a-z # not a-z
[:^alpha:] # not not A-Za-z
CIKMOV # not C,I,K,M,O,V
]

Edit - This works too and might be easier to read: [^[:^alpha:][:lower:]CIKMOV]

[^
[:^alpha:] # not not A-Za-z
[:lower:] # not a-z
CIKMOV # not C,I,K,M,O,V
]

The result is a character class that is A-Z without C, I, K, M, O, V: basically a subtraction.

Here is a test of 2 different class concoctions (in Perl):

use strict;
use warnings;

my $match = '';

   # ANYOF[^\0-@CIKMOV[-\377!utf8::IsAlpha]
for (0 .. 255) {
   if (chr($_) =~ /^[^a-z[:^alpha:]CIKMOV]$/) {
       $match .= chr($_); next;
   }
   $match .= ' ';
}
$match =~ s/^ +//;
$match =~ s/ +$//;
print "'$match'\n";
$match = '';

   # ANYOF[^\0-@CIKMOV[-\377+utf8::IsDigit !utf8::IsWord]
for (0 .. 255) {
   if (chr($_) =~ /^[^a-z\d\W_CIKMOV]$/) {
       $match .= chr($_); next;
   }
   $match .= ' ';
}
$match =~ s/^ +//;
$match =~ s/ +$//;
print "'$match'\n";

Output shows the gaps in A-Z minus CIKMOV, from the tested character codes 0-255 (each rejected character prints as a space):
'AB DEFGH J L N PQRSTU WXYZ'
'AB DEFGH J L N PQRSTU WXYZ'
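For readers without Perl at hand, the second test can be approximated in Python. This is a sketch, not a port of the original script: it assumes Python's re module with the re.ASCII flag so that \d and \W keep ASCII semantics, and the names SUBTRACTED and surviving_chars are made up here. POSIX classes such as [:^alpha:] are not available in re, so only the \d\W_ variant is shown.

```python
import re

# Double negation: exclude lowercase letters, digits, non-word characters,
# underscore, and the six unwanted capitals. What is left is A-Z minus CIKMOV.
SUBTRACTED = re.compile(r"[^a-z\d\W_CIKMOV]", re.ASCII)

def surviving_chars() -> str:
    """One slot per code point 0-255: the character if it matches, else a space."""
    slots = [chr(i) if SUBTRACTED.fullmatch(chr(i)) else " " for i in range(256)]
    # Trim the runs of spaces before 'A' and after 'Z'.
    return "".join(slots).strip(" ")

print(repr(surviving_chars()))  # 'AB DEFGH J L N PQRSTU WXYZ'
```

Each single space inside the printed string marks one excluded capital (C, I, K, M, O, V in order).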

PCRE does not support char class subtraction.

So you can enumerate all the uppercase letters except CIKMOV :

^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABDEFGHJLNPQRSTUWXYZ]{2}$

which can be shortened using ranges as:

^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$

I think you're going to have to replace [A-Z-[CIKMOV]] with [ABD-HJLNP-UW-Z] . I don't think PHP supports character class subtraction. My alternative reads something like "A, B, D to H, J, L, N, P to U, and W to Z".
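Writing the ranges by hand is easy to get wrong (an I or an O can slip into a range unnoticed), so it may help to derive the class body mechanically. The following is a minimal sketch in Python, used as a stand-in for PHP; subtract_class is a hypothetical helper, and it assumes the characters involved are plain letters or digits that need no escaping inside a character class.

```python
import re
import string

def subtract_class(base: str, removed: str) -> str:
    """Return a character-class body for `base` minus `removed`, using ranges."""
    kept = sorted(set(base) - set(removed))
    parts = []
    i = 0
    while i < len(kept):
        j = i
        # Extend the run while code points are consecutive.
        while j + 1 < len(kept) and ord(kept[j + 1]) == ord(kept[j]) + 1:
            j += 1
        run = kept[i:j + 1]
        # A range only pays off for runs of three or more characters.
        parts.append(f"{run[0]}-{run[-1]}" if len(run) > 2 else "".join(run))
        i = j + 1
    return "".join(parts)

body = subtract_class(string.ascii_uppercase, "CIKMOV")
print(body)  # ABD-HJLNP-UW-Z

# Build the full postcode pattern from the derived class body.
pattern = re.compile(rf"^[A-Z]{{1,2}}[0-9R][0-9A-Z]? [0-9][{body}]{{2}}$")
print(bool(pattern.match("OL13 0EF")))  # True
print(bool(pattern.match("OL13 0EC")))  # False
```

The derived body can be pasted into the PHP pattern unchanged, since it uses only literal letters and ranges.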
