How to match entire string to be one of two formats with a single regular expression?

I need to validate values that can have one of two formats, and I am trying to do so with a single regular expression, but I can't figure out why it doesn't work.

The first format is exactly 17 alphanumeric characters. The expression ^[A-Za-z0-9]{17}$ correctly matches the test value 5UXWX7C56BA123456 but not the shortened value 5UXWX7C56BA12345 or the lengthened value 5UXWX7C56BA1234569.

The second format is exactly 8 alphanumeric characters followed by an asterisk or underscore and two more alphanumeric characters. The expression ^[A-Za-z0-9]{8}[*_][A-Za-z0-9]{2}$ correctly matches the test value 5UXWX7C5*BA but not the shortened value 5UXWX7C5*B or the lengthened value 5UXWX7C5*BA1.

However, when I try to combine the expressions, I get unexpected results that differ depending on which of the sub-expressions I place first. The following snippet of code demonstrates this:

var pattern1 = new Regex(@"^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");
var pattern2 = new Regex(@"^([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})|([A-Za-z0-9]{17})$");

var values = new string[] 
{ 
    "5UXWX7C56BA12345", "5UXWX7C56BA123456", "5UXWX7C56BA1234569", 
    "5UXWX7C5*B", "5UXWX7C5*BA", "5UXWX7C5*BA1" 
};

Console.WriteLine($"Using {pattern1}\n");
Console.WriteLine($"  {"Value",-20}{"IsMatch",-9}{"Expected",-10}");
Console.WriteLine($"  {new string('-', 37)}");
values
    .Select(x => new { Value = x, Result = pattern1.IsMatch(x), ExpectedResult = x.Length == 11 || x.Length == 17 })
    .Select(x => $"  {x.Value,-20}{x.Result,-9}{x.ExpectedResult} {(x.Result == x.ExpectedResult ? "" : "UNEXPECTED")}")
    .ToList().ForEach(Console.WriteLine);

Console.WriteLine($"\n\nUsing {pattern2}\n");
Console.WriteLine($"  {"Value",-20}{"IsMatch",-9}{"Expected",-10}");
Console.WriteLine($"  {new string('-', 37)}");
values
    .Select(x => new { Value = x, Result = pattern2.IsMatch(x), ExpectedResult = x.Length == 11 || x.Length == 17 })
    .Select(x => $"  {x.Value,-20}{x.Result,-9}{x.ExpectedResult} {(x.Result == x.ExpectedResult ? "" : "UNEXPECTED")}")
    .ToList().ForEach(Console.WriteLine);

This produces the following results:

Using ^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$

  Value               IsMatch  Expected  
  -------------------------------------
  5UXWX7C56BA12345    False    False 
  5UXWX7C56BA123456   True     True 
  5UXWX7C56BA1234569  True     False UNEXPECTED
  5UXWX7C5*B          False    False 
  5UXWX7C5*BA         True     True 
  5UXWX7C5*BA1        False    False 


Using ^([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})|([A-Za-z0-9]{17})$

  Value               IsMatch  Expected  
  -------------------------------------
  5UXWX7C56BA12345    False    False 
  5UXWX7C56BA123456   True     True 
  5UXWX7C56BA1234569  True     False UNEXPECTED
  5UXWX7C5*B          False    False 
  5UXWX7C5*BA         True     True 
  5UXWX7C5*BA1        True     False UNEXPECTED

I hope someone will be able to point out the error in my expressions. Although I am using ^ and $ to try to force the entire value to be matched, it seems that a match is somehow still found for the longer values, even though there is a further unmatched character that I expected would cause the entire value not to match.

Although I used LINQPad to run the snippet above, I see the same results on regex101.com.

Your regexps are not anchored correctly:

^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$
 ^               ^ ^                                ^                

Alternation (|) has the lowest precedence, so ^ belongs only to the first alternative and $ only to the second. Here, ([A-Za-z0-9]{17}) is only anchored at the start of the string (and there can be anything after that pattern), and ([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2}) is only anchored at the end of the string (and there can be anything before that pattern).

The second pattern has the same problem; you just swapped the alternatives.
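To make the partial anchoring visible, here is a minimal sketch that prints which substring each of the question's patterns actually matches. It assumes C# 9 or later (top-level statements); the `Describe` helper is a hypothetical name introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The two combined patterns from the question, unchanged.
var pattern1 = new Regex(@"^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");
var pattern2 = new Regex(@"^([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})|([A-Za-z0-9]{17})$");

// Describe (hypothetical helper): reports the matched substring and its start index.
string Describe(Regex regex, string input)
{
    Match m = regex.Match(input);
    return m.Success ? $"'{m.Value}' at index {m.Index}" : "no match";
}

// First alternative is anchored only at the start: the trailing '9' is ignored.
Console.WriteLine(Describe(pattern1, "5UXWX7C56BA1234569")); // '5UXWX7C56BA123456' at index 0

// Second alternative is anchored only at the end: the leading '5' is skipped.
Console.WriteLine(Describe(pattern2, "5UXWX7C56BA1234569")); // 'UXWX7C56BA1234569' at index 1

// First alternative is anchored only at the start: the trailing '1' is ignored.
Console.WriteLine(Describe(pattern2, "5UXWX7C5*BA1"));       // '5UXWX7C5*BA' at index 0
```

In each case IsMatch returns True because some substring satisfies one alternative, not because the whole value does.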

Use

var pattern1 = new Regex(@"^(?:[A-Za-z0-9]{17}|[A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");
                            ^                 ^                                ^

The non-capturing group (?:...) makes ^ and $ apply to the whole alternation. Otherwise, your alternatives are not anchored on both sides.

See the regex demo.
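As a quick check, the sketch below runs the grouped pattern against the six test values from the question. It assumes C# 9 or later (top-level statements); the variable name `anchored` is chosen here for illustration, and the expected-result rule (length 11 or 17) is taken from the question's own snippet.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The corrected pattern: ^ and $ sit outside the group, so they bind to both alternatives.
var anchored = new Regex(@"^(?:[A-Za-z0-9]{17}|[A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");

// The same test values as in the question.
var values = new[]
{
    "5UXWX7C56BA12345", "5UXWX7C56BA123456", "5UXWX7C56BA1234569",
    "5UXWX7C5*B", "5UXWX7C5*BA", "5UXWX7C5*BA1"
};

foreach (var value in values)
{
    // Same expectation rule as the question: only lengths 11 and 17 are valid here.
    bool expected = value.Length == 11 || value.Length == 17;
    bool actual = anchored.IsMatch(value);
    Console.WriteLine($"  {value,-20}{actual,-9}{expected} {(actual == expected ? "" : "UNEXPECTED")}");
}
```

With this pattern only 5UXWX7C56BA123456 and 5UXWX7C5*BA match. Note that in .NET, $ also matches just before a final newline; if a trailing "\n" must be rejected as well, \z can be used in place of $.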
