
Regex to detect if words are part of the string and next word is not capitalized

I am looking for a regex that matches a specific word or phrase within a given string, with one restriction: if another word follows the pattern, that word must not be capitalized. Suppose the phrase is 'Base Case'; here are some examples:

  • Final Base Case - should match
  • Final Base Case financial - should match
  • Final Base Case Financial - should not match (the next word 'Financial' is capitalized)
  • Final Base Cases - should not match ('Cases' is a different word from 'Case')

I use the following regex to determine whether my word or phrase is part of the string:

\bBase Case(?!\w)

Can someone please help me modify my regex to add the restriction on a capitalized next word?
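
For reference, here is a minimal sketch that runs the question's pattern against the four examples, to make the gap visible. The class name OriginalPatternCheck and the Matches helper are illustrative names, not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternCheck
{
    // Pattern from the question: it only rejects a word character
    // directly after "Case", so it says nothing about the next word.
    public const string Pattern = @"\bBase Case(?!\w)";

    public static bool Matches(string text) => Regex.IsMatch(text, Pattern);

    public static void Main()
    {
        string[] samples =
        {
            "Final Base Case",
            "Final Base Case financial",
            "Final Base Case Financial", // should be rejected, but still matches
            "Final Base Cases",
        };

        foreach (var sample in samples)
            Console.WriteLine($"{sample}: {Matches(sample)}");
    }
}
```

Only the third sample gives the wrong answer: the lookahead sees a space after Case, which is not a word character, so the capitalized Financial is never inspected.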

You need to check for one of two cases after the search pattern:

  1. end of string ( $ ); or
  2. another word that doesn't begin with a capital letter ( \s+[^A-Z\s] )

You can do that with this regex:

\bBase Case(?=$|\s+[^A-Z\s])

Note that because the lookahead accepts only end of string or whitespace immediately after Case, it also prevents Base Cases or similar from matching.
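
A small sketch of how this pattern behaves in C#, including two inputs that are not in the question (a double space and a comma) and are added here only to probe the edges. The class name LookaheadCheck is an assumption made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadCheck
{
    // Positive lookahead: either end of string, or whitespace followed by
    // a character that is neither an ASCII capital nor whitespace.
    public const string Pattern = @"\bBase Case(?=$|\s+[^A-Z\s])";

    public static bool Matches(string text) => Regex.IsMatch(text, Pattern);

    public static void Main()
    {
        string[] samples =
        {
            "Final Base Case",
            "Final Base Case financial",
            "Final Base Case Financial",
            "Final Base Cases",
            "Final Base Case  Financial", // extra sample: two spaces
            "Final Base Case, financial", // extra sample: punctuation after the phrase
        };

        foreach (var sample in samples)
            Console.WriteLine($"{sample}: {Matches(sample)}");
    }
}
```

The \s inside the character class is what keeps the double-space sample from matching: without it, \s+ could backtrack to one space and accept the second space as the "next character". The comma sample is also rejected, because a comma is neither end of string nor whitespace; whether that is desirable depends on the input.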

Demo on regex101

In the most generic case, you may use

\bBase\s+Case\b(?!\W*\p{Lu})

See the regex demo.

Details

  • \b - a word boundary
  • Base\s+Case - Base , 1+ whitespaces, Case
  • \b - a word boundary
  • (?!\W*\p{Lu}) - a negative lookahead that fails the match if, immediately to the right of the current location, there are 0 or more non-word chars followed by any Unicode uppercase letter.

If only whitespace is expected between the word and the uppercase letter, replace \W with \s .
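
To illustrate the difference between the \W and \s variants, here is a hedged sketch that runs both against a few inputs. The sample strings and the names BoundaryVariants, NonWordGap, and WhitespaceGap are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryVariants
{
    // Variant from the answer: any run of non-word characters may separate
    // the phrase from the uppercase letter that blocks the match.
    public const string NonWordGap = @"\bBase\s+Case\b(?!\W*\p{Lu})";

    // Stricter variant: only whitespace may separate them.
    public const string WhitespaceGap = @"\bBase\s+Case\b(?!\s*\p{Lu})";

    public static bool MatchesNonWordGap(string text) => Regex.IsMatch(text, NonWordGap);

    public static bool MatchesWhitespaceGap(string text) => Regex.IsMatch(text, WhitespaceGap);

    public static void Main()
    {
        string[] samples =
        {
            "Final Base Case financial",
            "Final Base Case, Financial",    // punctuation before the capital
            "Final Base Case. Next year",    // phrase ends a sentence
            "Final Base Case \u00C9mile",    // non-ASCII uppercase letter
        };

        foreach (var sample in samples)
            Console.WriteLine(
                $"{sample}: \\W={MatchesNonWordGap(sample)}, \\s={MatchesWhitespaceGap(sample)}");
    }
}
```

With \W, a capital after a comma or a full stop also blocks the match, so a phrase that ends a sentence is rejected; with \s, only a capital separated by whitespace does. Both variants reject the accented capital, because \p{Lu} covers Unicode uppercase letters, whereas a class such as [^A-Z\s] only knows about ASCII capitals.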

C# usage:

var results = Regex.Matches(text, @"\bBase\s+Case\b(?!\W*\p{Lu})")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Or, just to check whether it exists in each string:

var texts = new List<string> {"Final Base Case", "Final Base Case financial", "Final Base Case Financial", "Final Base Cases"};
foreach (var text in texts) {
    Console.WriteLine("{0}: {1}", text, Regex.IsMatch(text, @"\bBase\s+Case\b(?!\W*\p{Lu})"));
}

See the C# demo. Output:

Final Base Case: True
Final Base Case financial: True
Final Base Case Financial: False
Final Base Cases: False
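
Since the question asks about "a specific word or words", the same pattern could be built for an arbitrary phrase. The following is only a sketch: PhraseMatcher and Build are hypothetical names, and it assumes the phrase starts and ends with a word character, since \b would otherwise not behave as intended.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseMatcher
{
    // Hypothetical helper: builds \bWord1\s+Word2...\b(?!\W*\p{Lu}) for a phrase.
    // Assumes the phrase begins and ends with a word character.
    public static Regex Build(string phrase)
    {
        // Split on whitespace and escape each word so regex metacharacters
        // inside the phrase are treated literally.
        var words = Regex.Split(phrase.Trim(), @"\s+")
                         .Select(word => Regex.Escape(word));

        return new Regex(@"\b" + string.Join(@"\s+", words) + @"\b(?!\W*\p{Lu})");
    }

    public static void Main()
    {
        var regex = Build("Base Case");
        string[] samples =
        {
            "Final Base Case",
            "Final Base Case financial",
            "Final Base Case Financial",
            "Final Base Cases",
        };

        foreach (var sample in samples)
            Console.WriteLine($"{sample}: {regex.IsMatch(sample)}");
    }
}
```

Words are escaped individually and joined with \s+ rather than escaping the whole phrase, because Regex.Escape also escapes spaces, which would pin the match to exactly one literal space between words.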
