Detecting a word followed by a dot or whitespace using regex

I am using regex and C# to find occurrences of a particular word using:

Regex regex = new Regex(@"\b" + word + @"\b");

How can I modify my Regex to only detect the word if it is either preceded by whitespace, followed by whitespace, or followed by a dot?

Examples:

this.Button.Value - should match
this.value - should match

document.thisButton.Value - should not match

Regex regex = new Regex(@"((?<= )" + word + @"\b)" + "|" + @"(\b" + word + @"[ .])");

However, note that this could cause trouble if word contains characters that have special meanings in regular expressions. I'm assuming that word contains alphanumeric characters only.
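
If word may contain such characters, one option is to pass it through Regex.Escape before building the pattern. The following is only a sketch of that idea: the class name EscapeDemo, the helper BuildEscaped, and the sample term "a.b" are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: builds the whole-word pattern with the search term escaped.
    public static Regex BuildEscaped(string word)
    {
        return new Regex(@"\b" + Regex.Escape(word) + @"\b");
    }

    public static void Main()
    {
        string word = "a.b"; // illustrative term containing a regex metacharacter

        // Unescaped: the dot matches any character, so "axb" is matched too.
        var raw = new Regex(@"\b" + word + @"\b");
        Console.WriteLine(raw.IsMatch("axb"));                 // True

        // Escaped: only the literal text "a.b" matches.
        Console.WriteLine(BuildEscaped(word).IsMatch("axb"));  // False
        Console.WriteLine(BuildEscaped(word).IsMatch("a.b"));  // True
    }
}
```

Escaping does not change how \b behaves: a term that begins or ends with a non-word character (for example "c++") would likely need a different boundary check.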

If I understand you correctly:

Regex regex = new Regex(@"\b" + word + @"[ .]");

You may use lookarounds and alternation to check for the two possibilities: the keyword is enclosed in whitespace, or it is followed by a dot:

var line = "this.Button.Value\nthis.value\ndocument.thisButton.Value";
var word = "this";
var rx = new Regex(string.Format(@"(?<=\s)\b{0}\b(?=\s)|\b{0}\b(?=\.)", word));
var result = rx.Replace(line, "NEW_WORD");
Console.WriteLine(result);

See the IDEONE demo and the regex demo.

The pattern matches:

  • (?<=\s)\bthis\b(?=\s) - the whole word "this" preceded by whitespace ((?<=\s)) and followed by whitespace ((?=\s))
  • | - or
  • \bthis\b(?=\.) - the whole word "this" followed by a literal . ((?=\.))

Since lookarounds do not consume characters (the regex index remains where it was), the characters they match are not placed in the match value and are thus untouched during the replacement.
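
To illustrate the point about lookarounds not consuming characters, here is a small sketch that replaces the same word once with a consuming character class and once with a lookahead. The class and method names are hypothetical, and the word "this" is hard-coded for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Consuming version: the trailing space or dot is part of the match.
    public static string ReplaceConsuming(string input) =>
        Regex.Replace(input, @"\bthis[ .]", "NEW_WORD");

    // Lookahead version: the trailing space or dot is only asserted, not matched.
    public static string ReplaceWithLookahead(string input) =>
        Regex.Replace(input, @"\bthis(?=[ .])", "NEW_WORD");

    public static void Main()
    {
        Console.WriteLine(ReplaceConsuming("this.value"));      // NEW_WORDvalue
        Console.WriteLine(ReplaceWithLookahead("this.value"));  // NEW_WORD.value
    }
}
```

With the consuming pattern the dot disappears along with the word; with the lookahead it stays in place.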

The (?<=...) lookbehind checks what precedes the word and the (?=...) lookahead checks what follows it; neither includes those characters in the match.

Regex regex = new Regex(@"(?<=\s)\b" + word + @"\b|\b" + word + @"\b(?=[\s\.])");

EDIT: Pattern updated.

EDIT 2: Online test: http://ideone.com/RXRQM5
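
For a quick local check, the pattern from this answer can be run against the three examples from the question. This is a minimal sketch; the class ExamplesCheck and the helper Matches are illustrative names, and word is assumed to contain only alphanumeric characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExamplesCheck
{
    // Hypothetical helper: true if the word is preceded by whitespace,
    // or followed by whitespace or a dot, somewhere in the input.
    public static bool Matches(string word, string input)
    {
        var regex = new Regex(@"(?<=\s)\b" + word + @"\b|\b" + word + @"\b(?=[\s\.])");
        return regex.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "this.Button.Value",         // should match
            "this.value",                // should match
            "document.thisButton.Value"  // should not match
        };

        foreach (string sample in samples)
        {
            Console.WriteLine($"{sample}: {Matches("this", sample)}");
        }
    }
}
```

The first two samples print True and the third prints False, because "this" inside "thisButton" is not followed by a word boundary.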
