C# regex to match whole words with special characters

Question

I have searched through some questions but couldn't find the exact answer I am looking for. I need to search through large strings of text for keyword matches. I was using IndexOf; however, I need to find whole-word matches, e.g. if I search for Java but the text contains JavaScript, it shouldn't match. This works fine using \b{pattern}\b, but if I search for something like C#, it doesn't work.

Below are a few examples of the text strings I am searching through:

languages include Java,JavaScript,MySql,C#
languages include Java/JavaScript/MySql/C#
languages include Java, JavaScript, MySql, C#

Obviously the issue is with the special character '#', and the same thing happens when searching for C++.
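
A minimal C# sketch of the problem described above, using the question's first sample string. NaiveWholeWord is a hypothetical helper name; it interpolates the raw keyword into \b...\b without escaping, as the question describes. The sketch assumes top-level statements (C# 9 / .NET 5 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the question: wrap the raw keyword in \b ... \b.
// The keyword is NOT escaped, exactly as in the question.
static bool NaiveWholeWord(string text, string keyword) =>
    Regex.IsMatch(text, $@"\b{keyword}\b");

string sample = "languages include Java,JavaScript,MySql,C#";

// "Java" starts and ends with word characters, so \b behaves as intended.
Console.WriteLine(NaiveWholeWord(sample, "Java"));            // True
Console.WriteLine(NaiveWholeWord("JavaScript only", "Java")); // False

// '#' is a non-word character and the string ends right after it,
// so there is no \b boundary after "C#" and the match fails.
Console.WriteLine(NaiveWholeWord(sample, "C#"));              // False

// Unescaped, "C++" is not even a valid .NET pattern ("nested quantifier").
try
{
    NaiveWholeWord(sample, "C++");
}
catch (ArgumentException)
{
    Console.WriteLine("C++: invalid pattern");
}
```

The exception for "C++" is one reason the keyword has to be escaped before it is placed inside a pattern.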

Answer 1

Escape the pattern using Regex.Escape and replace the context-dependent \b word boundaries with (?<!\w) / (?!\w) lookarounds:

var rx = $@"(?<!\w){Regex.Escape(pattern)}(?!\w)";

The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately before the current location, so it only succeeds at the start of the string or after a non-word char. Likewise, (?!\w) is a negative lookahead that fails the match if there is a word char immediately after the current location, so it only succeeds at the end of the string or before a non-word char.
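
A minimal runnable sketch of this answer's pattern applied to the question's three sample strings. ContainsKeyword is a hypothetical helper name, and the sketch assumes top-level statements (C# 9 / .NET 5 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the answer's pattern: the escaped keyword must not
// be preceded or followed by a word character (\w).
static bool ContainsKeyword(string text, string keyword)
{
    // Regex.Escape makes '+' and '#' literal, so "C++" no longer throws.
    var rx = $@"(?<!\w){Regex.Escape(keyword)}(?!\w)";
    return Regex.IsMatch(text, rx);
}

string[] samples =
{
    "languages include Java,JavaScript,MySql,C#",
    "languages include Java/JavaScript/MySql/C#",
    "languages include Java, JavaScript, MySql, C#",
};

foreach (var s in samples)
{
    bool java = ContainsKeyword(s, "Java");
    bool csharp = ContainsKeyword(s, "C#");
    bool cpp = ContainsKeyword(s, "C++");
    Console.WriteLine($"{java} {csharp} {cpp}"); // True True False
}

// "Java" inside "JavaScript" is rejected because 'S' is a word character.
Console.WriteLine(ContainsKeyword("JavaScript only", "Java")); // False
```

One side effect worth knowing: because '#' and '+' are not word characters, searching for the keyword "C" with this pattern also matches inside "C#" or "C++" (for example, ContainsKeyword("languages include C#", "C") returns true).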

Answer 2

Yeah, this is because there isn't a word boundary (a \b) after the #, because # isn't a "word" character. You could use a regular expression like the following, which looks for a character that isn't part of a language name, [^a-zA-Z+#], after the language:

\b{pattern}[^a-zA-Z+#]
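
A sketch of this pattern in C#. Two assumptions go beyond the answer: the keyword is passed through Regex.Escape (unescaped, "C++" is rejected by the .NET parser as a nested quantifier), and a second variant swaps the consumed trailing character for a negative lookahead, (?![a-zA-Z+#]), which also accepts the end of the string. Both helper names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for the answer's pattern: \b before the keyword, then one
// character that cannot be part of a language name. Regex.Escape is an addition.
static bool MatchesWithTrailingChar(string text, string keyword) =>
    Regex.IsMatch(text, $@"\b{Regex.Escape(keyword)}[^a-zA-Z+#]");

// Variant (not from the answer): a negative lookahead checks the next character
// without consuming it, so the end of the string is accepted as well.
static bool MatchesWithLookahead(string text, string keyword) =>
    Regex.IsMatch(text, $@"\b{Regex.Escape(keyword)}(?![a-zA-Z+#])");

string sample = "languages include Java,JavaScript,MySql,C#";

Console.WriteLine(MatchesWithTrailingChar(sample, "Java")); // True ("Java," matches)
Console.WriteLine(MatchesWithTrailingChar(sample, "C#"));   // False: nothing follows the final "C#"
Console.WriteLine(MatchesWithLookahead(sample, "C#"));      // True

// Because '+' and '#' are in the excluded set, "C" does not match inside "C#" or "C++".
Console.WriteLine(MatchesWithLookahead("C# and C++", "C")); // False
```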

Or, if you believe you can list all of the possible characters that aren't part of a language name (for example, whitespace, ',', '.', and ';'):

[\s,.;]{pattern}[\s,.;]

Alternatively, if a language name may be at the very end of a string (depending on where you're getting the data from), you might also need to match the end of the string, $, in addition to the separators, or similarly the beginning of the string, ^:

[\s,.;]{pattern}(?:[\s,.;]|$)
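
A sketch of the separator approach with both anchors from the paragraph above applied. The separator class [\s,.;/] is an assumption: it adds '/' to the answer's example characters so the second sample string also matches. The keyword is escaped with Regex.Escape, and MatchesBetweenSeparators is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the keyword must sit between separators, where the start
// and end of the string also count as separators.
static bool MatchesBetweenSeparators(string text, string keyword)
{
    // Assumed separator set: the answer's examples plus '/' for the second sample.
    const string sep = @"[\s,.;/]";
    var rx = $@"(?:^|{sep}){Regex.Escape(keyword)}(?:{sep}|$)";
    return Regex.IsMatch(text, rx);
}

string[] samples =
{
    "languages include Java,JavaScript,MySql,C#",
    "languages include Java/JavaScript/MySql/C#",
    "languages include Java, JavaScript, MySql, C#",
};

foreach (var s in samples)
{
    bool java = MatchesBetweenSeparators(s, "Java");
    bool csharp = MatchesBetweenSeparators(s, "C#");
    bool c = MatchesBetweenSeparators(s, "C");
    Console.WriteLine($"{java} {csharp} {c}"); // True True False
}
```

Because the separators are consumed rather than checked with lookarounds, this is fine for a yes/no IsMatch test, but Regex.Matches would skip a keyword whose leading separator was already consumed by the previous match (for example, the second "Java" in "Java Java").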

Answer 1 (accepted, score 2): 2017-07-12 15:13:42

Answer 2 (score 1): 2017-07-12 15:06:28