
C# Regex Match whole word, with special characters

I have searched through some questions but couldn't find the exact answer I am looking for. I need to search through large strings of text for keyword matches. I was using IndexOf; however, I need whole-word matches, e.g. if I search for Java but the text contains JavaScript, it shouldn't match. This works fine using \b word boundaries, but if I search for something like C#, then it doesn't work.

Below are a few examples of the text strings I am searching through:

languages include Java,JavaScript,MySql,C#
languages include Java/JavaScript/MySql/C#
languages include Java, JavaScript, MySql, C#

The issue is obviously with special characters such as '#', so this also doesn't work when searching for C++.
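A minimal reproduction of the problem, assuming the original pattern was built as \b + keyword + \b (the exact code isn't shown, and the helper name MatchesWithWordBoundaries is hypothetical). The keyword goes through Regex.Escape, so '#' and '+' are matched literally and the failure comes purely from the word boundaries:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical reproduction: wrap the (escaped) keyword in \b word boundaries.
static class WordBoundaryDemo
{
    // \b only sits between a word char (letter, digit, '_') and a non-word
    // char or string edge, so it cannot follow a trailing '#' or '+'.
    public static bool MatchesWithWordBoundaries(string text, string keyword) =>
        Regex.IsMatch(text, @"\b" + Regex.Escape(keyword) + @"\b");

    static void Main()
    {
        const string text = "languages include Java, JavaScript, MySql, C#";
        Console.WriteLine(MatchesWithWordBoundaries(text, "Java"));                     // True
        Console.WriteLine(MatchesWithWordBoundaries("I only know JavaScript", "Java")); // False
        Console.WriteLine(MatchesWithWordBoundaries(text, "C#"));                       // False
        Console.WriteLine(MatchesWithWordBoundaries("C++ and Java", "C++"));            // False
    }
}
```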

Escape the pattern using Regex.Escape and replace the context-dependent \b word boundaries with (?<!\w) / (?!\w) lookarounds:

var rx = $@"(?<!\w){Regex.Escape(pattern)}(?!\w)";

The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately before the current location (so it succeeds at the start of the string or after a non-word char), and (?!\w) is a negative lookahead that fails the match if there is a word char immediately after the current location (so it succeeds at the end of the string or before a non-word char).
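A small sketch of the lookaround approach, run against the three sample strings from the question. The class and method names (WholeWordSearch, ContainsWholeWord) are hypothetical; Regex.Escape and Regex.IsMatch are the standard System.Text.RegularExpressions APIs:

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordSearch
{
    // True when keyword occurs in text with no word char (letter, digit, '_')
    // directly before or after it.
    public static bool ContainsWholeWord(string text, string keyword)
    {
        // Regex.Escape makes "C++" and "C#" literal ("C\+\+", "C\#").
        var rx = $@"(?<!\w){Regex.Escape(keyword)}(?!\w)";
        return Regex.IsMatch(text, rx);
    }

    static void Main()
    {
        string[] samples =
        {
            "languages include Java,JavaScript,MySql,C#",
            "languages include Java/JavaScript/MySql/C#",
            "languages include Java, JavaScript, MySql, C#",
        };

        foreach (var s in samples)
        {
            // Prints "True True False" for each sample: C#, Java, Script.
            Console.WriteLine(ContainsWholeWord(s, "C#") + " " +
                              ContainsWholeWord(s, "Java") + " " +
                              ContainsWholeWord(s, "Script"));
        }
    }
}
```

One consequence of this design: the lookarounds only reject word characters, so a search for C still matches the C in C#, because the '#' that follows it is not a word char. If that matters, the neighbours have to be restricted with a custom character set instead of \w.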

Yeah, this is because there isn't a word boundary (a \b) after the #: # isn't a "word" character, and neither is the comma or string end that follows it. You could use a regular expression like the following, which looks for a character that can't be part of a language name ([^a-zA-Z+#]) after the language:

\b{pattern}[^a-zA-Z+#]
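A sketch of the trailing-character approach, with a hypothetical helper name. It assumes the keyword is passed through Regex.Escape before being substituted for {pattern}, since '+' is a quantifier in an unescaped pattern. The third call shows the end-of-string case: [^a-zA-Z+#] must consume a real character, so a C# at the very end of the text (as in all three samples) is not found:

```csharp
using System;
using System.Text.RegularExpressions;

static class TrailingCharSearch
{
    // \b before the (escaped) keyword, then exactly one character that can't
    // continue a language name: anything except ASCII letters, '+' and '#'.
    public static bool Matches(string text, string keyword) =>
        Regex.IsMatch(text, @"\b" + Regex.Escape(keyword) + "[^a-zA-Z+#]");

    static void Main()
    {
        Console.WriteLine(Matches("languages include C#, Java", "C#"));  // True
        Console.WriteLine(Matches("languages include C#, Java", "C"));   // False: next char is '#'
        Console.WriteLine(Matches("languages include Java, C#", "C#"));  // False: nothing follows '#'
    }
}
```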

Or, if you believe you can list all of the possible characters that aren't part of a language name (for example, whitespace, ',', '.', and ';'):

[\s,.;]{pattern}[\s,.;]
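A sketch of the separator-list approach (helper name hypothetical, keyword escaped with Regex.Escape). Run on the question's samples, it shows both of its limits: a separator that isn't listed ('/') and a keyword at the end of the string:

```csharp
using System;
using System.Text.RegularExpressions;

static class SeparatorSearch
{
    // Requires one of the listed separators (whitespace , . ;) on both sides.
    public static bool Matches(string text, string keyword) =>
        Regex.IsMatch(text, @"[\s,.;]" + Regex.Escape(keyword) + @"[\s,.;]");

    static void Main()
    {
        Console.WriteLine(Matches("languages include Java, JavaScript, MySql, C#", "MySql")); // True
        Console.WriteLine(Matches("languages include Java/JavaScript/MySql/C#", "MySql"));    // False: '/' is not listed
        Console.WriteLine(Matches("languages include Java, JavaScript, MySql, C#", "C#"));    // False: no separator after
    }
}
```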

Alternatively, if a language name can appear at the very end of a string (depending on where you're getting the data from), you may also need to match the end of the string ($) in addition to the separators, or, similarly, the beginning of the string (^).

[\s,.;]{pattern}(?:[\s,.;]|$)
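A sketch that applies both anchors, assuming the same separator list: (?:^|[\s,.;]) before the keyword and (?:[\s,.;]|$) after it. The helper name and the Sep constant are hypothetical; '/' would have to be added to the class for text like the second sample:

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchoredSeparatorSearch
{
    // Separator class from the answer; use @"[\s,.;/]" if your text also uses '/'.
    const string Sep = @"[\s,.;]";

    // Keyword must be preceded by a separator or the start of the string (^),
    // and followed by a separator or the end of the string ($).
    public static bool Matches(string text, string keyword) =>
        Regex.IsMatch(text, $"(?:^|{Sep}){Regex.Escape(keyword)}(?:{Sep}|$)");

    static void Main()
    {
        Console.WriteLine(Matches("languages include Java, JavaScript, MySql, C#", "C#"));     // True, via $
        Console.WriteLine(Matches("C# and Java", "C#"));                                       // True, via ^
        Console.WriteLine(Matches("languages include Java, JavaScript, MySql, C#", "Script")); // False
    }
}
```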

