
C# - Regex: match whole words

I need to match all the whole words containing a given string.

string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";

Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);

I need the result to be:

MYTESTING
YOUTESTED
TESTING

But I get:

TESTING
TESTED
.TESTING

How do I achieve this with regular expressions?

Edit: Extended sample string.

If you are looking for all words that include 'TEST', you should use

@"(?<TM>\w*TEST\w*)"

\w matches word characters; it is roughly short for [A-Za-z0-9_] (by default .NET also accepts Unicode letters and digits).
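As a small illustration of what \w accepts in .NET, the sketch below checks single characters with and without RegexOptions.ECMAScript. The class and method names (WordCharDemo, IsWordChar) are hypothetical and only there for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // True when the whole input is exactly one \w character under the given options.
    public static bool IsWordChar(string ch, RegexOptions options)
    {
        return Regex.IsMatch(ch, @"^\w$", options);
    }

    public static void Main()
    {
        // "\u00E9" is the letter e with an acute accent.
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.None));       // True
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.ECMAScript)); // False
        Console.WriteLine(IsWordChar("_", RegexOptions.ECMAScript));      // True
    }
}
```

For plain ASCII input such as the sample string in the question, both modes behave the same.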

Keep it simple: why not try \w*TEST\w* as the match pattern?

I get the results you are expecting with the following:

string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";

var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
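A minimal, self-contained sketch of the same idea, wrapped in a helper so the matches can be printed. WordFinder and FindWords are hypothetical names, and the call to Regex.Escape is an addition that assumes the search term might contain regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordFinder
{
    // Returns every run of word characters that contains `term`.
    public static List<string> FindWords(string input, string term)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the term literal.
        string pattern = @"\w*" + Regex.Escape(term) + @"\w*";
        var words = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";

        // Prints MYTESTING, YOUTESTED and TESTING, one per line.
        foreach (string word in FindWords(s, "TEST"))
        {
            Console.WriteLine(word);
        }
    }
}
```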

Try using \b. It's the regex anchor for a word boundary. If you wanted to match both words you could use:

/\b[a-z]+\b/i

BTW, .NET doesn't need the surrounding /, and the i is just a case-insensitive match flag.

.NET alternative:

var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
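To show what the word-boundary pattern does on its own, here is a short sketch. The names BoundaryDemo and MatchAll are hypothetical. Note that \b[a-z]+\b matches every alphabetic word, so the literal TEST still has to appear in the pattern to restrict the result to the words the question asks for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Collects the text of every case-insensitive match of `pattern` in `input`.
    public static string[] MatchAll(string input, string pattern)
    {
        MatchCollection mc = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
        var result = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        string s = "ABC.MYTESTING";

        // Every alphabetic word between boundaries.
        Console.WriteLine(string.Join(", ", MatchAll(s, @"\b[a-z]+\b")));           // ABC, MYTESTING

        // Only the words that contain TEST.
        Console.WriteLine(string.Join(", ", MatchAll(s, @"\b[a-z]*TEST[a-z]*\b"))); // MYTESTING
    }
}
```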

I think you can achieve it using groups.

        string s = @"ABC.TESTING
        XYZ.TESTED";
        Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
        var mc= r.Matches(s);
        foreach (Match match in mc)
        {
            Console.WriteLine(match.Groups["test"]);
        }

Works exactly like you want.

BTW, your regular expression pattern should be a verbatim string (@"").

Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);

First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).

Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. As written, [!\..] matches ! or . only. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word".
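The difference between the two character classes can be seen on single lines from the question's sample. This is only a sketch; CharClassDemo and FirstMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // [!\..] only accepts '!' or '.', so the letters before TEST are never included.
        Console.WriteLine(FirstMatch("ANY.TESTING", @"[!\..]*TEST.*"));   // .TESTING
        Console.WriteLine(FirstMatch("ABC.MYTESTING", @"[!\..]*TEST.*")); // TESTING

        // [^.] accepts anything except '.', so the match starts right after the dot.
        Console.WriteLine(FirstMatch("ANY.TESTING", @"[^.]*TEST.*"));     // TESTING
        Console.WriteLine(FirstMatch("ABC.MYTESTING", @"[^.]*TEST.*"));   // MYTESTING
    }
}
```

The first two lines reproduce the kind of output reported in the question, including the stray leading dot.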

Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);

That way you don't need to bother with word boundaries, though they don't hurt:

Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);

Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:

Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);

The matched text will be available via Match's Value property.
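As a quick check of that last point, the sketch below compares the named group TM with Match.Value when the group wraps the entire pattern. ValueDemo and GroupEqualsValue are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValueDemo
{
    // True when the pattern matches and the TM group holds the same text as the whole match.
    public static bool GroupEqualsValue(string input)
    {
        Match m = Regex.Match(input, @"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
        return m.Success && m.Groups["TM"].Value == m.Value;
    }

    public static void Main()
    {
        Console.WriteLine(GroupEqualsValue("XYZ.YOUTESTED")); // True

        // Without the group, Value alone gives the matched word.
        Match plain = Regex.Match("XYZ.YOUTESTED", @"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
        Console.WriteLine(plain.Value); // YOUTESTED
    }
}
```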
