Regex word boundaries capturing wrong words

I am having some difficulty getting my simple Regex statement in C# to work the way I want it to.

If I have a long string and I want to find the word "executive" but NOT "executives", I thought my regex would look something like this:

Regex.IsMatch(input, string.Format(@"\b{0}\b", "executive"));

This, however, still matches inputs that contain only executives and not executive (singular).

I thought word boundaries, when placed at the beginning and end of the pattern, would specify that you only want to match that exact word and not any other form of it?
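For reference, here is a minimal sketch that checks the pattern in isolation. It assumes nothing beyond the pattern from the question; the sample sentences and the ContainsWord helper are made up for illustration. A \b only matches between a word character and a non-word character, and the trailing s in "executives" is a word character, so there is no boundary right after "executive" in that input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: true when `word` occurs in `input` as a whole word.
    // Uses the same pattern construction as the question.
    public static bool ContainsWord(string input, string word)
    {
        return Regex.IsMatch(input, string.Format(@"\b{0}\b", word));
    }

    public static void Main()
    {
        // Singular form surrounded by non-word characters: boundaries on both sides.
        Console.WriteLine(ContainsWord("The executive left early.", "executive"));   // True

        // Plural only: no boundary between "executive" and the trailing "s".
        Console.WriteLine(ContainsWord("The executives left early.", "executive"));  // False
    }
}
```

If this behaves as described on your machine, the pattern is probably not the part that is misbehaving.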

Edit: To clarify what's happening, I am trying to find all of the Notes among Students that contain the word executive, ignoring words that merely contain "executive". As follows:

var studentMatches =
    Students.SelectMany(o => o.Notes)
        .Where(c => Regex.Match(c.NoteText, string.Format(@"\b{0}\b", query)).Success).ToList();

where query would be "executive" in this case.

What's strange is that while the above code matches on executives even though I don't want it to, the following code does not (that is, it does what I expect):

foreach (var stu in Students)
{
    foreach (var note in stu.Notes)
    {

        if (Regex.IsMatch(note.NoteText, string.Format(@"\b{0}\b", query)))
            Console.WriteLine(stu.LastName);
    }
}

Why would a nested foreach loop with the same regex code produce accurate matches while a LINQ expression seems to return anything that contains the word I am searching for?

Your LINQ query produces the correct result. What you see is what you have written.

Let's give things proper names to make this clear:

var noteMatches = Students.SelectMany(student => student.Notes)
    .Where(note => Regex.Match(note.NoteText, string.Format(@"\b{0}\b", query)).Success)
    .ToList();

In this query, after executing SelectMany we have a flattened list of all notes. The information about which note belonged to which student is therefore lost.
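If you do want to filter at the note level but still know the owner, one option is the SelectMany overload that takes a result selector. The sketch below is only an illustration: the Student and Note classes are assumed shapes (only the property names LastName, Notes and NoteText come from the question), and MatchesWithOwner and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed shapes; only the property names come from the question.
public class Note { public string NoteText { get; set; } = ""; }
public class Student
{
    public string LastName { get; set; } = "";
    public List<Note> Notes { get; set; } = new List<Note>();
}

public static class FlattenDemo
{
    // Hypothetical helper: pairs each matching note with its owner's last name.
    public static List<string> MatchesWithOwner(IEnumerable<Student> students, string query)
    {
        string pattern = string.Format(@"\b{0}\b", query);
        return students
            // The result selector keeps the student next to each flattened note.
            .SelectMany(student => student.Notes,
                        (student, note) => new { student, note })
            .Where(pair => Regex.IsMatch(pair.note.NoteText, pattern))
            .Select(pair => pair.student.LastName + ": " + pair.note.NoteText)
            .ToList();
    }

    public static void Main()
    {
        var students = new List<Student>
        {
            new Student { LastName = "Smith", Notes = new List<Note> {
                new Note { NoteText = "Met the executive today." },
                new Note { NoteText = "Talked to two executives." } } },
            new Student { LastName = "Jones", Notes = new List<Note> {
                new Note { NoteText = "All executives agreed." } } },
        };

        foreach (var line in MatchesWithOwner(students, "executive"))
            Console.WriteLine(line);   // Smith: Met the executive today.
    }
}
```

Each element of the result is a single note together with its student, so a note that only mentions executives is filtered out even when another note of the same student matches.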

Meanwhile, in the sample code with foreach loops you output information about the student, not the note.

I assume that you need a query like the following:

var studentMatches = Students.Where(student => student.Notes
        .Any(note => Regex.IsMatch(note.NoteText, string.Format(@"\b{0}\b", query))))
    .ToList();

However, it is not clear what result you want to obtain if the same student has notes containing both executive and executives.
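To make that ambiguity concrete, here is a small sketch of how the Any-based query treats such a student. As before, the class shapes, the StudentsWithWord helper and the sample data are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed shapes; only the property names come from the question.
public class Note { public string NoteText { get; set; } = ""; }
public class Student
{
    public string LastName { get; set; } = "";
    public List<Note> Notes { get; set; } = new List<Note>();
}

public static class AnyDemo
{
    // Hypothetical helper: last names of students with at least one
    // note that contains `query` as a whole word.
    public static List<string> StudentsWithWord(IEnumerable<Student> students, string query)
    {
        string pattern = string.Format(@"\b{0}\b", query);
        return students
            .Where(student => student.Notes.Any(note => Regex.IsMatch(note.NoteText, pattern)))
            .Select(student => student.LastName)
            .ToList();
    }

    public static void Main()
    {
        var students = new List<Student>
        {
            // Smith has one note with the singular and one with the plural.
            new Student { LastName = "Smith", Notes = new List<Note> {
                new Note { NoteText = "Met the executive today." },
                new Note { NoteText = "Talked to two executives." } } },
            // Jones only has the plural.
            new Student { LastName = "Jones", Notes = new List<Note> {
                new Note { NoteText = "All executives agreed." } } },
        };

        Console.WriteLine(string.Join(", ", StudentsWithWord(students, "executive"))); // Smith
    }
}
```

With Any, one matching note is enough for the student to be returned, and the student's other notes come along with it. If you then print every note of each returned student, you will see executives in the output even though the regex never matched it.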
