Regex searching for a letter starting a word and next word containing another word

How might I search through a list of names and return only the names that have a word starting with 's' followed by a word starting with 'mary'?

For example, I have two titles: "Avera St. Mary's Hospital" and "Arthritis Care Specialists of Maryland". I search for 'S Mary' and would like it to return "Avera St. Mary's Hospital" but not "Arthritis Care Specialists of Maryland". My code returns both. Any help would be much appreciated!

var testList = new List<string>();
List<string> titles = new List<string>();
titles.Add("Avera St. Mary's Hospital");
titles.Add("Arthritis Care Specialists of Maryland");
foreach (var title in titles)
{
    var pattern = @"(?<!\w)s.*\smary";
    Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
    Match m = r.Match(title);
    if (m.Success)
    {
        testList.Add(title);
    }
}

You need to change your regular expression to something like:

var pattern = @"(?<!\w)s\w+[-| |~|@|(|)|.]*[\s]+Mary";

The bracketed character class specifies the special characters allowed between the word starting with S and Mary, as in St- Mary. (Inside a character class, | is a literal pipe rather than alternation, so the class also admits the pipe character itself.)
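As a quick check of the pattern above, here is a minimal sketch that runs it against the two titles from the question plus one "St- Mary" title. It assumes the RegexOptions.IgnoreCase option from the question's code is kept; the names answerRegex and sampleTitles are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above. IgnoreCase is an assumption carried over
// from the question's code; the answer itself does not mention options.
var answerRegex = new Regex(
    @"(?<!\w)s\w+[-| |~|@|(|)|.]*[\s]+Mary",
    RegexOptions.IgnoreCase);

// Hypothetical sample data: two titles from the question, one with "St-".
string[] sampleTitles =
{
    "Avera St. Mary's Hospital",
    "Arthritis Care Specialists of Maryland",
    "Centre Hospitalier St- Mary",
};

// Print whether each title matches.
foreach (var title in sampleTitles)
{
    Console.WriteLine($"{answerRegex.IsMatch(title)}: {title}");
}
```

Because s\w+ stops at the end of a single word, "Specialists" can no longer reach across "of" to "Maryland".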

Put a \b, which means word boundary, after mary.

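The .* is the problem in the regular expression given in the question: it matches too much text. Because . also matches spaces, .* can run across several words, so an s at the start of one word and a mary several words later still count as a match. (Changing it to a non-greedy .*? will not work, since a lazy quantifier still expands until the rest of the pattern matches.)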
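The over-matching can be seen directly by printing what the question's pattern captures. The sketch below uses the pattern and IgnoreCase option from the question; the lazy variant and the variable names are illustrative additions.

```csharp
using System;
using System.Text.RegularExpressions;

const string title = "Arthritis Care Specialists of Maryland";

// The question's pattern, and the same pattern with a lazy quantifier.
var greedy = new Regex(@"(?<!\w)s.*\smary", RegexOptions.IgnoreCase);
var lazy = new Regex(@"(?<!\w)s.*?\smary", RegexOptions.IgnoreCase);

// Both print "Specialists of Mary": the dot also matches spaces, so the
// match runs from "Specialists" through "of" to the start of "Maryland".
Console.WriteLine(greedy.Match(title).Value);
Console.WriteLine(lazy.Match(title).Value);
```

Laziness only changes which of several possible matches is preferred; it does not stop the quantifier from crossing word boundaries when that is the only way to match.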

From the question and the additional examples in the comments, the match should consist of:

  • A word starting with s. The definition of "word" is not precise, but "any characters that are not spaces" fits the examples.
  • A separator between the two words. Assume that one or more spaces are allowed.
  • A word starting with the letters mary. Anything may follow these four characters.

This leads to the simple regular expression: \bs[^ ]* +mary

\b               A word boundary
s                This exact character
[^ ]*            Zero or more characters that are not spaces
 +               One or more spaces
mary             These exact characters

Combining and sorting the examples from the question and the comments gives these examples that should match:

Avera St. Mary's Hospital
Carondelet St. Mary's Hospital.
Centre Hospitalier St- Mary,
saint mary,
Saint Mary's Home of Erie,
st mary
st mary's
st. mary,

These are examples that should not match:

Arthritis Care Specialists of Maryland
Cardiovascular Specialists Of Central Maryland,
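The two lists can be checked mechanically. This is a minimal sketch, assuming the case-insensitive option from the question's code is kept (the examples mix "St", "st" and "Saint"); the variable names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The simple pattern from this answer; IgnoreCase is assumed from the question.
var wordPairRegex = new Regex(@"\bs[^ ]* +mary", RegexOptions.IgnoreCase);

string[] shouldMatch =
{
    "Avera St. Mary's Hospital",
    "Carondelet St. Mary's Hospital.",
    "Centre Hospitalier St- Mary,",
    "saint mary,",
    "Saint Mary's Home of Erie,",
    "st mary",
    "st mary's",
    "st. mary,",
};

string[] shouldNotMatch =
{
    "Arthritis Care Specialists of Maryland",
    "Cardiovascular Specialists Of Central Maryland,",
};

// True only if every expected title matches and no excluded title does.
bool allMatch = shouldMatch.All(t => wordPairRegex.IsMatch(t));
bool noneMatch = !shouldNotMatch.Any(t => wordPairRegex.IsMatch(t));

Console.WriteLine($"All expected matches found: {allMatch}");
Console.WriteLine($"No unexpected matches: {noneMatch}");
```

In the question's loop, the same regex could replace the original pattern string without any other change to the code.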
