
How to match text inside brackets with regex in C#?

I have the following text:

has helped discover and mentor such </br>
New York Times bestselling authors as Brandon Sanderson  </br>
(Mistborn), James Dashner (The Maze Runner), and Stephenie

I take the last three words of the first line and the first three words of the last line, and use a regex to find the text between them. I am using the following regex in my C# code:

string matchedText = "";
string RegexPattren = preLine + "[\\w\\W\\S\\s\\s\\D':;\"<>,.?]*" + postLine;
matchedText = Regex.Match(stBuilder.ToString(), RegexPattren).Value;
matchedText = preLine.Equals("") ? matchedText : matchedText.Replace(preLine, "");
matchedText = postLine.Equals("") ? matchedText : matchedText.Replace(postLine, "");
string[] MatchedLines = Regex.Split(matchedText, "</br>").Where(x => !string.IsNullOrEmpty(x.Trim())).ToArray();



string RegexPattren = preLine + "[\\w\\W\\S\\s\\s\\D':;\"<>,.?]*" + postLine;

which has the following value:

and mentor such [\w\W\S\s\s\D':;"<>,.?]* James Dashner

The code above works fine, and the matched result is:

and mentor such  </br>New York Times bestselling authors as Brandon Sanderson  </br>(Mistborn), James Dashner

The problem occurs when the search words contain brackets, as below. In that case the regex does not match any text:

and mentor such [\w\W\S\s\s\D':;"<>,.?]* (Mistborn), James Dashner

How can I match a line that has text inside brackets before or after the regex pattern in C#?

You'll have to escape the parentheses, like this:

and mentor such [\w\W\S\s':;"<>,.?]*\(Mistborn\), James Dashner

That makes the pattern match the literal ( and ) characters instead of treating them as a capturing group.

Also note that your regex has a space before (Mistborn) that doesn't exist in the text: (Mistborn) is preceded by a newline. I removed the space, but you could also change it to \s, which matches both spaces and newlines.
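As a rough illustration of the two points above, here is a minimal sketch comparing the unescaped and escaped patterns. The class name ParenEscapeDemo is made up for this example, the newline after each </br> in the sample text is an assumption about what the StringBuilder holds, and the character class is shortened to [\w\W], which matches any character just like the longer one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the question's code.
public static class ParenEscapeDemo
{
    // Sample text from the question; the "\n" after each </br> is an assumption.
    public const string Text =
        "has helped discover and mentor such </br>\n" +
        "New York Times bestselling authors as Brandon Sanderson  </br>\n" +
        "(Mistborn), James Dashner (The Maze Runner), and Stephenie";

    // Unescaped: ( and ) open and close a capturing group, and the literal
    // space before the group has no counterpart in the text.
    public const string Unescaped =
        @"and mentor such [\w\W]* (Mistborn), James Dashner";

    // Escaped: \( and \) match literal brackets, and \s accepts the newline
    // that precedes "(Mistborn)".
    public const string Escaped =
        @"and mentor such [\w\W]*\s\(Mistborn\), James Dashner";

    public static bool Matches(string pattern)
    {
        return Regex.IsMatch(Text, pattern);
    }
}
```

With the sample text as assumed here, ParenEscapeDemo.Matches(ParenEscapeDemo.Unescaped) should return false and ParenEscapeDemo.Matches(ParenEscapeDemo.Escaped) should return true.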

Lastly, \D (any non-digit) is redundant: \w and \W together already match every character, digits included. In fact, most of the items in the character class could be removed. If you set RegexOptions.Singleline, you would probably be fine with:

and mentor such .*\(Mistborn\), James Dashner

Check it out at regex101.
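To show why RegexOptions.Singleline matters for the .* version, here is a small sketch. SinglelineDemo is a made-up name, and the sample text again assumes a newline after each </br>. By default . does not match \n in .NET, so the shortened pattern only spans the line breaks when the option is set.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the question's code.
public static class SinglelineDemo
{
    // Sample text from the question; the "\n" after each </br> is an assumption.
    public const string Text =
        "has helped discover and mentor such </br>\n" +
        "New York Times bestselling authors as Brandon Sanderson  </br>\n" +
        "(Mistborn), James Dashner (The Maze Runner), and Stephenie";

    public const string Pattern =
        @"and mentor such .*\(Mistborn\), James Dashner";

    // Default options: "." stops at "\n", so the match cannot cross lines.
    public static bool MatchesDefault()
    {
        return Regex.IsMatch(Text, Pattern);
    }

    // Singleline: "." also matches "\n", so ".*" can span several lines.
    public static bool MatchesSingleline()
    {
        return Regex.IsMatch(Text, Pattern, RegexOptions.Singleline);
    }
}
```

If the real input contains no newline characters at all, both calls would behave the same, so the option only matters for multi-line input.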

P.S. .NET has a method for escaping regexes, Regex.Escape, but it escapes every metacharacter in the string it is given, so it cannot be applied to a string that also contains real regex syntax. It would have to be applied to the literal parts only.
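One way to apply Regex.Escape only to the literal parts is sketched below. This is not the asker's code: BetweenExtractor and Between are hypothetical names, and the lazy (.*?) capture group is a substitution for the original character class and the Replace calls, used here so the match stops at the first occurrence of postLine.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the question's code.
public static class BetweenExtractor
{
    // Returns the text between preLine and postLine, or "" when no match is found.
    public static string Between(string text, string preLine, string postLine)
    {
        // Escape only the literal search strings; the (.*?) in the middle
        // stays a real regex construct.
        string pattern = Regex.Escape(preLine) + "(.*?)" + Regex.Escape(postLine);

        // Singleline lets "." match newlines, so the gap may span lines.
        Match m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

Called as BetweenExtractor.Between(text, "and mentor such", "(Mistborn), James Dashner"), this should return the text between the two phrases without either of them, and the brackets in postLine need no manual escaping.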
