
Get substring from string in C# using Regular Expression

I have a string like:

Brief Exercise 1-1 Types of Businesses Brief Exercise 1-2 Forms of Organization Brief Exercise 1-3 Business Activities.

I want to break the above string using a regular expression so that it looks like:

Types of Businesses
Forms of Organization
Business Activities.

Please don't say that I can break it using 1-1, 1-2 and 1-3, because that will leave the words "Brief Exercise" in between the sentences. Later on I may have Exercise 1-1 or Problem 1-1 as well, so I want a general regular expression.

Is there an efficient regular expression for this scenario?

// x is the input string
var regex = new Regex(@"Brief (?:Exercise|Problem) \d+-\d+\s");
var result = string.Join("\n", regex.Split(x).Where(a => !string.IsNullOrEmpty(a)));

The regex matches "Brief " followed by either "Exercise" or "Problem" (the ?: makes the group non-capturing), followed by a space, then one or more digits, a "-", one or more digits, and finally a whitespace character.

The second statement uses regex.Split to split the string into an array, then Where to skip all the empty entries (otherwise the result would include the empty string at the beginning; you could use Skip(1) instead of Where(a=>!string.IsNullOrEmpty(a))), and finally uses string.Join to combine the array back into a single string with \n as the separator.
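Put together as a complete program, the split approach described above might look like the following sketch. It is written with C# 9 top-level statements and uses the sample string from the question. The Trim() call is an addition that is not part of the answer: it is there because each piece would otherwise keep the space that precedes the next "Brief".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "Brief Exercise 1-1 Types of Businesses " +
               "Brief Exercise 1-2 Forms of Organization " +
               "Brief Exercise 1-3 Business Activities.";

var regex = new Regex(@"Brief (?:Exercise|Problem) \d+-\d+\s");

// Split on every label, trim the space left before the next label
// (an addition to the answer), and drop the empty entry produced by
// the match at the very start of the string.
string[] titles = regex.Split(input)
                       .Select(part => part.Trim())
                       .Where(part => part.Length > 0)
                       .ToArray();

string result = string.Join("\n", titles);
Console.WriteLine(result);
```

Filtering on empty entries rather than calling Skip(1) also keeps the code working if the input does not start with a label.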

You could use regex.Replace to convert directly to \n, but you would end up with a \n at the beginning that you would have to strip.
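A minimal sketch of that Replace variant, again with C# 9 top-level statements and the sample string from the question. The leading \s* in the pattern is an assumption added here so that the space before each label is consumed too. TrimStart('\n') strips the newline produced by the label at the start.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "Brief Exercise 1-1 Types of Businesses " +
               "Brief Exercise 1-2 Forms of Organization " +
               "Brief Exercise 1-3 Business Activities.";

// The leading \s* is an addition: it swallows the space that precedes
// each label so no trailing space is left on the previous title.
var regex = new Regex(@"\s*Brief (?:Exercise|Problem) \d+-\d+\s");

// Every label becomes a newline; the one at the start is then stripped.
string replaced = regex.Replace(input, "\n").TrimStart('\n');

Console.WriteLine(replaced);
```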

--EDIT--

If the first number is always 1 and the second number is 1-50ish, you could use the following regex, which supports 0-59:

var regex = new Regex(@"Brief (?:Exercise|Problem) 1-[1-5]?\d\s");
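To see which labels this restricted pattern accepts, here is a quick check, assuming the character class is written as [1-5]? (an optional tens digit) followed by \d. The sample labels are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"Brief (?:Exercise|Problem) 1-[1-5]?\d\s");

// Hypothetical labels: the first two fall inside 0-59, the last two do not.
string[] samples =
{
    "Brief Exercise 1-0 ",
    "Brief Problem 1-42 ",
    "Brief Exercise 1-60 ",
    "Brief Exercise 2-3 ",
};

foreach (string sample in samples)
{
    Console.WriteLine($"{sample.Trim()}: {regex.IsMatch(sample)}");
}
```

The trailing \s is what rejects 1-60: after \d matches the 6, the next character is 0 rather than whitespace.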

This regular expression will match "Brief Exercise 1-" followed by a digit and an optional second digit:

@"Brief Exercise 1-\d\d?"

Update:

Since you might have "Problem" as well, an alternation between Exercise and Problem is also needed (using non-capturing parentheses):

@"Brief (?:Exercise|Problem) 1-\d\d?"

Why don't you do it the easy way? I mean, if the regular part is "Brief Exercise #-#", replace it with some split character and then split the resulting string to obtain what you want.

If you do it otherwise, you will always have to take care of special cases.

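string pattern = @"Brief Exercise \d+-\d+";
Regex reg = new Regex(pattern);
// Replace every label with a separator assumed not to occur in the titles.
string replaced = reg.Replace(yourstring, "|");
// Drop the empty entry produced by the label at the start of the string.
string[] results = replaced.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);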


 