Regex to match words following a pattern
I don't know how to phrase the title, so I will explain here. I have sample text like this:
Line 1
Contents and text in the line.
It's a paragraph.
Line 2
Those for this line.
Another paragraph
Line 3
More contents.
Line 4
More contents...
How do I extract the paragraphs? I tried this:
(?:Line \\d{1,3})(.*?)(?:Line \\d{1,3})
This matched only the odd-numbered paragraphs (1, 3, 5, etc.). I'm working with C#, but this is regex, so I don't think there will be any major difference.
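The odd-numbered behaviour can be reproduced with a short sketch. It is written in Python rather than C# purely for illustration, and it assumes dot-all mode (re.DOTALL here, RegexOptions.Singleline in .NET) and the sample text above. The variable names are made up for the example. The trailing (?:Line \d{1,3}) group consumes the next header, so the following search starts after it and that paragraph is skipped.

```python
import re

# Sample text from the question (assumed to end with a newline).
text = (
    "Line 1\nContents and text in the line.\nIt's a paragraph.\n"
    "Line 2\nThose for this line.\nAnother paragraph\n"
    "Line 3\nMore contents.\n"
    "Line 4\nMore contents...\n"
)

# The attempted pattern: the closing non-capturing group *consumes*
# the next "Line N" header, so that header cannot start the next match.
attempt = re.compile(r"(?:Line \d{1,3})(.*?)(?:Line \d{1,3})", re.DOTALL)

bodies = [m.group(1).strip() for m in attempt.finditer(text)]

for body in bodies:
    print(repr(body))
# Only the bodies of paragraphs 1 and 3 are found.
```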
Here is a pattern which should work:
(Line \d+.*?)(?=Line|$)
This says to match a paragraph beginning with Line, followed by anything up until hitting the start of the next paragraph (i.e. Line) or the end of the text. The end of the text would occur for the last paragraph.
You would also need to run this regex in dot-all mode (RegexOptions.Singleline in C#), or, if not, replace the .*? with [\s\S]*?.
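As a rough check of the lookahead approach, here is a minimal Python sketch (Python is used only for illustration; the pattern is the same one given above). It assumes the sample text from the question and shows both variants: dot-all mode via re.DOTALL, and the [\s\S] substitution without any flag. The variable names are hypothetical.

```python
import re

# Sample text from the question (assumed to end with a newline).
text = (
    "Line 1\nContents and text in the line.\nIt's a paragraph.\n"
    "Line 2\nThose for this line.\nAnother paragraph\n"
    "Line 3\nMore contents.\n"
    "Line 4\nMore contents...\n"
)

# Variant 1: dot-all mode, so "." also matches newlines.
dotall = re.compile(r"(Line \d+.*?)(?=Line|$)", re.DOTALL)

# Variant 2: no flag, with [\s\S] standing in for "any character".
no_flag = re.compile(r"(Line \d+[\s\S]*?)(?=Line|$)")

# The lookahead only peeks at the next header, so nothing is consumed
# and every paragraph gets its own match.
paragraphs = [m.group(1).strip() for m in dotall.finditer(text)]
paragraphs_no_flag = [m.group(1).strip() for m in no_flag.finditer(text)]

for p in paragraphs:
    print(repr(p))
```

One caveat with this sketch: the lookahead stops at any occurrence of "Line", so a paragraph whose body contains that word with a capital L would be cut short.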
If you want to select only the text without the "Line \d" header, you can use this fine-tuning of your regex:
(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)
Because most regex engines do not allow a variable-length wildcard inside a lookbehind (.NET is one of the few that does), I used a non-capturing group for the header, as you did, and then captured the text lazily until the next Line header or the end of the file. This pattern also needs dot-all mode so that .*? can span line breaks.
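A small Python sketch of this second pattern, again only for illustration and assuming the sample text from the question, Unix line endings (\n), and dot-all mode. Group 1 holds the paragraph body without its "Line N" header. The variable names are hypothetical.

```python
import re

# Sample text from the question (assumed to use "\n" line endings).
text = (
    "Line 1\nContents and text in the line.\nIt's a paragraph.\n"
    "Line 2\nThose for this line.\nAnother paragraph\n"
    "Line 3\nMore contents.\n"
    "Line 4\nMore contents...\n"
)

# The header is matched but not captured; the body is captured lazily
# up to (not including) the newline before the next header, or the end.
body_only = re.compile(r"(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)", re.DOTALL)

bodies = [m.group(1) for m in body_only.finditer(text)]

for body in bodies:
    print(repr(body))
```

If the input uses Windows line endings, the \n in the pattern would need to become \r?\n, and the captured bodies may then carry a trailing \r.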