
Regex to match words following a pattern

I wasn't sure how to phrase the title, so I'll explain here. I have sample text like this:

Line 1
Contents and text in the line.
It's a paragraph.

Line 2
Those for this line.
Another paragraph

Line 3
More contents.

Line 4
More contents...

How do I extract the paragraphs? I tried this:
(?:Line \d{1,3})(.*?)(?:Line \d{1,3})

This matched only the odd-numbered paragraphs (1, 3, 5, and so on). I'm working in C#, but since this is a regex question, I don't expect the language to make a major difference.
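A quick way to see why only the odd paragraphs come back is to trace the pattern over the sample text. The sketch below uses Python's re module instead of C#, as an assumption for illustration: .NET's Regex treats this pattern the same way, and re.DOTALL stands in for RegexOptions.Singleline (without it, . cannot cross line breaks and nothing matches at all). TEXT and asker_pattern are names chosen for this example only.

```python
import re

# Sample input from the question (no trailing newline after the last paragraph).
TEXT = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "\n"
    "Line 3\n"
    "More contents.\n"
    "\n"
    "Line 4\n"
    "More contents..."
)

# The asker's pattern, run in dot-all mode. The closing (?:Line \d{1,3}) is
# consumed, so the next search resumes *after* "Line 2" and the match that
# should have started there is lost.
asker_pattern = re.compile(r"(?:Line \d{1,3})(.*?)(?:Line \d{1,3})", re.DOTALL)

# Keep just the header line of each full match to show which paragraphs matched.
matches = [m.group(0).splitlines()[0] for m in asker_pattern.finditer(TEXT)]
print(matches)  # ['Line 1', 'Line 3']
```

Each match ends by eating the next paragraph's "Line N" header, which is why every other paragraph is skipped and the last one never matches at all.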

Here is a pattern which should work:

(Line \d+.*?)(?=Line|$)

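This matches a paragraph beginning with "Line", followed by anything up to the start of the next paragraph (i.e. the next "Line") or the end of the text; the end of the text is what terminates the last paragraph. Because the lookahead (?=...) only checks for the next "Line" without consuming it, that "Line" is still available to start the following match, so no paragraph is skipped.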

You also need to run this regex in dot-all mode (RegexOptions.Singleline in C#) so that . matches newlines; otherwise, replace .*? with [\s\S]*?.
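Here is a minimal sketch of this answer's pattern, again in Python's re module as an assumed stand-in for C# (for this pattern it behaves like .NET's Regex). It runs the pattern once with re.DOTALL and once with the [\s\S] character class and checks that both return the same four paragraphs. TEXT, dotall, and char_class are illustrative names.

```python
import re

# Sample input from the question (no trailing newline after the last paragraph).
TEXT = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "\n"
    "Line 3\n"
    "More contents.\n"
    "\n"
    "Line 4\n"
    "More contents..."
)

# Answer 1's pattern in dot-all mode, so "." also matches "\n".
dotall = re.compile(r"(Line \d+.*?)(?=Line|$)", re.DOTALL)

# Same idea without dot-all: [\s\S] matches any character, newlines included.
char_class = re.compile(r"(Line \d+[\s\S]*?)(?=Line|$)")

# strip() removes the blank separator lines left at the end of each paragraph.
paragraphs = [p.strip() for p in dotall.findall(TEXT)]
assert paragraphs == [p.strip() for p in char_class.findall(TEXT)]

for p in paragraphs:
    print(repr(p))
```

One design note: (?=Line|$) stops at any occurrence of "Line", so a paragraph body that contains that capitalized word would be cut short; a tighter lookahead such as (?=Line \d|$) is one way to guard against that if your data needs it.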


If you want to capture only the text, without the "Line \d" header, you can use this refinement of your regex:

(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)


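Some regex engines (Python's re module, for example) don't allow a variable-length quantifier such as \d+ inside a lookbehind, so, like you, I used a non-capturing group to match the header and then captured the text up to the next "Line N" header or the end of the text. (.NET itself does allow variable-length lookbehind, so in C# this is a portability choice rather than a necessity.) As with the pattern above, run it in Singleline (dot-all) mode so that . can cross the line breaks inside a paragraph.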
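The same kind of sketch for this pattern, again in Python as an assumed stand-in for C#. Only the paragraph bodies end up in group 1; strip() removes the trailing newline that sits before each blank separator line. TEXT, body_pattern, and bodies are illustrative names.

```python
import re

# Sample input from the question (no trailing newline after the last paragraph).
TEXT = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "\n"
    "Line 3\n"
    "More contents.\n"
    "\n"
    "Line 4\n"
    "More contents..."
)

# Answer 2's pattern: the "Line N" header and its newline sit in a
# non-capturing group, so only the body is captured. The lookahead stops
# before the next header without consuming it. re.DOTALL is required:
# without it "." cannot span the line breaks inside a paragraph.
body_pattern = re.compile(r"(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)", re.DOTALL)

bodies = [b.strip() for b in body_pattern.findall(TEXT)]
for b in bodies:
    print(b)
    print("---")
```

If the input uses Windows \r\n line endings, the \n right after the digits will not match; normalizing first with TEXT.replace("\r\n", "\n") is one simple option.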

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
