
How do I select multiple lines in a regular expression?

I have some HTML that follows this pattern:

<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
<p>02. lowercasetext2 - etc etc</p>
<p>03. lowercasetext2 - etc etc</p>
<p>3. ALLCAPSTEXT3 - etcetc</p>
<p>01. lowercasetext3 - etc etc</p>

The number of lines after the "ALLCAPSWORD" line varies, anywhere from 1 to 10+. I want to select all of these lines with this pattern (the list goes up to 100+ lines).

So far I have:

<p>(\d+)\.\s[A-Z][A-Z]+(.+)</p>\n+<p>(.+)</p>\n+<p>\d+\.\s[A-Z][A-Z]+(.+)</p>

where the first p tag matches the line with all-caps text, the second p tag matches the next line with lowercase text, and the third p tag matches the following line with all-caps text.

So I want it to get all of this:

<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>

but it only captures the first line after the all-caps line, then skips to the next all-caps line and does the same thing:

<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>

then moves on to:

<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
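A quick way to see why the attempt cannot reach past one lowercase line, sketched in Python's re module (an assumption; the question does not name a regex flavor). The dot does not match a newline by default, so the middle <p>(.+)</p> can only consume a single line, and the whole pattern succeeds only where exactly one lowercase line sits between two all-caps lines. The one_line input is a hypothetical example, not from the question.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(
    r"<p>(\d+)\.\s[A-Z][A-Z]+(.+)</p>\n+<p>(.+)</p>\n+<p>\d+\.\s[A-Z][A-Z]+(.+)</p>"
)

# First block of the question's sample: two lowercase lines before the next all-caps line.
two_lines = (
    "<p>1. ALLCAPSTEXT1 - etc etc</p>\n"
    "<p>01. lowercasetext1 - etc etc</p>\n"
    "<p>02. lowercasetext1 - etc etc</p>\n"
    "<p>2. ALLCAPSTEXT2 - etc etc</p>"
)

# Hypothetical input with exactly one lowercase line in between.
one_line = (
    "<p>1. ALLCAPSTEXT1 - etc etc</p>\n"
    "<p>01. lowercasetext1 - etc etc</p>\n"
    "<p>2. ALLCAPSTEXT2 - etc etc</p>"
)

# "." never matches "\n", so the middle <p>(.+)</p> can only cover one line.
print(ATTEMPT.search(two_lines))          # None
print(ATTEMPT.search(one_line).group(3))  # 01. lowercasetext1 - etc etc
```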

Any hints on how I could get it to capture all the lines with lowercase text until it reaches the next all-caps line, then rinse and repeat?

You could do it this way:

(?m)^.*?ALLCAPSTEXT.*(?:(?!^.*?ALLCAPSTEXT)[\S\s])*

https://regex101.com/r/TfDsL9/1

Expanded (the spaces are only for readability and are ignored only in free-spacing mode, the x flag):

 (?m)
 ^ .*? ALLCAPSTEXT .* 
 (?:
      (?! ^ .*? ALLCAPSTEXT )
      [\S\s] 
 )*
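A minimal sketch of this answer in Python's re module (assumed here; the question does not name a flavor), applied to the question's sample. The compact form works as-is; the expanded layout needs re.VERBOSE so the whitespace and comments are skipped. Both forms match the literal string ALLCAPSTEXT, so each match runs from one header line up to, but not including, the next one, and it also includes the trailing newline, hence the rstrip.

```python
import re

SAMPLE = """<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
<p>02. lowercasetext2 - etc etc</p>
<p>03. lowercasetext2 - etc etc</p>
<p>3. ALLCAPSTEXT3 - etcetc</p>
<p>01. lowercasetext3 - etc etc</p>
"""

# Compact form from the answer: (?m) lets ^ match at the start of every line.
COMPACT = re.compile(r"(?m)^.*?ALLCAPSTEXT.*(?:(?!^.*?ALLCAPSTEXT)[\S\s])*")

# Expanded form; its whitespace and comments are ignored only under re.VERBOSE.
EXPANDED = re.compile(
    r"""(?m)
    ^ .*? ALLCAPSTEXT .*          # a header line containing ALLCAPSTEXT
    (?:
        (?! ^ .*? ALLCAPSTEXT )   # stop before the next header line
        [\S\s]                    # otherwise take any character, newlines included
    )*
    """,
    re.VERBOSE,
)

blocks = [m.group(0).rstrip("\n") for m in COMPACT.finditer(SAMPLE)]
for block in blocks:
    print(block)
    print("---")
```

The [\S\s] class is a common way to match any character including a newline without turning on DOTALL, which would also change how the .* parts behave.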

Alternatively, match the capitalization directly:

<p>\d+\.\s[A-Z]+.*<\/p>(\n+<p>\d+\.\s[a-z]+.*<\/p>)+
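The same task without the lookahead, again sketched with Python's re module as an assumption. Here a header line is recognized by [A-Z]+ right after the number and each following line by [a-z]+, so it does not depend on the literal ALLCAPSTEXT. Because a repeated capturing group keeps only its last repetition, the whole block has to be read from group(0), not group(1).

```python
import re

SAMPLE = """<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
<p>02. lowercasetext2 - etc etc</p>
<p>03. lowercasetext2 - etc etc</p>
<p>3. ALLCAPSTEXT3 - etcetc</p>
<p>01. lowercasetext3 - etc etc</p>
"""

# A header line (uppercase right after the number) followed by one or more
# lowercase lines; "\n+" also tolerates empty lines between them.
BLOCK = re.compile(r"<p>\d+\.\s[A-Z]+.*<\/p>(\n+<p>\d+\.\s[a-z]+.*<\/p>)+")

for m in BLOCK.finditer(SAMPLE):
    print(m.group(0))        # the whole block, header through last lowercase line
    print(repr(m.group(1)))  # only the last repetition of the group
    print("---")
```

The trailing + requires at least one lowercase line, which fits the question's range of 1 to 10+; a header with no lines under it would not be matched.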
