I have some HTML that follows this pattern:
<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
<p>02. lowercasetext2 - etc etc</p>
<p>03. lowercasetext2 - etc etc</p>
<p>3. ALLCAPSTEXT3 - etcetc</p>
<p>01. lowercasetext3 - etc etc</p>
The number of lines after each all-caps line varies; it can be anywhere from 1 to 10+. I want to select every group of lines that follows this pattern (the file goes up to 100+ lines).
So far I have:
<p>(\d+)\.\s[A-Z][A-Z]+(.+)</p>\n+<p>(.+)</p>\n+<p>\d+\.\s[A-Z][A-Z]+(.+)</p>
Here the first p tag matches the line with all-caps text, the second p tag matches the lowercase line after it, and the third p tag matches the next line with all-caps text.
So I want it to get all of this:
<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
but it only captures the first line after the all-caps text and then skips to the next line with all-caps text and does the same thing.
<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
then goes to:
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
Any hints on how I could get it to capture all lines that have lowercase text till it reaches the next line with all-caps text, rinse and repeat?
You could do it this way:
(?m)^.*?ALLCAPSTEXT.*(?:(?!^.*?ALLCAPSTEXT)[\S\s])*
https://regex101.com/r/TfDsL9/1
Expanded
(?m)
^ .*? ALLCAPSTEXT .*
(?:
(?! ^ .*? ALLCAPSTEXT )
[\S\s]
)*
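To see how the pattern above groups the lines, here is a minimal Python sketch that runs it on the sample from the question. The names `html`, `BLOCK`, and `split_blocks` are made up for illustration. The literal `ALLCAPSTEXT` only works because the sample headings contain that exact word; on real data you would presumably swap in a general heading pattern such as `\d+\.\s[A-Z]{2,}`, which is an assumption rather than part of the answer.

```python
import re

# Sample input copied from the question.
html = """\
<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
<p>02. lowercasetext2 - etc etc</p>
<p>03. lowercasetext2 - etc etc</p>
<p>3. ALLCAPSTEXT3 - etcetc</p>
<p>01. lowercasetext3 - etc etc</p>
"""

# Match a heading line, then consume one character at a time for as long as
# the current position is NOT the start of another heading line.
BLOCK = re.compile(r"(?m)^.*?ALLCAPSTEXT.*(?:(?!^.*?ALLCAPSTEXT)[\S\s])*")


def split_blocks(text):
    """Return one list of lines per heading, heading line first."""
    return [m.group(0).splitlines() for m in BLOCK.finditer(text)]


blocks = split_blocks(html)
for block in blocks:
    print(len(block), "lines, starting with:", block[0])
```

Each match includes the newline before the next heading, which is why the sketch calls `splitlines()` instead of using the raw match text.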
Another option is to match the all-caps line followed by one or more lowercase lines:
<p>\d+\.\s[A-Z]+.*<\/p>(\n+<p>\d+\.\s[a-z]+.*<\/p>)+
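As a rough check of this pattern, the Python sketch below applies it to the sample from the question. The names `sample`, `GROUP`, and `groups` are hypothetical. One thing to watch in Python's `re` module: a repeated capture group keeps only its last repetition, so the sketch reads the whole match with `group(0)` rather than `group(1)`.

```python
import re

# Sample input copied from the question.
sample = """\
<p>1. ALLCAPSTEXT1 - etc etc</p>
<p>01. lowercasetext1 - etc etc</p>
<p>02. lowercasetext1 - etc etc</p>
<p>2. ALLCAPSTEXT2 - etc etc</p>
<p>01. lowercasetext2 - etc etc</p>
<p>02. lowercasetext2 - etc etc</p>
<p>03. lowercasetext2 - etc etc</p>
<p>3. ALLCAPSTEXT3 - etcetc</p>
<p>01. lowercasetext3 - etc etc</p>
"""

# One heading line whose text starts with uppercase letters, followed by one
# or more lines whose text starts with lowercase letters.
GROUP = re.compile(r"<p>\d+\.\s[A-Z]+.*<\/p>(\n+<p>\d+\.\s[a-z]+.*<\/p>)+")

# group(0) is the full block; group(1) would hold only the last lowercase line.
groups = [m.group(0).split("\n") for m in GROUP.finditer(sample)]

for lines in groups:
    print(lines[0], "->", len(lines) - 1, "lowercase line(s)")
```

Unlike the lookahead version, this pattern requires at least one lowercase line after each heading, so a heading with nothing under it would be skipped.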