
RegEx match first occurrence before keyword

I have the following string:

<ul><li><span>some words here.</span></li><li><span>other words here.</span></li><li><span>Code: 55555.</span></li></ul>

My goal is to remove this part from the string, namely the li element that contains the keyword "Code":

<li><span>Code: 55555.</span></li>

I am trying to write a RegEx that will help me match and replace my substring. Text in between <li></li> might vary but it will always have the keyword "Code". This is what I have so far:

<li>(.*)code:(.*?)<\/li>

The problem is, it matches from the first <li> tag and I want it to match starting from the <li> tag which is right before our keyword "code".

Thank you for your help!

<li>(?:.(?!</li>))+Code:(?:.*?)</li>

  • Match <li> literally
  • Followed by one or more characters, none of which is immediately followed by the literal </li> (this keeps the match from crossing into another list item, so it can only start at the relevant <li>)
  • Followed by the literal Code:
  • Followed by any number of characters (non-greedy) until the literal </li> is matched
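
As a rough illustration of the steps above, here is a minimal Python sketch that applies the pattern to the string from the question. The variable names (html, pattern, cleaned) are only illustrative, and Python is an assumption, since the question does not say which regex engine is in use.

```python
import re

# Sample string from the question.
html = ('<ul><li><span>some words here.</span></li>'
        '<li><span>other words here.</span></li>'
        '<li><span>Code: 55555.</span></li></ul>')

# <li>, then one or more characters that are never directly followed by
# </li>, then the keyword, then a lazy run up to the closing </li>.
pattern = re.compile(r'<li>(?:.(?!</li>))+Code:(?:.*?)</li>')

# Replace the matched list item with an empty string.
cleaned = pattern.sub('', html)
print(cleaned)
```

Note that the + quantifier requires at least one character between <li> and Code:, and that . does not match newlines unless re.DOTALL is passed, so multi-line markup would need adjusting.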


You can try regex groups for that, so your regex would be something like this:

r'(<li>(.*)code:(.*?)</li>){1,}'

This regex matches one or more consecutive occurrences of a string in the format <li>(.*)code:(.*?)</li>.

I guess this might help you a bit.

(.*)(<li>.*span.*<\/li>)(.*)

The RegEx provided by Tim Biegeleisen works just fine. If you want to make sure the word "Code" exists, just replace 'span' with 'Code', like:

(.*)(<li>.*Code.*<\/li>)(.*)
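
To see why this works, note that the leading (.*) is greedy: it swallows as much as possible and only backs off until the rest of the pattern can match, so the second group starts at the last <li> that still has "Code" after it. A small Python sketch (the language and the variable names are assumptions, not part of the original answer):

```python
import re

# Sample string from the question.
html = ('<ul><li><span>some words here.</span></li>'
        '<li><span>other words here.</span></li>'
        '<li><span>Code: 55555.</span></li></ul>')

# Group 1: everything before the target item (greedy).
# Group 2: the list item containing "Code".
# Group 3: everything after it.
pattern = re.compile(r'(.*)(<li>.*Code.*<\/li>)(.*)')

match = pattern.match(html)
removed = match.group(2)

# Keep groups 1 and 3, dropping the matched list item.
cleaned = pattern.sub(r'\1\3', html)
print(removed)
print(cleaned)
```

One caveat: if further list items follow the one with "Code", the greedy .* after the keyword will run to the last </li>, so group 2 would swallow those items too.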

  • [a-z|A-Z] [Cc]ode: [0-9|.]+[a-z|A-Z]
  • Here the keyword "Code" is made mandatory in the regex
