
Regex start and end with same string, not just same character

I want to write a regular expression that extracts:

<p class="MyClass">
   <p> something 1 </p>
   <p> something 2 </p>
   <span>         <span>  // or more html tag here
   something
</p>
something's here, not in any tag!

from:

<p class="MyClass">
   <p> something 1 </p>
   <p> something 2 </p>
   <span>         <span>  // or more html tag here
   something
</p>
something's here, not in any tag!

<p class="MyClass">
   <p> another thing 1</p>
   <p> another thing 2</p>
   <p> another thing 3</p>
   another thing
</p>
...

My idea is to use a regex that matches everything between <p class="MyClass"> and the next occurrence of it. The regex I came up with is /(<p class="MyClass">[\s\S]*)<p class="MyClass">/ , which works correctly in this case. But it doesn't work when I try to extract the notifications from this page: http://daotao.dut.udn.vn/sv/G_Thongbao_LopHP.aspx . The DOM there is very strange.

Sorry for my bad English.

The regex should be:

(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)
  • [\s\S]*? : *? is a lazy quantifier, so it matches the shortest possible string; the default * is greedy and matches the longest possible string.
  • (?=<p class="MyClass">|$) : a lookahead, so the next <p class="MyClass"> is not consumed as part of the match; the |$ alternative also captures the last block, which has no opening tag after it.
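To see the difference between the two patterns, here is a minimal sketch in Python (the question does not say which language is used, so Python and the sample string below are assumptions made for illustration). It runs the greedy pattern from the question and the lazy pattern with the lookahead on an input containing three blocks.

```python
import re

# Hypothetical sample input with three blocks; not taken from the real page.
html = (
    '<p class="MyClass">A</p>tail A\n'
    '<p class="MyClass">B</p>\n'
    '<p class="MyClass">C</p>'
)

# Greedy pattern from the question: [\s\S]* runs to the LAST opening tag,
# so the first two blocks are captured together and the third is lost.
greedy = re.findall(r'(<p class="MyClass">[\s\S]*)<p class="MyClass">', html)

# Lazy pattern from the answer: [\s\S]*? stops at the first position where
# the lookahead sees the next opening tag or the end of the input.
lazy = re.findall(r'(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)', html)

print(len(greedy))  # 1
print(len(lazy))    # 3
for block in lazy:
    print(repr(block))
```

Because the lookahead does not consume the next opening tag, each match can start exactly where the previous one ended, and text outside any tag (such as "tail A") stays with the block it follows. For anything beyond simple, regular markup, an HTML parser is likely to be more robust than a regex.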

