Regex that starts and ends with the same string, not just the same character

I want to create a regular expression to extract:

<p class="MyClass">
   <p> something 1 </p>
   <p> something 2 </p>
   <span>         <span>  // or more HTML tags here
   something
</p>
something's here, not in any tag!

from:

<p class="MyClass">
   <p> something 1 </p>
   <p> something 2 </p>
   <span>         <span>  // or more HTML tags here
   something
</p>
something's here, not in any tag!

<p class="MyClass">
   <p> another thing 1</p>
   <p> another thing 2</p>
   <p> another thing 3</p>
   another thing
</p>
...

I think I will use a regex to match everything between <p class="MyClass"> and the next one. So the regex is /(<p class="MyClass">[\s\S]*)<p class="MyClass">/, which works correctly in this case. But it doesn't work when I try to get the notifications from this page: http://daotao.dut.udn.vn/sv/G_Thongbao_LopHP.aspx. The DOM is so strange?!
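A plausible explanation, not verified against the linked page: the greedy [\s\S]* runs to the *last* <p class="MyClass"> in the input, so on a page with more than two blocks several of them collapse into one match. The sketch below uses Python's `re` module purely for illustration (the question's pattern is written in JavaScript-style /.../ syntax), and the three-block sample string is made up.

```python
import re

# Hypothetical input with three MyClass blocks (simplified from the question).
html = (
    '<p class="MyClass"> block 1 </p> text 1\n'
    '<p class="MyClass"> block 2 </p> text 2\n'
    '<p class="MyClass"> block 3 </p> text 3\n'
)

# The question's pattern: greedy [\s\S]* followed by a consuming <p class="MyClass">.
greedy = re.compile(r'(<p class="MyClass">[\s\S]*)<p class="MyClass">')

matches = [m.group(1) for m in greedy.finditer(html)]
print(len(matches))               # 1
print('block 2' in matches[0])    # True: blocks 1 and 2 ended up in a single match
```

The greedy quantifier first swallows the whole string, then backtracks only as far as the final <p class="MyClass">, which is why the block boundaries in between are ignored.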

Sorry for my bad English.

The regex should be:

(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)
  • [\s\S]*? : *? is a lazy quantifier, so it matches as few characters as possible; the default * is greedy (matches as many as possible).
  • (?=<p class="MyClass">|$) : a lookahead, so the next <p class="MyClass"> is not part of the match and remains available as the start of the next match; the |$ alternative also captures the last block, which is not followed by another <p class="MyClass">.
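A minimal sketch of the answer's pattern, again using Python's `re` only for illustration and a hypothetical three-block string. `finditer` returns each block from one <p class="MyClass"> up to, but not including, the next one. The second pattern is a hypothetical variant that consumes the next opening tag instead of looking ahead, to show why the lookahead matters.

```python
import re

# Hypothetical input; note there is no trailing newline after the last block.
html = (
    '<p class="MyClass"> block 1 </p> text 1\n'
    '<p class="MyClass"> block 2 </p> text 2\n'
    '<p class="MyClass"> block 3 </p> text 3'
)

# Lazy [\s\S]*? stops at the first position where the lookahead succeeds:
# either the next <p class="MyClass"> or the end of the input ($).
pattern = re.compile(r'(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)')
blocks = [m.group(1) for m in pattern.finditer(html)]
print(len(blocks))   # 3

# For comparison: consuming the next opening tag (no lookahead) means the
# following match cannot start there, so every other block is skipped.
consuming = re.compile(r'(<p class="MyClass">[\s\S]*?)(?:<p class="MyClass">|$)')
skipped = [m.group(1) for m in consuming.finditer(html)]
print(len(skipped))  # 2
```

In Python, $ without re.MULTILINE matches only at the end of the string (or just before a final newline), so it does not stop at the line breaks between blocks.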
