Regex start and end with same string, not just same character
I want to create a regular expression to extract:
<p class="MyClass">
<p> something 1 </p>
<p> something 2 </p>
<span> <span> // or more html tag here
something
</p>
something's here, not in any tag!
from:
<p class="MyClass">
<p> something 1 </p>
<p> something 2 </p>
<span> <span> // or more html tag here
something
</p>
something's here, not in any tag!
<p class="MyClass">
<p> another thing 1</p>
<p> another thing 2</p>
<p> another thing 3</p>
another thing
</p>
...
I think I will use a regex to match everything between
<p class="MyClass">
and the next one. So the regex is
/(<p class="MyClass">[\s\S]*)<p class="MyClass">/
which works correctly in this case. But it doesn't work when I try to get the notifications from this page: http://daotao.dut.udn.vn/sv/G_Thongbao_LopHP.aspx . Is the DOM just that strange?
Sorry for my bad English.
The regex should be
(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)
[\s\S]*?
: *? is a lazy quantifier, so it matches the shortest possible text; the default, *, is greedy and matches the longest.
(?=<p class="MyClass">|$)
: a lookahead, so the next <p class="MyClass"> is not part of the match, and |$ also captures the last block, which has no following <p class="MyClass">.
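To make the difference concrete, here is a minimal sketch in Python. The language is an assumption, since the question does not say which one is used, and the sample HTML is a shortened, illustrative version of the input above. It runs the greedy pattern from the question and the lazy pattern with the lookahead on the same three-block input.

```python
import re

# Illustrative input modelled on the question: three <p class="MyClass"> blocks.
html = """<p class="MyClass">
<p> something 1 </p>
something
</p>
something's here, not in any tag!
<p class="MyClass">
<p> another thing 1</p>
another thing
</p>
<p class="MyClass">
<p> last thing 1</p>
last thing
</p>
"""

MARKER = '<p class="MyClass">'

# Greedy pattern from the question: [\s\S]* extends to the LAST marker,
# so the earlier blocks are merged and the final block is never returned.
greedy = re.compile(r'(<p class="MyClass">[\s\S]*)<p class="MyClass">')

# Lazy quantifier plus lookahead: stop at the NEXT marker (or at the end of
# the input) without consuming it, so the following match can start there.
lazy = re.compile(r'(<p class="MyClass">[\s\S]*?)(?=<p class="MyClass">|$)')

greedy_blocks = greedy.findall(html)
lazy_blocks = lazy.findall(html)

print(len(greedy_blocks))  # 1 (the first two blocks merged into one match)
print(len(lazy_blocks))    # 3 (one match per block)
for block in lazy_blocks:
    print("-----")
    print(block)
```

Because the lookahead does not consume the next <p class="MyClass">, each match begins exactly where the previous one stopped, and text outside any tag stays attached to the block it follows.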