
Shortest match in Regex

This is my regex:

/<strong>.*ingredients.*<\/ul>/im

Given this source HTML:

<strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
<br />
<br />
<br />
* I am not affiliated with Blue Marble Brands or Ines Rosales Tortas in any way.&nbsp; I am not sponsored by them and did not receive any compensation to write this post...I just simply think the&nbsp;Tortas&nbsp;are wonderful!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-35J5vNrXkqE/T6htXTafrmI/AAAAAAAAA5E/g2mtiuSpSmw/s1600/food+003.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" mea="true" src="http://1.bp.blogspot.com/-35J5vNrXkqE/T6htXTafrmI/AAAAAAAAA5E/g2mtiuSpSmw/s640/food+003.JPG" width="640" /></a></div>
<br />
<strong><span style="font-size: large;">Ingredients:</span></strong><br />
<ul>
<li>Ines Rosales Rosemary and Thyme Tortas</li>
<li>Pizza Sauce (ready made in a jar)</li>
<li>Roma Tomatoes</li>
<li>Roasted Red Peppers </li>
<li>Marinated Artichoke Hearts</li>
<li>Olives (I used Pitted Spanish Manzanilla Olives)</li>
<li>Daiya Vegan Mozzarella Cheese</li>
</ul>
<span style="font-size: large;"><strong>Directions:</strong></span><br />
<br />
Spread small amount of pizza sauce over Torta. 

The regex is greedy and grabs everything from <strong>Contest...</ul>, but I want the shortest match, which should yield <strong><span style="font-size: large;">Ingredients...</ul>.

This is my gist: https://gist.github.com/3660370

EDIT: Please allow flexibility between the <strong> tag and "ingredients", and between "ingredients" and the <ul>.
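A minimal Ruby sketch of the problem, assuming the question is about Ruby (the /.../im literal and the gist suggest so) and using an abridged copy of the sample HTML. It also shows that making both quantifiers lazy is not enough on its own: the engine reports the leftmost match, so the match still begins at the first <strong>.

```ruby
# Abridged copy of the sample HTML from the question.
html = <<~HTML
  <strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
  <br />
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  <li>Olives</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
HTML

# In Ruby, /m lets . match newlines and /i makes "ingredients" case-insensitive.
greedy = html[/<strong>.*ingredients.*<\/ul>/im]
lazy   = html[/<strong>.*?ingredients.*?<\/ul>/im]

# Both matches begin at the "Contest" tag: laziness affects how far a match
# extends, not which (leftmost) starting position wins.
puts greedy.start_with?("<strong>Contest") # => true
puts lazy.start_with?("<strong>Contest")   # => true
```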

Try this:

/<strong><span.*ingredients.*<\/ul>/im
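A quick Ruby check of this suggestion on an abridged sample, with a hypothetical second list added after Directions. Anchoring on <strong><span skips the Contest tag, but the trailing .* is greedy, so the match runs to the last </ul> in the input; the lazy .*? variant stops at the first one.

```ruby
# Abridged sample; the second <ul> is hypothetical, added to expose greediness.
html = <<~HTML
  <strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
  <ul>
  <li>Hypothetical second list</li>
  </ul>
HTML

greedy = html[/<strong><span.*ingredients.*<\/ul>/im]
lazy   = html[/<strong><span.*?ingredients.*?<\/ul>/im]

puts greedy.start_with?("<strong><span") # => true
puts greedy.include?("Directions")       # => true: ran on to the last </ul>
puts lazy.include?("Directions")         # => false: stopped at the first </ul>
```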

Please refrain from parsing HTML with regexes. Use Nokogiri or a similar HTML parser instead.

This should work:

/(?!<strong>.*<strong>.*<\/ul>)<strong>.*?ingredients.*?<\/ul>/im


Basically, the regex uses the negative lookahead (?!<strong>.*<strong>.*<\/ul>) to reject any starting <strong> that is followed by another <strong> and then a </ul>, so the match cannot begin at an earlier <strong> such as the Contest one.
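A small Ruby check of this pattern on an abridged copy of the sample. The lookahead rejects any <strong> that has another <strong> and then a </ul> somewhere after it; in the sample the only later <strong> is the Directions heading, with no </ul> after it. In a hypothetical variation where another list follows Directions, the Ingredients tag is rejected as well and nothing matches, so this pattern leans on the sample's layout.

```ruby
# Abridged copy of the sample HTML from the question.
html = <<~HTML
  <strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
HTML

pattern = /(?!<strong>.*<strong>.*<\/ul>)<strong>.*?ingredients.*?<\/ul>/im

match = html[pattern]
puts match.start_with?("<strong><span") # => true

# Hypothetical variation: a second list after the Directions heading.
with_second_list = html + "<ul>\n<li>Spread sauce</li>\n</ul>\n"
p with_second_list[pattern] # => nil
```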

I think this is what you're looking for:

/<strong>(?:(?!<strong>).)*ingredients.*?<\/ul>/im

Replacing the first .* with (?:(?!<strong>).)* lets it match anything except another <strong> tag before it finds "ingredients". After that, the non-greedy .*? makes it stop at the first </ul> it sees. (Your sample contains only one <ul> element, but I'm assuming the real data could have more.)
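Here is that pattern exercised in Ruby on an abridged sample with a hypothetical second list, showing both properties described above: the tempered (?:(?!<strong>).)* cannot run past the start of the Ingredients <strong>, and the lazy .*? stops at the first </ul>.

```ruby
# Abridged sample; the second <ul> is hypothetical.
html = <<~HTML
  <strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
  <ul>
  <li>Hypothetical second list</li>
  </ul>
HTML

# (?:(?!<strong>).)* consumes one character at a time, but never at a position
# where a new <strong> begins, so an attempt starting at "Contest" fails.
match = html[/<strong>(?:(?!<strong>).)*ingredients.*?<\/ul>/im]
puts match
# Prints:
# <strong><span style="font-size: large;">Ingredients:</span></strong><br />
# <ul>
# <li>Roma Tomatoes</li>
# </ul>
```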

The usual warnings apply: there are many ways this regex can be fooled even in perfectly valid HTML, to say nothing of the dreck we usually see out there.
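One concrete way the pattern can be fooled, as a hypothetical Ruby example: the guard (?!<strong>) only recognizes the literal text <strong>, so if the Ingredients heading were written as <strong class="title">, the match would start back at the Contest tag.

```ruby
# Hypothetical variant: the Ingredients tag carries an attribute.
html = <<~HTML
  <strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
  <strong class="title">Ingredients:</strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
HTML

match = html[/<strong>(?:(?!<strong>).)*ingredients.*?<\/ul>/im]
# "<strong class=" is not the literal "<strong>", so the guard lets the match
# run straight through it from the first tag.
puts match.start_with?("<strong>Contest") # => true
```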
