Shortest match in regex

Question

This is my regex:

/<strong>.*ingredients.*<\/ul>/im

Given this source:

<strong>Contest closes on Thursday May 10th 2012 at 9pm PST</strong></div>
<br />
<br />
<br />
* I am not affiliated with Blue Marble Brands or Ines Rosales Tortas in any way.&nbsp; I am not sponsored by them and did not receive any compensation to write this post...I just simply think the&nbsp;Tortas&nbsp;are wonderful!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-35J5vNrXkqE/T6htXTafrmI/AAAAAAAAA5E/g2mtiuSpSmw/s1600/food+003.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" mea="true" src="http://1.bp.blogspot.com/-35J5vNrXkqE/T6htXTafrmI/AAAAAAAAA5E/g2mtiuSpSmw/s640/food+003.JPG" width="640" /></a></div>
<br />
<strong><span style="font-size: large;">Ingredients:</span></strong><br />
<ul>
<li>Ines Rosales Rosemary and Thyme Tortas</li>
<li>Pizza Sauce (ready made in a jar)</li>
<li>Roma Tomatoes</li>
<li>Roasted Red Peppers </li>
<li>Marinated Artichoke Hearts</li>
<li>Olives (I used Pitted Spanish Manzanilla Olives)</li>
<li>Daiya Vegan Mozzarella Cheese</li>
</ul>
<span style="font-size: large;"><strong>Directions:</strong></span><br />
<br />
Spread small amount of pizza sauce over Torta.

The regex is greedy and grabs everything from Contest...</ul>, but the shortest match should yield Ingredients...</ul>.

This is my gist: https://gist.github.com/3660370

::EDIT:: Please allow flexibility between the strong tag and ingredients, and between ingredients and ul.
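
The behavior described in the question can be reproduced with a short script. This is only a sketch: it assumes Ruby (where the m flag lets . match newlines) and uses a trimmed-down, hypothetical stand-in for the page source. It also shows why simply making the quantifiers lazy does not help: the engine still starts at the leftmost <strong>.

```ruby
# Trimmed-down stand-in for the page source in the question (hypothetical sample).
html = <<~'EOS'
  <strong>Contest closes on Thursday</strong></div>
  <br />
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
EOS

greedy = /<strong>.*ingredients.*<\/ul>/im      # the regex from the question
lazy   = /<strong>.*?ingredients.*?<\/ul>/im    # same regex with lazy quantifiers

greedy_match = html[greedy]
lazy_match   = html[lazy]

# Both matches begin at the first <strong>, because a regex engine reports
# the leftmost match; laziness only affects where the match ends.
puts greedy_match.start_with?("<strong>Contest")
puts lazy_match.start_with?("<strong>Contest")
```

Both lines print true, so the fix has to stop the first part of the pattern from crossing another <strong>, not just shorten the tail.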

Answer 1

Try this:

/<strong><span.*ingredients.*<\/ul>/im

Please refrain from regex-ing HTML. Use Nokogiri or a similar library instead.

Answer 2

This should work:

/(?!<strong>.*<strong>.*<\/ul>)<strong>.*?ingredients.*?<\/ul>/im

Test it here

Basically, the regex uses a negative lookahead to avoid multiple <strong> before </ul>, like this: (?!<strong>.*<strong>.*<\/ul>)
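
As a rough check of the lookahead approach, the sketch below runs that regex in Ruby (an assumption about the asker's language) against a hypothetical, trimmed-down sample. It also tries a variant with a second list after the Directions heading, which is a case the lookahead does not handle.

```ruby
# Hypothetical, shortened version of the source from the question.
sample = <<~'EOS'
  <strong>Contest closes on Thursday</strong></div>
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
EOS

lookahead_re = /(?!<strong>.*<strong>.*<\/ul>)<strong>.*?ingredients.*?<\/ul>/im

# With a single <ul>, the Contest <strong> is rejected (another <strong> and a
# </ul> follow it), so the match starts at the Ingredients <strong>.
single_list = sample[lookahead_re]

# With a second list after Directions, the Ingredients <strong> is now also
# followed by "<strong> ... </ul>", so the lookahead rejects it as well.
two_lists = (sample + "<ul>\n<li>Spread sauce</li>\n</ul>\n")[lookahead_re]

puts single_list.start_with?("<strong><span")
puts two_lists.nil?
```

Both lines print true: the pattern works on the sample as posted, but finds nothing once another </ul> appears later in the page.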

Answer 3

I think this is what you're looking for:

/<strong>(?:(?!<strong>).)*ingredients.*?<\/ul>/im

Replacing the first .* with (?:(?!<strong>).)* allows it to match anything except another <strong> tag before it finds ingredients. After that, the non-greedy .*? causes it to stop matching at the first instance of </ul> it sees. (Your sample only contains the one <UL> element, but I'm assuming the real data could have more.)

The usual warnings apply: there are many ways this regex can be fooled even in perfectly valid HTML, to say nothing of the dreck we usually see out there.
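
To illustrate the two pieces described above, here is a minimal sketch, again assuming Ruby and a hypothetical shortened sample that includes a second list. The helper name extract_ingredients is made up for this example.

```ruby
# Hypothetical sample with two <ul> elements, to exercise both parts of the regex.
page = <<~'EOS'
  <strong>Contest closes on Thursday</strong></div>
  <strong><span style="font-size: large;">Ingredients:</span></strong><br />
  <ul>
  <li>Roma Tomatoes</li>
  </ul>
  <span style="font-size: large;"><strong>Directions:</strong></span><br />
  <ul>
  <li>Spread sauce</li>
  </ul>
EOS

# (?:(?!<strong>).)* consumes one character at a time, but only while the
# next characters are not "<strong>", so the match cannot span two <strong> tags.
# .*? then stops at the first </ul> after "ingredients".
def extract_ingredients(html)
  html[/<strong>(?:(?!<strong>).)*ingredients.*?<\/ul>/im]
end

block = extract_ingredients(page)
puts block
```

On this sample the printed block starts at the Ingredients <strong> and ends at the first </ul>, leaving out both the Contest line and the Directions list.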

3 answers

Answer 1
0 2012-09-06 21:21:13

Answer 2
0 2012-09-06 22:02:55

Answer 3
0 2012-09-07 11:08:06
