How to ignore regex matches that are wrapped by a specific string?

Question

Long time lurker, first time poster; please bear with me. I'm a regular expression n00b, but I had a great idea for some functionality on a project. I've tried to implement it to the best of my ability, but I need a little help achieving the desired effect. The page in question is: http://dev.favorcollective.com/guidelines/ (just to provide some context).

I'm using PHP's preg_replace to go through a particular page's contents (one giant string). It searches for glossary terms and wraps each term in a bit of HTML that enables dynamic glossary definition tooltips.

Here is my current code:

function annotate($content)
{
    global $glossary_terms;
    $search =  array();
    $replace = array();
    $count=1;

    foreach ($glossary_terms as $term):
        array_push($search,'/\b('.preg_quote($term['term'],'/').')[?=a-zA-Z]*/i');
        $id = "annotation-".$count;
        $replacement = '<a href="'.get_bloginfo('url').'/glossary#'.preg_replace( '/\s+/', '', $term['term']).'" class="annotation" rel="'.$id.'">'.$term['term'].'</a><span id="'.$id.'" style="display:none;"><span class="term">'.$term['term'].'</span><span class="definition">'.$term['def'].'</span></span>';
         array_push($replace,(string)$replacement);

         $count++;

    endforeach;

    return preg_replace($search, $replace, $content);
}

• But what if I want to ignore matches inside of <h#> </h#> tags?

• I also have a particular string that I do not want a specific term to match within. For example, I want the word "proficiency" to match any time it is NOT used in the context of "ACTFL Proficiency Guidelines". How would I go about adding exceptions to my regular expression? Is that even an option?

• Finally, how can I return the matched text as a variable? Currently, when I match a term ending in 's' or 'ing' (on purpose), my script prints the glossary term rather than the original string that was matched (i.e. it's replacing "descriptions" with "description"). Is there any way to do that?

Thanks!

Answer 1

Not a PHP guy (C#), but here goes. I assume that:

'/\b('.preg_quote($term['term'],'/').')[?=a-zA-Z]*/i' will map to this far more readable pattern:

/\b(ESCAPED_TERM)[?=a-zA-Z]*/i

So, as far as excluding <h#> tags goes, regex is OK only if you can assume your data is the simple, non-nested case: <h#>TERM</h#>. If you can, use a negative lookahead assertion that rejects a match immediately followed by the closing tag. Make the suffix possessive (*+) so the engine cannot give back a letter to slip past the lookahead:

/\b(ESCAPED_TERM)[?=a-zA-Z]*+(?!<\/h\d>)/i
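A minimal sketch of the heading lookahead in PHP, assuming the term sits directly before the closing tag with no nested markup. The term and sample content are made up for illustration. The sketch tests for the closing tag `</h#>` after the optional suffix and uses a possessive `*+` so the suffix cannot backtrack around the assertion.

```php
<?php
// Hypothetical term and content, chosen only for this sketch.
$term = 'description';

// (?!<\/h\d>) rejects a match that is immediately followed by </h1>..</h9>.
// The possessive *+ stops the suffix from giving back letters to dodge it.
$pattern = '/\b(' . preg_quote($term, '/') . ')[?=a-zA-Z]*+(?!<\/h\d>)/i';

$content = '<h2>Descriptions</h2><p>Clear descriptions help.</p>';

// $0 in the replacement string stands for the full matched text.
$result = preg_replace($pattern, '[$0]', $content);
echo $result, "\n";
```

This only covers a term that ends right at the closing tag; a heading such as `<h2>Good descriptions matter</h2>` would still be annotated.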

You can combine a lookahead with a lookbehind to handle your special case. (If ESCAPED_TERM is itself "proficiency", drop the first alternative, otherwise it matches unconditionally.)

/\b(ESCAPED_TERM|(?<!ACTFL )Proficiency(?!\sGuidelines))[?=a-zA-Z]*+(?!<\/h\d>)/i
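As a rough illustration of the lookbehind and lookahead pair, here is a sketch for the single term "proficiency". The sample sentence and the replacement markup are assumptions made for the example, not the asker's real data.

```php
<?php
// (?<!ACTFL ) : the match must not be preceded by "ACTFL ".
// (?!\sGuidelines) : the term must not be followed by " Guidelines".
$pattern = '/\b(?<!ACTFL )(proficiency)(?!\sGuidelines)[?=a-zA-Z]*/i';

// Hypothetical sample text.
$content = 'The ACTFL Proficiency Guidelines describe proficiency levels.';

$result = preg_replace($pattern, '<a class="annotation">$0</a>', $content);
echo $result, "\n";
```

The two assertions are independent, so this also skips "ACTFL proficiency" on its own and "Proficiency Guidelines" without "ACTFL". That may or may not be what you want.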

Note: if you have a bunch of these special cases, PHP's PCRE has an "ignore whitespace" flag (the x modifier), which lets you put each token on its own line.
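For what it's worth, the flag in question is PCRE's `x` modifier. A small sketch of the same special case spread over several lines; the sample text is hypothetical. Under `x`, unescaped whitespace in the pattern is ignored, so the literal space is written as `[ ]`.

```php
<?php
// With the x modifier, whitespace outside character classes is ignored and
// "#" starts a comment that runs to the end of the line.
$pattern = '/
    \b
    (?<!ACTFL[ ])        # not preceded by "ACTFL " (the space is written as [ ])
    (proficiency)
    (?!\sGuidelines)     # not followed by " Guidelines"
    [?=a-zA-Z]*          # optional suffix, as in the original pattern
/ix';

// Hypothetical sample text.
$result = preg_replace($pattern, '[$0]', 'ACTFL Proficiency Guidelines and proficiency levels');
echo $result, "\n";
```

One thing to watch: the pattern delimiter (here `/`) must not appear inside those comments, because PHP looks for the closing delimiter before PCRE ever sees the comment.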

Answer 2

Regular expressions are awesome, wonderful, magical. But everything has its limits.

That's why it's nice to have a language like PHP to provide the extra functionality. :)

Can you strip out headers with a non-greedy regexp?

$content = preg_replace('/<h[1-6]>.*?<\/h[1-6]>/sim', "", $content);

If non-greedy evaluation isn't working, what about just assuming that there won't be any other HTML inside your headers?

$content = preg_replace('/<h[1-6]>[^<]*<\/h[1-6]>/im', "", $content);
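Stripping the headers removes them from the page, which may not be what you want. One possible variation (a different technique from the two lines above, sketched under the assumption that headings are not nested) is to split the content on headings with preg_split and annotate only the pieces in between. The function name `annotate_outside_headings` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: runs $annotate on everything except <h1>..<h6> blocks.
function annotate_outside_headings(string $content, callable $annotate): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured headings in the result:
    // even indexes hold ordinary text, odd indexes hold whole heading blocks.
    $parts = preg_split(
        '/(<h[1-6]\b[^>]*>.*?<\/h[1-6]>)/is',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = $annotate($part);
        }
    }

    return implode('', $parts);
}

// Hypothetical sample content and a stand-in for the real annotate() logic.
$content = '<h2>Proficiency</h2><p>Measure proficiency often.</p>';
$result = annotate_outside_headings($content, function (string $text): string {
    return preg_replace('/\bproficiency\b/i', '[$0]', $text);
});
echo $result, "\n";
```

The headings come back untouched, so nothing has to be re-inserted afterwards.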

Also, you might want to use sprintf to simplify your replacement:

/*
  1  get_bloginfo('url')
  2  preg_replace( '/\s+/', '', $term['term'])
  3  $id
  4  $term['term']
  5  $term['def']
*/
$rfmt = '<a href="%1$s/glossary#%2$s" class="annotation" rel="%3$s">%4$s</a><span id="%3$s" style="display:none;"><span class="term">%4$s</span><span class="definition">%5$s</span></span>';

...

$replacement = sprintf($rfmt, get_bloginfo('url'), preg_replace( '/\s+/', '', $term['term']), $id, $term['term'], $term['def'] );
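On the question of keeping the text that was actually matched ("descriptions" rather than "description"): one way, sketched here with preg_replace_callback, is to pass the whole match into sprintf as the link text. The base URL, the glossary entry, and the shortened format string are stand-ins for illustration; in the real code the first argument would come from get_bloginfo('url').

```php
<?php
// Hypothetical stand-ins for get_bloginfo('url') and one glossary entry.
$base = 'http://example.com';
$term = ['term' => 'description', 'def' => 'A short explanation.'];
$id   = 'annotation-1';

// Shortened version of the format string, just the link part.
$rfmt = '<a href="%1$s/glossary#%2$s" class="annotation" rel="%3$s">%4$s</a>';

$pattern = '/\b(' . preg_quote($term['term'], '/') . ')[?=a-zA-Z]*/i';

$result = preg_replace_callback(
    $pattern,
    function (array $m) use ($rfmt, $base, $term, $id): string {
        // $m[0] is the text exactly as it appeared, e.g. "descriptions".
        $anchor = preg_replace('/\s+/', '', $term['term']);
        return sprintf($rfmt, $base, $anchor, $id, $m[0]);
    },
    'Write good descriptions.'
);
echo $result, "\n";
```

With plain preg_replace, putting `$0` in the replacement string gives the same matched text. The callback form has the side benefit that a definition containing `$1` or a backslash is not interpreted as a backreference.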
