PHP regex to match a keyword outside of HTML tags

Question

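I have been trying to write a regex that matches and replaces occurrences of a keyword in a piece of HTML:

I want to match keyword and <strong>keyword</strong>
but <a href="someurl.html" target="_blank">keyword</a> and <a href="someur2.html">already linked keyword </a> should NOT be matched

I am only interested in matching (and replacing) the keyword occurrences on the first line above.

The reason I want this is to replace keyword with <a href="dictionary.php?k=keyword">keyword</a>, but ONLY when keyword is not already inside an <a> tag.

Any help will be much appreciated!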

Answer 1

$str = preg_replace('~Moses(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i',
                    '<a href="novo-mega-link.php">$0</a>', $str);

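The expression inside the negative lookahead matches up to the next closing </a> tag, but only if it does not see an opening <a> tag first. If that succeeds, the word Moses is inside an anchor element, so the lookahead fails and no match occurs.

Here's a demo.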
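As a quick way to check the pattern locally, the following sketch runs it over a short sample string. The sample text is an assumption made up for illustration and is not part of the original answer; the pattern and replacement are the ones shown above.

```php
<?php
// Made-up sample: one plain keyword, one already linked, one inside <strong>.
$str = 'Moses supposes, but <a href="original-moses1.html">Moses</a> does not; '
     . '<strong>Moses</strong> agrees.';

// Match "Moses" unless the next anchor tag ahead of it is a closing </a>
// (which would mean the word sits inside an <a> element).
$pattern = '~Moses(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i';

$result = preg_replace($pattern, '<a href="novo-mega-link.php">$0</a>', $str);

echo $result, "\n";
```

With this input, the plain Moses and the one inside <strong> should be wrapped in the new link, while the Moses that is already inside an <a> element should be left untouched. Note that the pattern relies on well-formed, non-nested anchors.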

Answer 2

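I managed to do what I wanted (without using a regex) as follows:

1. Parse every character of my string
2. Remove all <a> tags (copy them to a temporary array and leave a placeholder in the string)
3. Run str_replace on the new string to replace all the keywords
4. Swap the placeholders back for their original <a> tags

Here is the code I used, in case someone else needs it: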

$str = <<<STRA
Moses supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
STRA;

$arr1 = str_split($str);

$arr_links = array();
$phrase_holder = '';
$current_a = 0;
$goto_arr_links = false;
$close_a = false;

foreach($arr1 as $k => $v)
{
    if ($close_a == true)
    {
        if ($v == '>') {
            $close_a = false;
        } 
        continue;
    }

    if ($goto_arr_links == true)
    {
        $arr_links[$current_a] .= $v;
    }

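    if ($v == '<' && ($arr1[$k+1] ?? '') == 'a') { /* <a */
        // keep collecting every char until </a>
        // (?? '' starts a new entry without an undefined-key warning)
        $arr_links[$current_a] = ($arr_links[$current_a] ?? '') . $v;
        $goto_arr_links = true;
    } elseif ($v == '<' && ($arr1[$k+1] ?? '') == '/' && ($arr1[$k+2] ?? '') == 'a' && ($arr1[$k+3] ?? '') == '>' ) { /* </a> */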
        $arr_links[$current_a] .= "/a>";

        $goto_arr_links = false;
        $close_a = true;
        $phrase_holder .= "{%$current_a%}"; /* put a parameter holder on the phrase */
        $current_a++;
    }    
    elseif ($goto_arr_links == false) {
        $phrase_holder .= $v;
    }
}

echo "Links Array:\n";
print_r($arr_links);
echo "\n\n\nPhrase Holder:\n";
echo $phrase_holder;
echo "\n\n\n(pre) Final Phrase (with my keyword replaced):\n";
$final_phrase = str_replace("Moses", "<a href=\"novo-mega-link.php\">Moses</a>", $phrase_holder);
echo $final_phrase;
echo "\n\n\nFinal Phrase:\n";
foreach($arr_links as $k => $v)
{
    $final_phrase = str_replace("{%$k%}", $v, $final_phrase);
}
echo $final_phrase;

Output:

Links Array:

Array
(
    [0] => <a href="original-moses1.html">Moses</a>
    [1] => <a href="original-moses2.html" target="_blank">Moses</a>
)

Phrase Holder:

Moses supposes his toeses are roses,
but {%0%} supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas">{%1%}</span>!

(pre) Final Phrase (with my keyword replaced):

<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses,
but {%0%} supposes erroneously;
for nobody's toeses are posies of roses,
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be.
Ganda <span class="cenas">{%1%}</span>!

Final Phrase:

<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
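The same placeholder idea can be sketched more compactly by letting preg_replace_callback pull the links out instead of walking the string character by character. This is only a sketch under assumptions, not the answer author's code: the function name linkKeywordOutsideAnchors is hypothetical, anchors are assumed to be well-formed and not nested, and the input is assumed not to contain text that looks like {%0%} already.

```php
<?php
// Hypothetical helper: link every $keyword that is not already inside an <a> element.
function linkKeywordOutsideAnchors(string $html, string $keyword, string $href): string
{
    $links = [];

    // Step 1: swap each complete <a ...>...</a> element for a numbered placeholder.
    $masked = preg_replace_callback(
        '~<a\b[^>]*>.*?</a>~is',
        function (array $m) use (&$links): string {
            $placeholder = '{%' . count($links) . '%}';
            $links[$placeholder] = $m[0];
            return $placeholder;
        },
        $html
    ) ?? $html; // fall back to the input if the regex engine reports an error

    // Step 2: replace the keyword in the text that remains.
    $replacement = '<a href="' . htmlspecialchars($href) . '">' . $keyword . '</a>';
    $masked = str_replace($keyword, $replacement, $masked);

    // Step 3: put the original links back in place of their placeholders.
    return str_replace(array_keys($links), array_values($links), $masked);
}

$sample = 'Moses supposes, but <a href="original-moses1.html">Moses</a> supposes erroneously.';
echo linkKeywordOutsideAnchors($sample, 'Moses', 'novo-mega-link.php'), "\n";
```

Like the original code, this still replaces the keyword anywhere outside an <a> element, including inside attribute values of other tags, so it is best suited to simple, trusted markup.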

Answer 3

$lines = explode( "\n", $content );
$lines[0] = str_ireplace( "keyword", "replacement", $lines[0] );
$content = implode( "\n", $lines );

Or, if you explicitly want to use a regex:

$lines = explode( "\n", $content );
$lines[0] = preg_replace( "/keyword/i", "replacement", $lines[0] );
$content = implode( "\n", $lines );

Answer 4

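Consider using an HTML parsing library, such as simplehtmldom, rather than a regex. You can use it to update the contents of specific HTML tags (and so ignore the tags you do not want to change). You then would not need a regex at all: once you have filtered the appropriate tags, just use a function like str_replace.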
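To illustrate the parser-based approach, here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) in place of simplehtmldom, which is a substitution on my part. The function name linkKeywordWithDom and the wrapper id "wrap" are hypothetical, the keyword is assumed to contain no double quotes (it is placed directly into the XPath expression), and character-encoding handling is left out for brevity.

```php
<?php
// Hypothetical helper: wrap $keyword in a link wherever it appears in a text
// node that has no <a> ancestor.
function linkKeywordWithDom(string $html, string $keyword, string $href): string
{
    $doc = new DOMDocument();

    // Wrap the fragment in a known container so it can be extracted afterwards.
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//text()[not(ancestor::a)][contains(., "' . $keyword . '")]');

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($nodes) as $text) {
        $fragment = $doc->createDocumentFragment();
        foreach (explode($keyword, $text->nodeValue) as $i => $part) {
            if ($i > 0) {
                // Each split point marks one occurrence of the keyword.
                $a = $doc->createElement('a');
                $a->setAttribute('href', $href);
                $a->appendChild($doc->createTextNode($keyword));
                $fragment->appendChild($a);
            }
            if ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of the wrapper, not the html/body scaffolding.
    $out = '';
    foreach ($doc->getElementById('wrap')->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkKeywordWithDom('Moses and <a href="x.html">Moses</a>', 'Moses', 'novo-mega-link.php'), "\n";
```

Because the parser knows which text belongs to which element, this variant cannot touch attribute values or text inside existing links, at the cost of re-serializing the markup.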
