Converting HTML headings with a PHP regex

Question

I have a string containing HTML-marked-up text:

<p>Some random text</p>
<h2>This is a heading</h2>
<p>More text</p>

I want to convert it into something like this:

<p>Some random text</p>
<h2 id="This_is_a_heading">This is a heading</h2>
<p>More text</p>

This simple code almost does it:

 $patterns = array('#(<h2>)(.*)(</h2>)#i');
 $replace = array('<h2 id="\2">\2</h2>');
 $text = preg_replace($patterns, $replace, $text);

However, I still can't figure out how to replace the whitespace with underscores in the id attribute, so I end up with this in $text:

<p>Some random text</p>
<h2 id="This is a heading">This is a heading</h2>
<p>More text</p>

I have been searching for hours without any luck. Please help.

Answer 1

Using an HTML parser

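This is the recommended way to process HTML. Unless you are absolutely sure that the format of the HTML string is completely fixed, regular expressions are not sufficient and you should use an HTML parser. Here is a solution using the DOMDocument class that ships with PHP: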

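$dom = new DOMDocument;
// Collect libxml parse warnings internally instead of emitting them.
$errorState = libxml_use_internal_errors(true);
$dom->loadHTML($text);
// Restore the previous libxml error-handling setting.
libxml_use_internal_errors($errorState);

foreach ($dom->getElementsByTagName('h2') as $tag) {
    $nodeValue = (string) $tag->nodeValue;
    // Build the id from the heading text, replacing spaces with underscores.
    $id = str_replace(' ', '_', $nodeValue);
    $tag->setAttribute('id', $id);
}

// Note: saveHTML() serializes the whole document, including the
// doctype and <html>/<body> wrappers that libxml adds.
echo $dom->saveHTML();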
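
Because saveHTML() without arguments returns a complete document, a fragment such as the one in the question comes back wrapped in a doctype, <html> and <body>. The following is a minimal sketch of one way to get only the fragment back, by serializing the children of <body> one at a time. The function name add_heading_ids_dom is hypothetical and not part of the original answer; the sketch assumes PHP 7 or later and a non-empty input string.

```php
<?php
// Hypothetical helper (not from the original answer): applies the
// DOMDocument approach and returns only the fragment, not a full document.
function add_heading_ids_dom(string $html): string
{
    $dom = new DOMDocument;
    // Suppress libxml warnings while parsing, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('h2') as $tag) {
        // Same id rule as above: spaces become underscores.
        $tag->setAttribute('id', str_replace(' ', '_', $tag->nodeValue));
    }

    // libxml wraps the fragment in <html><body>; serialize only the
    // children of <body> so the wrappers are left out.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling add_heading_ids_dom($text) on the markup from the question should then yield the three elements with the id added to the <h2>, without the surrounding document wrappers.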

Using a regex

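For a simple replacement, a DOM parser may be overkill. If you don't care much about the accuracy of the result, you can use a regex to do the job. Note that this may break if the heading tags carry additional attributes or contain nested tags.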

In this case your preg_replace() call won't work, because it cannot modify the backreferences. Use preg_replace_callback() instead:

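// (.*?) is lazy so that two headings on the same line are matched separately.
$text = preg_replace_callback('#(<h2>)(.*?)(</h2>)#i', function ($m) {
    // $m[2] is the heading text; build the id by replacing spaces with underscores.
    $id = str_replace(' ', '_', $m[2]);
    // Put the original heading text back between the tags.
    return "<h2 id=\"$id\">$m[2]</h2>";
}, $text);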
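
As noted above, the plain pattern stops matching once an <h2> carries attributes or contains nested tags. The following is a rough sketch of a slightly more tolerant variant, not a replacement for a real parser. The function name add_heading_ids is hypothetical, and the sketch assumes PHP 7 or later, headings that do not already have an id attribute, and heading text without HTML entities (they would be escaped a second time in the id).

```php
<?php
// Hypothetical helper (not from the original answer): adds an id to every <h2>.
function add_heading_ids(string $html): string
{
    // ([^>]*) keeps any existing attributes, (.*?) is lazy, and the
    // s flag lets a heading span several lines.
    return preg_replace_callback('#<h2\b([^>]*)>(.*?)</h2>#is', function (array $m): string {
        // Drop nested tags, then turn each run of whitespace into one underscore.
        $id = preg_replace('/\s+/', '_', trim(strip_tags($m[2])));
        // Escape the value so it is safe inside a double-quoted attribute.
        $id = htmlspecialchars($id, ENT_QUOTES);
        return "<h2{$m[1]} id=\"{$id}\">{$m[2]}</h2>";
    }, $html);
}
```

For the markup in the question, add_heading_ids($text) produces the same <h2 id="This_is_a_heading"> line as the callback above; the difference only shows up for headings with attributes, nested tags, or irregular whitespace.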
