Removing attributes from HTML tag elements using PHP regex

Question

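I want to remove every attribute inside HTML tags. I think this can be done with a regular expression, but I'm not good at regex.

I tried str_replace, but that isn't the right approach. I searched for similar questions and couldn't find any.

Example:

I have HTML tags like these in a variable: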

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

Then call some preg_match():

$new_str = preg_match('', $str)

Expected output:

$new_str = '
<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>';

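Note that I don't want to strip the HTML tags themselves; I only need to remove all the attributes inside the tags.

PHP's strip_tags() isn't an option.

Any help would be appreciated.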

Answer 1

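Although regex can do this job, it is generally recommended to use DOM functions for filtering or other HTML manipulation. Here is a reusable class that uses DOM methods to remove unwanted attributes. You just set the HTML tags and attributes you want to keep, and it filters out the unwanted parts of the HTML.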

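class allow_some_html_tags {
    public $doc = null;                   // DOMDocument holding the parsed HTML
    public $xpath = null;                 // DOMXPath bound to $doc
    public $allowed_tags = "";            // e.g. "<p><span>", the format strip_tags() expects
    public $allowed_properties = array(); // attribute name => 1 for every attribute to keep

    // Parse the HTML after dropping every tag that is not allowed.
    // Call setAllowed() first; otherwise strip_tags() removes all tags.
    function loadHTML( $html ) {
        $this->doc = new DOMDocument();
        $html = strip_tags( $html, $this->allowed_tags );
        // @ suppresses the parser warnings raised for HTML fragments
        @$this->doc->loadHTML( $html );
        $this->xpath = new DOMXPath( $this->doc );
    }
    // Register the tag names and attribute names that should survive.
    function setAllowed( $tags = array(), $properties = array() ) {
        foreach( $tags as $allow ) $this->allowed_tags .= "<{$allow}>";
        foreach( $properties as $allow ) $this->allowed_properties[$allow] = 1;
    }
    // Return the attribute names of one element as a plain array.
    function getAttributes( $tag ) {
        $r = array();
        for( $i = 0; $i < $tag->attributes->length; $i++ )
            $r[] = $tag->attributes->item($i)->name;
        return( $r );
    }
    // Remove every attribute that is not allowed and return the HTML.
    function getCleanHTML() {
        $tags = $this->xpath->query("//*");
        foreach( $tags as $tag ) {
            // Names are collected first so that removing attributes
            // does not disturb iteration over the live attribute list.
            $a = $this->getAttributes( $tag );
            foreach( $a as $attribute ) {
                if( !isset( $this->allowed_properties[$attribute] ) )
                    $tag->removeAttribute( $attribute );
            }
        }
        // Second strip_tags() pass drops the doctype/html/body wrapper added by DOM
        return( strip_tags( $this->doc->saveHTML(), $this->allowed_tags ) );
    }
}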

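The class uses strip_tags twice: once to quickly discard unwanted tags, and then, after the attributes have been removed from what remains, to discard the extra tags inserted by the DOM functions (doctype, html, body). To use it, just do the following: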

$comments = new allow_some_html_tags();
$comments->setAllowed( array( "p", "span", "ul", "li" ), array("tabindex") );
$comments->loadHTML( $str );
$clean = $comments->getCleanHTML();

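The setAllowed function takes two arrays: a set of allowed tags and a set of allowed attributes (in case you later decide to keep some attributes). I changed the input string to include an added tabindex attribute somewhere (on the <ul>, as the output shows) to illustrate the filtering. The output of $clean is: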

<p>content</p>
<span>content</span>
<ul tabindex="3"></ul><li>content</li>
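
The second strip_tags() pass above is only there to discard the doctype/html/body wrapper. As a rough alternative sketch, and not part of the class above, the libxml flags LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD can ask the parser not to add that wrapper in the first place. This assumes PHP 5.4+ built against a libxml version that supports both flags; the function name stripAllAttributes and the temporary <div> wrapper are illustrative choices, and unlike the class it keeps every tag and removes every attribute.

```php
<?php
// Hypothetical helper: removes every attribute from every element.
function stripAllAttributes(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one root element, because NOIMPLIED
    // expects a single top-level node.
    // Caveat: input without an encoding declaration is not treated
    // as UTF-8, so non-ASCII text may need extra handling.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit only elements that carry at least one attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@*]') as $element) {
        // The attribute map is live, so keep removing the first entry.
        while ($element->attributes->length > 0) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    // Serialize the children of the temporary <div>, not the <div> itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripAllAttributes('<p class="class_style" style="font-size: medium;">content</p>');
```

Whitespace between sibling elements may differ from the input depending on the libxml version, so compare results loosely if exact formatting matters.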

Answer 2

The simplest way to remove HTML tags in PHP is strip_tags().

Or you can remove them with:

preg_replace("/<.*?>/", "", $str);
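
As a quick check of what these two calls do, here is a minimal sketch run on one tag from the question. The variable names $viaStripTags and $viaRegex are illustrative. Both calls remove the tags themselves and leave only the text.

```php
<?php
$str = '<p class="class_style" style="font-size: medium;">content</p>';

// strip_tags() with no allowed-tags argument drops every tag.
$viaStripTags = strip_tags($str);

// The lazy pattern matches each "<...>" run, opening and closing alike.
$viaRegex = preg_replace("/<.*?>/", "", $str);

echo $viaStripTags, "\n"; // content
echo $viaRegex, "\n";     // content
```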

Answer 3

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

$clean = preg_replace('/ .*".*"/', '', $str);

echo $clean;

This will return:

<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>
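
The pattern above works on this input because every tag sits on its own line and the content contains no spaces or quotes. A somewhat more tag-scoped variant is sketched below. The function name stripAttributesWithRegex is illustrative, and the pattern still assumes that no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper: rewrites each opening tag as just "<name>".
function stripAttributesWithRegex(string $html): string
{
    // <      start of an opening tag (closing tags start with "</" and do not match)
    // $1     the tag name
    // [^>]*? everything up to the end of the tag, i.e. the attributes
    // $2     an optional "/" so self-closing tags stay self-closing
    $result = preg_replace('/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i', '<$1$2>', $html);

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $html;
}

echo stripAttributesWithRegex('<p class="class_style" style="font-size: medium;">content</p>');
// <p>content</p>
```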

But please don't use regular expressions to parse HTML; use a DOM parser instead.

Answer 1
1 2013-09-19 14:59:08

Answer 2
0 2013-09-19 14:16:58

Answer 3
0 accepted 2013-09-19 14:19:58
