Remove unsupported HTML tags (Simple HTML DOM)

I would like to remove unsupported HTML tags inserted by users (the system defines which tags are supported). For example, if the system supports only the "div" tag:

<div><span>Hello</span> <span>World</span></div>

will be converted to:

<div>Hello World</div>

This is my code using Simple HTML DOM:

function main()
{
    $content = '<div><span>Hello</span> <span>World</span></div>';

    $html = str_get_html($content);

    $html = htmlParser($html);
}

function htmlParser($html)
{
    $supportedTags = ['div'];

    foreach ($html->childNodes() as $node) {
        // Remove unsupported tags
        if (!in_array($node->tag, $supportedTags)) {
            $node->parent()->innertext = str_replace($node->outertext, $node->innertext, $node->parent()->innertext);
            $node->outertext = '';
        }

        if ($node->childNodes()) {
            htmlParser($node);
        }
    }

    return $html;
}

But things go wrong if the content contains multiple nested unsupported tags, e.g.:

<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>

it will be converted to

<div>Hello World <b>!!</b></div>

but the expected result is

<div>Hello World !!</div>

What is the solution? Should I continue to use Simple HTML DOM, or find another way to solve this issue?

Thanks in advance for solving my problem.

As far as I understand, you can do this with strip_tags (the second argument lists the tags to keep):

strip_tags($html, '<div><b>');

Example: https://3v4l.org/p4nLV

Reference: http://php.net/strip_tags
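
To match the question's expected output, the allow-list would presumably contain only `<div>`. The following is a minimal sketch of that call on the nested input from the question; the variable names are chosen for this example only. Note that strip_tags only strips tag markup and does not sanitize attributes on the tags it keeps, so it may not be sufficient on its own for untrusted input.

```php
<?php
// Assumption: only <div> is supported, as in the question.
$content = '<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>';

// The second argument is the list of tags to keep; every other tag is
// removed while its text content stays in place.
$clean = strip_tags($content, '<div>');

echo $clean, PHP_EOL; // <div>Hello World !!</div>
```

Since nesting depth does not matter to strip_tags, the `<b>` inside the `<span>` is removed in the same pass.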

After some struggle, I found out that I should not edit $node->parent() while looping over its children, and that the child nodes should be processed first. The code should look like this:

function htmlParser($html)
{
    $supportedTags = ['div'];

    foreach ($html->childNodes() as $node) {
        if ($node->childNodes()) {
            htmlParser($node);
        }

        // Remove unsupported tags
        if (!in_array($node->tag, $supportedTags)) {
            $node->outertext = $node->innertext;
        }
    }

    return $html;
}
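
On the question of finding another way: the same children-first idea can also be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This is an illustration under stated assumptions, not a replacement for the answer above: the function names `unwrapUnsupported` and `removeUnsupportedTags` are hypothetical, the input is assumed to be a small, well-formed ASCII fragment, and a temporary wrapper `<div>` is added only so the parser sees a single root element.

```php
<?php
// Sketch using PHP's built-in DOM extension.
// Both function names are hypothetical, chosen for this example.

function unwrapUnsupported(DOMNode $parent, array $supportedTags): void
{
    // Snapshot the child list: childNodes is live and the loop edits the tree.
    foreach (iterator_to_array($parent->childNodes) as $child) {
        if (!$child instanceof DOMElement) {
            continue; // text nodes and comments stay as they are
        }

        // Children first, so nested unsupported tags are already unwrapped.
        unwrapUnsupported($child, $supportedTags);

        if (!in_array(strtolower($child->tagName), $supportedTags, true)) {
            // Move the element's children up in front of it, then drop it.
            while ($child->firstChild !== null) {
                $parent->insertBefore($child->firstChild, $child);
            }
            $parent->removeChild($child);
        }
    }
}

function removeUnsupportedTags(string $content, array $supportedTags): string
{
    $doc = new DOMDocument();
    // Temporary wrapper so the fragment has exactly one root element.
    // Assumption: $content is ASCII and well-formed enough for loadHTML.
    $doc->loadHTML(
        '<div>' . $content . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    $root = $doc->documentElement;
    unwrapUnsupported($root, $supportedTags);

    // Serialize only the wrapper's children, leaving the wrapper out.
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo removeUnsupportedTags(
    '<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>',
    ['div']
), PHP_EOL;
```

Because each unsupported element is replaced by its own child nodes rather than by a string, the tree is never rewritten through the parent's text while it is being iterated, which is the situation that caused the original problem.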
