Remove unsupported HTML tags (Simple HTML DOM)

I would like to remove unsupported HTML tags from content inserted by users (the system defines which tags are supported). For example, if the system supports only the "div" tag, then:

<div><span>Hello</span> <span>World</span></div>

should be converted to:

<div>Hello World</div>

This is my code with Simple HTML DOM:

function main()
{
    $content = '<div><span>Hello</span> <span>World</span></div>';

    $html = str_get_html($content);

    $html = htmlParser($html);
}

function htmlParser($html)
{
    $supportedTags = ['div'];

    foreach ($html->childNodes() as $node) {
        // Remove unsupported tags
        if (!in_array($node->tag, $supportedTags)) {
            $node->parent()->innertext = str_replace($node->outertext, $node->innertext, $node->parent()->innertext);
            $node->outertext = '';
        }

        if ($node->childNodes()) {
            htmlParser($node);
        }
    }

    return $html;
}

But things go wrong if the content contains nested unsupported tags, e.g.:

<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>

it is converted to

<div>Hello World <b>!!</b></div>

but the expected result is

<div>Hello World !!</div>

What is the solution? Should I continue to use Simple HTML DOM or find another way to solve this issue?

Thanks in advance for solving my problem.

As far as I understand, you can do this with strip_tags, passing the tags you want to keep as the second argument: strip_tags($html, '<div><b>');

Example : https://3v4l.org/p4nLV


Reference : http://php.net/strip_tags
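
To match the expected result from the question, where only "div" is supported, the allowed-tags argument would list just that tag. A minimal sketch, assuming the input is the nested example from the question (the variable names are illustrative):

```php
<?php
// Sample input taken from the question.
$content = '<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>';

// Allow only <div>; every other tag is removed, but its text is kept.
$clean = strip_tags($content, '<div>');

echo $clean, PHP_EOL; // <div>Hello World !!</div>
```

Note that strip_tags does not validate the HTML and leaves attributes on the allowed tags untouched, so it may not be enough on its own if attributes also need filtering.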

After some struggle, I found that I should not edit $node->parent() inside the loop, since that parent is the very element being iterated, and that the child nodes should be processed first. The code should look like this:

function htmlParser($html)
{
    $supportedTags = ['div'];

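    foreach ($html->childNodes() as $node) {
        // Process children first, so nested unsupported tags are already unwrapped
        if ($node->childNodes()) {
            htmlParser($node);
        }

        // Remove the unsupported tag itself, keeping its (already cleaned) content
        if (!in_array($node->tag, $supportedTags)) {
            $node->outertext = $node->innertext;
        }
    }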

    return $html;
}
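
On the question of finding another way: the same "unwrap unsupported tags" idea could also be sketched with PHP's bundled DOM extension instead of Simple HTML DOM. The helper name unwrapUnsupportedTags is hypothetical and not from the original post. The sketch assumes the dom extension is enabled and that the input is an ASCII body fragment (other encodings may need an explicit encoding hint).

```php
<?php
// Hypothetical helper (not from the original post): unwraps every element
// whose tag name is not in $supportedTags, keeping its children in place.
function unwrapUnsupportedTags(string $content, array $supportedTags): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect user-supplied markup.
    $previous = libxml_use_internal_errors(true);
    // Assumption: $content is a body fragment, so wrap it in a known root.
    $doc->loadHTML('<html><body>' . $content . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);

    // Copy the live node list into an array so the loop is not affected
    // by the tree changes made below.
    $elements = iterator_to_array($body->getElementsByTagName('*'));

    foreach ($elements as $element) {
        if (in_array(strtolower($element->nodeName), $supportedTags, true)) {
            continue;
        }
        // Move the children in front of the element, then drop the empty element.
        while ($element->firstChild !== null) {
            $element->parentNode->insertBefore($element->firstChild, $element);
        }
        $element->parentNode->removeChild($element);
    }

    // Serialize only the contents of <body>.
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo unwrapUnsupportedTags(
    '<div><span>Hello</span> <span>World</span> <span><b>!!</b></span></div>',
    ['div']
), PHP_EOL;
```

Because each unsupported element is replaced by its own children, which stay in the tree and are visited later in the loop, nesting depth does not matter here. For the example input this is intended to produce the expected result from the question.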
