
PHP: Escape HTML entities between some HTML tags

Let's say that I have HTML like this:

<p>demo &</p><p>test</p><ul><li><p>Some <p><i><b>test<b/></i> text: < 15 ( less than 15 ) </p></p></li></ul><p></p>

I need to escape special characters (like ", ', <, >, &, etc) but only between h1, h2, p, ul, ol, li and b tags. So the result should be:

<p>demo &amp;</p><p>test</p><ul><li><p>Some <p>&lt;i&gt;<b>test</b>&lt;/i&gt; text: &lt; 15 ( less than 15 ) </p></p></li></ul><p></p>

Do you have any idea how to do this? I've tried using DOMDocument, but I can't load this HTML because it is invalid. I've also tried preg_replace, but I think the problem is too complex for a regular expression.

As you point out, there are various problems with the HTML. The furthest I've got so far is in fact overeager: it tends to revisit text that has already been processed. You will probably also have a better method of encoding the strings; I've just used htmlspecialchars as it's easy to try.

The code uses XPath to find the various node types you're after and then looks at the text content below each of them. It won't solve all your problems, but it may give you a starting point...

<?php
//error_reporting(E_ALL);
//ini_set('display_errors', 1);

$html = "<p>demo &</p><p>test'\"</p><ul><li><p>Some <p><i><b>test</b></i> text: < 15 ( less than 15 ) </p></p></li></ul><p></p>";

$xml = new DOMDocument();
// Collect parser warnings internally so the invalid markup still loads.
libxml_use_internal_errors(true);
$xml->loadHTML($html);
$xp = new DOMXPath($xml);
// Elements whose text should be encoded; adjust the list to the tags you need.
$tags = $xp->query("//p | //li | //i | //b | //ul | //ol");
foreach ($tags as $tag) {
    echo $tag->tagName . PHP_EOL;
    // Every text node below this element. Nested elements match the outer
    // query too, so the same text can be visited more than once.
    $content = $xp->query("descendant::text()", $tag);
    foreach ($content as $element) {
        if ($element instanceof DOMText) {
            echo "to:" . htmlspecialchars($element->wholeText) . PHP_EOL;
            $newTextNode = $xml->createTextNode(htmlspecialchars($element->wholeText));
            $element->parentNode->replaceChild($newTextNode, $element);
        }
    }
}

// Note: the serializer escapes text nodes again when writing them out.
echo $xml->saveXML();
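
A different technique avoids revisiting text by not using a DOM parser at all. It splits the string once on the allowed tags and encodes everything in between. The sketch below is only an illustration under some assumptions: the function name escapeOutsideAllowedTags is made up for this example, the tag list comes from the question, and text such as "a <b c>" that happens to look like an allowed tag would be left unescaped.

```php
<?php
// Hypothetical helper (not part of the original answer): keeps the listed
// tags verbatim and passes every other piece of the string through
// htmlspecialchars exactly once.
function escapeOutsideAllowedTags(string $html, array $allowed): string
{
    $names = implode('|', array_map(fn($t) => preg_quote($t, '~'), $allowed));
    // An opening or closing tag with an allowed name, optionally with attributes.
    $pattern = '~(</?(?:' . $names . ')(?:\s[^<>]*)?>)~i';
    // With a single capture group, odd indexes hold the matched tags and
    // even indexes hold the text between them.
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1)
            ? $part
            // double_encode = false leaves existing entities such as &amp; alone.
            : htmlspecialchars($part, ENT_QUOTES, 'UTF-8', false);
    }
    return $out;
}

$tags = ['h1', 'h2', 'p', 'ul', 'ol', 'li', 'b'];
echo escapeOutsideAllowedTags('<p>a & b <i>c</i> < 15</p>', $tags) . PHP_EOL;
// prints: <p>a &amp; b &lt;i&gt;c&lt;/i&gt; &lt; 15</p>
```

Because each piece of text is encoded in a single pass, nothing is processed twice. Tags outside the list (such as i) are treated as plain text and encoded too, which is what the expected result in the question shows.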
