How to parse HTML with embedded PHP code?

I'm doing some HTML DOM manipulations:

function parse_html($html) {
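    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
    $dom->loadHTML($html);
    libxml_clear_errors();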

    // Parse DOM 

    return $dom->saveHTML();
}

The problem is that my HTML contains some PHP code, and part of it gets converted to HTML entities. For example, if $html contains this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?php // lang=es
    $pwd = $parameter['pwd'];
    $url = $parameter['url'];
?>

<p>
    You are now registered. Go to -&gt;
    <a href="<?php echo $url ?>">control panel</a> 
    to change the settings.
</p>

It is transformed into this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><meta http-equiv="content-type" content="text/html; charset=UTF-8"></head>
<body>
<?php // lang=es
    $pwd = $parameter['pwd'];
    $url = $parameter['url'];
?><p> You are now registered. Go to -&gt; <a href="&lt;?php%20echo%20%24url%20?&gt;">control panel</a> to change the settings.
</p>
</body>
</html>

The <?php echo $url ?> inside the href attribute is escaped (entities and percent-encoding), but I cannot use a function like html_entity_decode() because it would also decode entities that must remain entities, such as the -&gt; in the text.

How can I parse a DOM that contains PHP code?

When, where, and how are you building the $html variable? That is the point at which the PHP inside it needs to be evaluated. If you output it afterwards, the PHP will be emitted as a plain string and will not be executed.

To be clearer: build the $html variable with the PHP already evaluated at that time. Or perhaps you are building a template, in which case you would do it differently.

If you are trying to fill in PHP content after the $html variable is already in use, you can instead use str_replace() or a similar function.
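The str_replace() idea could look roughly like the sketch below. It assumes the template is written with a placeholder token instead of inline PHP; the token __URL__ and the sample markup are made up for illustration and are not from the question.

```php
<?php
// Hypothetical placeholder token: pick one that cannot occur in the real markup.
$html = '<p>Go to <a href="__URL__">control panel</a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

// ... DOM manipulation would go here ...

// Swap the PHP snippet back in only after the DOM has been serialized.
$out = str_replace('__URL__', '<?php echo $url ?>', $dom->saveHTML());
```

This only works when you control the source files and can write them with placeholders in the first place.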

The solution I found is a pair of functions that encode the PHP blocks before the HTML is loaded into the DOM and decode them again after saveHTML().

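// Replace every PHP block with a base64 payload wrapped in marker strings.
function encode_php($html) {
    return preg_replace_callback('#<\?php.*\?>#imsU', '_encode_php', $html);
}

// Callback: $matches[0] is the whole PHP block, including its tags.
function _encode_php($matches) {
    return 'PHP_ENCRYPTED_CODE_BEGIN'.base64_encode($matches[0]).'PHP_ENCRYPTED_CODE_END';
}

// Restore the original PHP blocks from the marked payloads.
function decode_php($html) {
    return preg_replace_callback('#PHP_ENCRYPTED_CODE_BEGIN(.*)PHP_ENCRYPTED_CODE_END#imsU', '_decode_php', $html);
}

// Callback: $matches[1] is the base64 payload between the markers.
function _decode_php($matches) {
    return base64_decode($matches[1]);
}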

It's important to choose a prefix and a suffix that you are sure don't appear in your files. This solution has been tested with 2500 HTML files and it works.
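For illustration, here is a minimal sketch of how the encode/decode step might wrap the DOM round trip. The helper names protect_php(), restore_php() and parse_html_with_php() are hypothetical, and the sample input is a shortened version of the markup in the question; this is an outline under those assumptions, not a drop-in replacement.

```php
<?php
// Marker strings taken from the answer; they must never occur in the input files.
const PHP_MARK_BEGIN = 'PHP_ENCRYPTED_CODE_BEGIN';
const PHP_MARK_END   = 'PHP_ENCRYPTED_CODE_END';

// Hypothetical helper: hide each PHP block behind a base64 payload.
function protect_php(string $html): string {
    return preg_replace_callback('#<\?php.*?\?>#is', function (array $m): string {
        return PHP_MARK_BEGIN . base64_encode($m[0]) . PHP_MARK_END;
    }, $html);
}

// Hypothetical helper: turn the payloads back into the original PHP blocks.
function restore_php(string $html): string {
    $pattern = '#' . PHP_MARK_BEGIN . '(.*?)' . PHP_MARK_END . '#s';
    return preg_replace_callback($pattern, function (array $m): string {
        return base64_decode($m[1]);
    }, $html);
}

function parse_html_with_php(string $html): string {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML(protect_php($html)); // the parser never sees raw PHP tags
    libxml_clear_errors();

    // ... DOM manipulation would go here ...

    return restore_php($dom->saveHTML());
}

$input  = '<p>Go to -&gt; <a href="<?php echo $url ?>">control panel</a></p>';
$result = parse_html_with_php($input);
```

Base64 output uses only letters, digits, "+", "/" and "=", which libxml leaves alone when it serializes attribute values such as href, so the payload should survive the round trip unchanged.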
