
Trying to remove script tags in HTML

I am trying to remove script tags from HTML using PHP, but it doesn't work if there's HTML inside the JavaScript.

For example, if the script tags contain something like this:

function tip(content) {
        $('<div id="tip">' + content + '</div>').css

The parser ends the script at </div>, so the rest of the script is left in the document.

This is what I have been using to remove the script tags:

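// Copy the live node list into an array first: removing nodes while
// iterating the live list itself can skip every other <script>.
foreach (iterator_to_array($doc->getElementsByTagName('script')) as $node)
{
    $node->parentNode->removeChild($node);
}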

How about some regex-based pre-processing?

Example input.html:

<html>
  <head>
    <title>My example</title>
  </head>
  <body>
    <h1>Test</h1>
    <div id="foo">&nbsp;</div>
    <script type="text/javascript">
      document.getElementById('foo').innerHTML = '<span style="color:red;">Hello World!</span>';
    </script>
  </body>
</html>

PHP script that removes the script tags:

<?php

    // send plain text so the HTML source is shown as-is, not rendered:
    header("Content-Type: text/plain");

    // read the example input file given above into a string:
    $input = file_get_contents('input.html');

    echo "Before:\r\n";
    echo $input;
    echo "\r\n\r\n-----------------------\r\n\r\n";

    // replace each script element, including its contents, with "";
    // the U modifier makes .* lazy, so each match ends at the first </script>
    $output = preg_replace("~<script[^<>]*>.*</script>~Uis", "", $input);

    echo "After:\r\n";
    echo $output;
    echo "\r\n\r\n-----------------------\r\n\r\n";

?>
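The same regex can be wrapped in a small reusable function. This is a minimal sketch: the function name removeScriptTags is hypothetical, and instead of the U modifier it uses an explicitly lazy .*? plus \b so that only a real <script> tag name is matched. Like any regex approach, it assumes the literal text </script> never appears inside the script itself.

```php
<?php

// Hypothetical helper: removes every <script>...</script> block,
// including its contents, from an HTML string.
function removeScriptTags(string $html): string
{
    // \b  : "<script" must be the whole tag name
    // .*? : lazy, so each match ends at the first closing </script>
    // i   : case-insensitive (<SCRIPT> too); s : "." also matches newlines
    return preg_replace('~<script\b[^>]*>.*?</script\s*>~is', '', $html);
}

// HTML inside the JavaScript string does not end the match early.
$html = <<<'HTML'
<div id="foo">&nbsp;</div>
<script type="text/javascript">
  function tip(content) {
    $('<div id="tip">' + content + '</div>');
  }
</script>
<p>After the script</p>
HTML;

echo removeScriptTags($html);
```

Because the pattern only needs the first </script> after each opening tag, the </div> inside the JavaScript string has no effect on where the match ends.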

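You can use the strip_tags function, whose second argument lists the HTML tags you want to keep. Note that strip_tags removes only the tags themselves, so the JavaScript between <script> and </script> is left behind as plain text.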
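A short sketch of that behaviour, assuming (arbitrarily, for illustration) that only <p> and <a> tags should be kept. strip_tags on its own leaves the script's code in the output, so one option is to drop the script blocks with a regex first and call strip_tags afterwards.

```php
<?php

$html = '<p>Hello <a href="/x">link</a></p><script>alert("hi");</script><b>bold</b>';

// strip_tags alone: the <script> tags go, but the JavaScript stays as text.
// The second argument lists the tags to keep (here <p> and <a>).
$naive = strip_tags($html, '<p><a>');

// Remove script blocks (tags and contents) first, then strip the other tags.
$withoutScripts = preg_replace('~<script\b[^>]*>.*?</script\s*>~is', '', $html);
$clean = strip_tags($withoutScripts, '<p><a>');

echo $naive, "\n"; // still contains alert("hi");
echo $clean, "\n";
```

Allowed tags keep their attributes (the href above survives), which is something to keep in mind if the input is untrusted.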

I think this is a one-off problem that doesn't need anything special. Just do something like this:

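$text = file_get_contents('index.html');
// !== is required: mb_strpos returns 0 (falsy) when the tag is at the very start
while (($startPosition = mb_strpos($text, '<script')) !== false) {
    // look for the closing tag only after the opening one
    $endPosition = mb_strpos($text, '</script>', $startPosition);
    if ($endPosition === false) {
        break; // unclosed <script>: stop instead of looping forever
    }
    // skip the whole closing tag: '</script>' is 9 characters, not 7
    $text = mb_substr($text, 0, $startPosition) . mb_substr($text, $endPosition + mb_strlen('</script>'));
}
echo $text;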

Just make sure the encoding is set for the mb_* functions, for example with mb_internal_encoding('UTF-8').
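The same loop can be packaged as a reusable function. This is a sketch, not the answerer's code: the name stripScriptBlocks is hypothetical, it uses mb_stripos so <SCRIPT> is matched as well, compares with !== false so a tag at position 0 is handled, and skips the full length of '</script>' (9 characters).

```php
<?php

// The mb_* functions use this encoding when none is passed explicitly.
mb_internal_encoding('UTF-8');

// Hypothetical helper: removes each <script ...> ... </script> block,
// matching the tag names case-insensitively.
function stripScriptBlocks(string $text): string
{
    $close = '</script>';
    while (($start = mb_stripos($text, '<script')) !== false) {
        // search for the closing tag only after the opening one
        $end = mb_stripos($text, $close, $start);
        if ($end === false) {
            break; // unclosed <script>: leave the rest untouched
        }
        $text = mb_substr($text, 0, $start) . mb_substr($text, $end + mb_strlen($close));
    }
    return $text;
}

echo stripScriptBlocks("<p>Grüße</p><SCRIPT>var s = '<div>x</div>';</SCRIPT><p>Ende</p>");
```

Since mb_stripos and mb_substr both count characters rather than bytes, the positions stay consistent even when the text contains multibyte characters such as ü and ß.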

