
Remove <script> and other tags with PHP

For a small one-page CMS, I want to strip script and other tags that people could use maliciously.

I've tried strip_tags and preg_replace, but neither is working.

The CMS has six editable fields, each saved in a plain text file. When one of them is edited, I need to remove all tags such as <script>, <embed>, <object>, <iframe> and others.

I've looked at HTML Purifier, but I don't understand how to use it, as I'm not very familiar with PHP. It also looks too big for my needs.

This is the code (here I try to remove the script tag from the textarea named newscontent):

<?php
if (isset($_POST['edit'])) {

    $newscontent = preg_replace('/<script.+?<\/script>/im', '', $newscontent);

    if (file_put_contents('title.txt',          utf8_encode($_POST['title']))       !== FALSE &&
        file_put_contents('subtitle.txt',       utf8_encode($_POST['subtitle']))    !== FALSE &&
        file_put_contents('datum.txt',          utf8_encode($_POST['datum']))       !== FALSE &&
        file_put_contents('time.txt',           utf8_encode($_POST['time']))        !== FALSE &&
        file_put_contents('timemin.txt',        utf8_encode($_POST['timemin']))     !== FALSE &&
        file_put_contents('newscontent.txt',    utf8_encode($_POST['newscontent'])) !== FALSE
    )
        echo '<p class="succes">Your changes are saved</p>', "\n";
}
$title          = utf8_decode(file_get_contents('title.txt'));
$subtitle       = utf8_decode(file_get_contents('subtitle.txt'));
$datum          = utf8_decode(file_get_contents('datum.txt'));
$time           = utf8_decode(file_get_contents('time.txt'));
$timemin        = utf8_decode(file_get_contents('timemin.txt'));
$newscontent    = utf8_decode(file_get_contents('newscontent.txt'));
?>

Your code doesn't work because you perform the replacement on the variable $newscontent but write $_POST['newscontent'] to the file. I guess you have register_globals switched on (which is bad); otherwise, using the undefined $newscontent would raise a notice.
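A minimal sketch of the fix described above: filter the submitted value itself, then write that filtered value to the file. The helper name strip_script_blocks and the sample input are hypothetical and used only for illustration. The pattern is the one from the question, with the s modifier swapped in so that . also matches newlines.

```php
<?php
// Hypothetical helper: removes <script>...</script> blocks from a string.
// The s modifier lets . match newlines, so multi-line scripts are caught too.
function strip_script_blocks(string $html): string
{
    $result = preg_replace('/<script.+?<\/script>/is', '', $html);
    // preg_replace() returns null on error; fall back to an empty string.
    return $result ?? '';
}

// Stand-in for $_POST['newscontent'] so the sketch runs on its own.
$submitted = "Hello<script>\nalert('x');\n</script> world";

// Filter the submitted value, then save THIS variable, not the raw $_POST entry.
$newscontent = strip_script_blocks($submitted);
// file_put_contents('newscontent.txt', $newscontent);

echo $newscontent, "\n"; // prints: Hello world
```

This only addresses the variable mix-up. It does not make the filter itself safe.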

I would recommend you persevere with HTML Purifier. There are many, many bad things people could add to text if they have 'bad intentions', and your approach does not even scratch the surface. For example, even if you fixed your code, it would not prevent people from adding something like this:

<img src="http://www.google.com/logo.gif" onload="javascript:bad stuff here" />

not to mention the complications of different character sets.
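To make the point concrete, here is a small sketch showing that removing <script> blocks, or even calling strip_tags() with <img> allowed, leaves the event-handler attribute from the example untouched. It also shows one alternative that is not part of the answer above: if the fields are meant to hold plain text only, escaping on output with htmlspecialchars() renders any markup inert. The variable names are assumptions for illustration. When users genuinely need to enter HTML, an allow-list sanitizer such as HTML Purifier remains the option recommended here.

```php
<?php
// Sample input taken from the example above; onload is an event-handler attribute.
$input = '<img src="http://www.google.com/logo.gif" onload="javascript:bad stuff here" />';

// Stripping <script> blocks changes nothing: there is no <script> tag to remove.
$afterRegex = preg_replace('/<script.+?<\/script>/is', '', $input);
var_dump($afterRegex === $input);                   // bool(true)

// strip_tags() with <img> allowed keeps the tag AND its attributes.
$afterStrip = strip_tags($input, '<img>');
var_dump(strpos($afterStrip, 'onload') !== false);  // bool(true)

// Escaping on output (assumes the field is plain text, not HTML).
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
var_dump(strpos($escaped, '<') === false);          // bool(true)
```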

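You can escape < in the pattern, although PCRE does not treat it as a special character on its own, so the escape is optional. To cover several tags at once, capture the tag name and refer back to it in the closing tag: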

    // The s modifier lets . match newlines; m only changes how ^ and $ behave.
    $newscontent = preg_replace('/\<(script|object|embed).+?\<\/\1\>/is', '', $newscontent);
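A quick sketch of how such a pattern behaves, assuming the s modifier is used so that . spans line breaks. The \1 backreference makes the closing tag match whichever name the group captured. The sample string is made up for illustration.

```php
<?php
// \1 refers back to the tag name captured by (script|object|embed),
// so <object> is only closed by </object>, and so on.
$pattern = '/<(script|object|embed).+?<\/\1>/is';

// Made-up input with a multi-line script, an object, and a harmless <b> tag.
$dirty = "A<script>\nalert(1);\n</script>B<object data=\"x.swf\"></object>C<b>bold</b>";

$clean = preg_replace($pattern, '', $dirty);

echo $clean, "\n"; // prints: ABC<b>bold</b>
```

Note that <embed> is usually written without a closing tag, so a pattern that requires one would not catch it, and event-handler attributes on other tags are left in place either way.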
