
remove html tags

Currently I use strip_tags to remove all HTML tags from the strings I process. However, I recently noticed that it joins words that were separated only by the removed tags, e.g.:

$str = "<li>Hello</li><li>world</li>";
$result = strip_tags($str);
echo $result;
(prints Helloworld)

How can you get around this?

You can experiment to find which regex pattern works best and what to replace the tags with :)

// ------------------------------------ 

function strip_html_tags($string) {

    // Turn line breaks and tabs into plain spaces.
    $string = str_replace(array("\r", "\n", "\t"), ' ', $string);

    // Optional: turn list items into bullets before stripping.
    // $string = str_replace('<li>', "\n* ", $string);

    // Replace every tag with a space; '/<.*?>/' is an alternative pattern.
    $pattern = '/<[^>]*>/';
    $string = preg_replace($pattern, ' ', $string);

    // Collapse runs of spaces and trim the ends.
    $string = trim(preg_replace('/ {2,}/', ' ', $string));

    return $string;

}

// ------------------------------------ 

You can also add special replacements, such as '<li>' to "\n* ", or whatever you need :)

It all depends on what output you want after stripping HTML tags. For example:

If you want the <li> tags converted into a plain list of items, I would suggest using str_replace to replace <li> with * and </li> with \n.

The purpose of strip_tags is to get rid of HTML tags without any other conversion.
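
A minimal sketch of that list conversion, assuming the input only needs <li> items turned into bullet lines. The function name list_to_text is made up for illustration, and it uses preg_replace rather than plain str_replace so that <li> tags with attributes are matched too:

```php
<?php
// Hypothetical helper: turn <li> items into "* item" lines, then strip the rest.
function list_to_text(string $html): string
{
    // Opening <li ...> becomes a bullet; \b keeps <link> from matching.
    $html = preg_replace('/<li\b[^>]*>/i', '* ', $html);

    // Closing </li> becomes a line break.
    $html = preg_replace('/<\/li\s*>/i', "\n", $html);

    // Remove whatever tags are left (<ul>, <ol>, <b>, ...) and trim the ends.
    return trim(strip_tags($html));
}

echo list_to_text('<ul><li>Hello</li><li>world</li></ul>');
// * Hello
// * world
```

Doing the replacement before strip_tags matters: once the tags are gone there is nothing left to tell the items apart.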

This replaces every HTML tag (in fact anything of the form <ABC>, without checking whether it really is HTML) with a space, then collapses repeated spaces into one and removes leading and trailing whitespace.

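$str = preg_replace("/<.*?>/", " ", $str);
// A single str_replace("  ", " ") pass can leave double spaces behind, so collapse runs with a regex.
$str = trim(preg_replace("/ {2,}/", " ", $str));

echo strip_tags( str_replace( '>', '> ', $string ));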

This should do exactly what you are after in all cases.
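
For what it's worth, here is a small sketch of that one-liner wrapped in a function, with the leftover double spaces collapsed afterwards. The name strip_tags_spaced is hypothetical, and the whitespace cleanup is an addition, not part of the original one-liner:

```php
<?php
// Hypothetical helper: add a space after every '>' so words on either
// side of a tag stay apart, then strip the tags and tidy the whitespace.
function strip_tags_spaced(string $html): string
{
    $text = strip_tags(str_replace('>', '> ', $html));

    // Collapse any run of whitespace into one space and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo strip_tags_spaced('<li>Hello</li><li>world</li>');
// Hello world
```

One caveat: the space is added after every tag, inline ones included, so '<b>Hel</b>lo' comes out as 'Hel lo'. If that matters, only pad the block-level tags you care about.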

From your code I can see that there was no space between the words Hello and world in the input, and strip_tags is not expected to add one for you. To make strip_tags produce exactly what you want, I added a space after the first closing list tag, and the result was Hello world.

You can copy and paste this code and run it to see the difference.

    $str = "<li>Hello</li> <li>world</li>";
    $result = strip_tags($str);
    echo $result;
    // Expected output: Hello world

You would be better off with htmlentities()

It won't remove the < and > characters, but it will escape them.
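
A quick sketch of the difference, using the string from the question (the variable names are just for illustration). Note that this keeps the markup visible as text instead of removing it, so it only helps if showing the tags is acceptable:

```php
<?php
$str = "<li>Hello</li><li>world</li>";

// strip_tags removes the markup entirely.
$stripped = strip_tags($str);

// htmlentities keeps the markup but escapes < and > so a browser shows it as text.
$escaped = htmlentities($str);

echo $stripped, "\n";
// Helloworld
echo $escaped, "\n";
// &lt;li&gt;Hello&lt;/li&gt;&lt;li&gt;world&lt;/li&gt;
```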

