
PHP htmlentities and htmlspecialchars are breaking my strings

I have a description field in my application, and if I include a quote like this: ' it breaks everything. I was using htmlentities() on the entire description field, and so I tried htmlspecialchars() but it breaks as well.

In the screenshot below, I sent the string "I'd like this to work" and got the following mess:

[Screenshot: what my string looks like after being run through htmlentities]

I've had this issue in the past, but I'm not sure how to fix it.

I fixed the problem by changing my code from

$text = htmlentities( $text, ENT_QUOTES );

to:

$text = htmlentities( $text, ENT_QUOTES, 'utf-8' );

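Which is weird, because PHP lists the default setting as UTF-8. (That default only applies from PHP 5.4 onward; older versions default to ISO-8859-1, which would explain the difference.)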
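To avoid depending on whatever default a given PHP version uses, the call can be wrapped so the flags and encoding are always spelled out. This is only a minimal sketch: the helper name escape_html is made up for illustration, and the sample strings are assumptions (the first contains a typographic apostrophe, the second a plain one).

```php
<?php
// Hypothetical helper: always state the flags and the encoding explicitly,
// so the result does not depend on the PHP version's default charset.
function escape_html($text)
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// "\xE2\x80\x99" is the UTF-8 byte sequence for the typographic apostrophe (U+2019).
echo escape_html("I\xE2\x80\x99d like this to work"), "\n"; // I&rsquo;d like this to work

// A plain ASCII apostrophe is encoded numerically because of ENT_QUOTES.
echo escape_html("I'd like this to work"), "\n";            // I&#039;d like this to work
```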

If I only have to replace certain characters, I'll sometimes just write a simple find-and-replace script.

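<?php
  // str_replace applies the pairs in order, so '&' has to come first;
  // otherwise the '&' in '&rsquo;' would itself be turned into '&amp;'.
  $bad = array('&', '’'); // add whatever you don't want here
  $good = array('&amp;', '&rsquo;'); // replace it here
  $description_field = str_replace($bad, $good, $description_field);
?>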
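One thing to watch with this approach: str_replace with arrays works through the pairs one after another, so a later '&' pair can re-encode an entity produced by an earlier pair. A possible alternative is strtr with a replacement map, which makes a single pass and does not touch text it has already replaced. The sketch below assumes a made-up sample value for $description_field.

```php
<?php
// Replacement map; extend it with whatever characters you need.
$map = array(
    "\xE2\x80\x99" => '&rsquo;', // typographic apostrophe (U+2019) as UTF-8 bytes
    '&'            => '&amp;',
);

// Hypothetical sample input, for illustration only.
$description_field = "Tom\xE2\x80\x99s & Jerry\xE2\x80\x99s";

// strtr never re-scans its own output, so the order of the pairs does not matter.
$description_field = strtr($description_field, $map);

echo $description_field, "\n"; // Tom&rsquo;s &amp; Jerry&rsquo;s
```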

I'm pretty sure htmlentities and htmlspecialchars are not UTF-8-safe unless they are told the input is UTF-8. Working in a single-byte charset, they see the first byte of a multibyte character as a character of its own and encode it as an HTML entity; then, when the browser reads the supposedly UTF-8 content, it sees an HTML entity followed by two invalid leftover bytes.
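The byte-level picture can be checked directly. The sketch below assumes the troublesome character is the typographic apostrophe (U+2019) and forces the single-byte ISO-8859-1 charset to reproduce the breakage, then repeats the call with UTF-8 for comparison.

```php
<?php
// The typographic apostrophe (U+2019) is three bytes in UTF-8.
$apostrophe = "\xE2\x80\x99";
echo strlen($apostrophe), "\n";  // 3
echo bin2hex($apostrophe), "\n"; // e28099

// Read as ISO-8859-1, the lead byte 0xE2 is the character "â", so it gets
// its own entity and the remaining two bytes are left behind unencoded.
$encoded = htmlentities($apostrophe, ENT_QUOTES, 'ISO-8859-1');
echo bin2hex($encoded), "\n";

// With the real encoding declared, the three bytes become a single entity.
$fixed = htmlentities($apostrophe, ENT_QUOTES, 'UTF-8');
echo $fixed, "\n"; // &rsquo;
```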

You might need to look into functions like mb_ereg_replace and manually replace unsafe characters:

$output = mb_ereg_replace("<", "&lt;", $input); // mb_ereg_replace takes the bare pattern, without / delimiters

That takes care of the < character; & and > (plus quotes, inside attributes) need the same treatment before the string is really HTML-safe. I can't seem to find a multibyte-safe str_replace, but this works just as well, and it will ensure you never have problems with UTF-8 characters.
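For reference, here is a sketch of the manual approach covering all five characters that htmlspecialchars handles. It assumes the mbstring extension is loaded; the function name escape_html_manually is made up for illustration. Note that mb_ereg_replace takes a bare pattern, without / delimiters, and that & has to be replaced first so the entities produced by the later steps are not encoded again.

```php
<?php
// Tell the mb_ereg_* functions that patterns and subjects are UTF-8.
mb_regex_encoding('UTF-8');

// Hypothetical helper: escape the five HTML-special characters by hand.
function escape_html_manually($input)
{
    $output = mb_ereg_replace('&', '&amp;', $input);   // must run first
    $output = mb_ereg_replace('<', '&lt;', $output);
    $output = mb_ereg_replace('>', '&gt;', $output);
    $output = mb_ereg_replace('"', '&quot;', $output);
    $output = mb_ereg_replace("'", '&#039;', $output);
    return $output;
}

// Multibyte characters such as "é" (0xC3 0xA9 in UTF-8) pass through untouched.
echo escape_html_manually("<b>caf\xC3\xA9 & cr\xC3\xA8me</b>"), "\n";
```

On the same input, htmlspecialchars($input, ENT_QUOTES, 'UTF-8') should give an identical result, so the manual version is mainly useful for seeing what the built-in does.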


 