
htmlentities() returns empty values despite UTF-8

So I'm trying to escape a string in PHP using htmlentities().
The problem is that htmlentities() returns an empty string.

I'm receiving this string through an HTML <form>. The page containing the form has the following meta tag: <meta charset="utf-8">

My string is encoded in UTF-8, the third parameter of htmlentities() is 'UTF-8', and I still get an empty string.

Here is my code :

$str = strtolower(trim($str));
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));

And here is what var_dump displays :

// Original string is é-è
// Expected output is &eacute;-&egrave;
string '�-�' (length=5) // Original string but why is the length 5 ?
string 'UTF-8' (length=5)
string '' (length=0)
string '' (length=0)
string '&atilde;&copy;-&atilde;&uml;' (length=28) // WTF ??

Does anyone know where this is coming from?

OK, I found out what was wrong: strtolower() is causing the problem.
Use mb_strtolower() instead.
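
As a minimal sketch of that fix, assuming the mbstring extension is loaded (the variable names and the sample input are only for illustration, they are not from the original form handler):

```php
<?php
// "É-È" written as explicit UTF-8 bytes so the example does not depend
// on how this file is saved (sample input, chosen for illustration).
$str = " \xC3\x89-\xC3\x88 ";

// trim() only strips ASCII whitespace by default, so it is safe on UTF-8.
// mb_strtolower() lowercases whole characters instead of single bytes.
$lower = mb_strtolower(trim($str), 'UTF-8');

$escaped = htmlentities($lower, ENT_COMPAT, 'UTF-8');
echo $escaped, "\n"; // &eacute;-&egrave;
```

Passing 'UTF-8' explicitly to both calls avoids relying on mb_internal_encoding() or default_charset being set the way you expect.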

// Dump the string after each step to see which one breaks it.
// 1. Raw string, as received
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));
// 2. After trim()
$str = trim($str);
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));
// 3. After strtolower()
$str = strtolower($str);
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));

Here is the output :

// raw string é-è
string 'é-è' (length=5)
string 'UTF-8' (length=5)
string '&eacute;-&egrave;' (length=17)
string '&eacute;-&egrave;' (length=17)
string '&Atilde;&copy;-&Atilde;&uml;' (length=28)
// trim('é-è')
string 'é-è' (length=5)
string 'UTF-8' (length=5)
string '&eacute;-&egrave;' (length=17)
string '&eacute;-&egrave;' (length=17)
string '&Atilde;&copy;-&Atilde;&uml;' (length=28)
// strtolower('é-è')
string '�-�' (length=5)
string 'UTF-8' (length=5)
string '' (length=0)
string '' (length=0)
string '&atilde;&copy;-&atilde;&uml;' (length=28)

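strtolower() works byte by byte and (before PHP 8.2) follows the current locale, so here it behaves as if the string were 'ISO-8859-1'. As you can see in the var_dumps, it transforms &Atilde; into &atilde;: the lead byte 0xC3 of é and è becomes 0xE3. The result is no longer valid UTF-8, and htmlentities() returns an empty string when the input is invalid for the given charset.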
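
To see why the result is empty, the broken bytes can be reproduced by hand. This sketch hard-codes the bytes that strtolower() left behind in the dump above (0xC3 turned into 0xE3), so it does not depend on the server locale. ENT_SUBSTITUTE is a standard htmlentities() flag; it is used here only as a debugging aid, not as the fix.

```php
<?php
// Bytes of 'é-è' after the lead byte 0xC3 was lowercased to 0xE3.
$broken = "\xE3\xA9-\xE3\xA8";

// The string is no longer valid UTF-8.
$isValid = mb_check_encoding($broken, 'UTF-8');
var_dump($isValid); // bool(false)

// Without ENT_IGNORE or ENT_SUBSTITUTE, invalid input gives an empty string.
$empty = htmlentities($broken, ENT_COMPAT, 'UTF-8');
var_dump($empty === ''); // bool(true)

// With ENT_SUBSTITUTE, invalid sequences are replaced by U+FFFD instead,
// which makes the corruption visible rather than silently dropping everything.
$substituted = htmlentities($broken, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
var_dump($substituted !== ''); // bool(true)
```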
