htmlentities() returns empty values despite UTF-8

So I'm trying to escape a string in PHP using htmlentities().
The problem is that htmlentities() returns an empty string.

I'm receiving this string through an HTML <form>. The page containing the form tag has the following meta tag: <meta charset="utf-8">

My string is encoded in UTF-8 and the third parameter of htmlentities() is 'UTF-8', yet I still get an empty string.

Here is my code:

$str = strtolower(trim($str));
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));

And here is what var_dump displays:

// Original string is é-è
// Expected output is &eacute;-&egrave;
string '�-�' (length=5) // Original string but why is the length 5 ?
string 'UTF-8' (length=5)
string '' (length=0)
string '' (length=0)
string '&atilde;&copy;-&atilde;&uml;' (length=28) // WTF ??

Does anyone know where this is coming from?

OK, I found out what was wrong: strtolower() is causing the problem.
Use mb_strtolower() instead.
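
As a minimal sketch of that fix, assuming the mbstring extension is loaded and the input really is valid UTF-8: the sample value and variable names below are hypothetical, and the encoding is passed explicitly so the result does not depend on mb_internal_encoding().

```php
<?php
// Hypothetical sample input: "  É-È  " written as explicit UTF-8 bytes,
// so the example does not depend on the encoding of this source file.
$str = "  \xC3\x89-\xC3\x88  ";

// Multibyte-aware lowercase with an explicit encoding.
$lower = mb_strtolower(trim($str), 'UTF-8');

// The string is still valid UTF-8, so htmlentities() can convert it.
$escaped = htmlentities($lower, ENT_COMPAT, 'UTF-8');

var_dump($lower);    // string(5) "é-è"
var_dump($escaped);  // string(17) "&eacute;-&egrave;"
```

Passing 'UTF-8' to both calls keeps the two functions in agreement about how the bytes should be read.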

// Raw string, as received from the form
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));
// After trim()
$str = trim($str);
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));
// After strtolower()
$str = strtolower($str);
var_dump($str, mb_detect_encoding($str), htmlentities($str), htmlentities($str, ENT_COMPAT, 'UTF-8'), htmlentities($str, ENT_COMPAT, 'ISO-8859-1'));

Here is the output:

// raw string é-è
string 'é-è' (length=5)
string 'UTF-8' (length=5)
string '&eacute;-&egrave;' (length=17)
string '&eacute;-&egrave;' (length=17)
string '&Atilde;&copy;-&Atilde;&uml;' (length=28)
// trim('é-è')
string 'é-è' (length=5)
string 'UTF-8' (length=5)
string '&eacute;-&egrave;' (length=17)
string '&eacute;-&egrave;' (length=17)
string '&Atilde;&copy;-&Atilde;&uml;' (length=28)
// strtolower('é-è')
string '�-�' (length=5)
string 'UTF-8' (length=5)
string '' (length=0)
string '' (length=0)
string '&atilde;&copy;-&atilde;&uml;' (length=28)

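strtolower() is not multibyte-aware: it works byte by byte, and here it seems to treat the bytes as ISO-8859-1. As you can see in the var_dumps, it transforms &Atilde; (byte 0xC3, the first byte of both é and è in UTF-8) into &atilde; (byte 0xE3). The result is no longer valid UTF-8, so htmlentities() returns an empty string whenever it is told the input is UTF-8.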
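
To make the byte-level effect visible, here is a rough sketch that builds the corrupted string by hand instead of calling strtolower(). As far as I know, strtolower() converts only ASCII letters since PHP 8.2 and was locale-dependent before that, so reproducing the corruption directly would depend on version and locale. The variable names are hypothetical and the mbstring extension is assumed. It also answers the "why is the length 5?" question: strlen() counts bytes, not characters.

```php
<?php
// "é-è" as UTF-8 bytes: C3 A9 2D C3 A8 (5 bytes, 3 characters).
$valid = "\xC3\xA9-\xC3\xA8";

// Simulate a single-byte lowercase in Latin-1: 0xC3 ("Ã") becomes 0xE3 ("ã").
// Done by hand so the result does not depend on the PHP version or locale.
$broken = str_replace("\xC3", "\xE3", $valid);

var_dump(strlen($valid));                       // int(5): counts bytes
var_dump(mb_strlen($valid, 'UTF-8'));           // int(3): counts characters
var_dump(mb_check_encoding($valid, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)

// With these flags, invalid UTF-8 input yields an empty string.
$empty = htmlentities($broken, ENT_COMPAT, 'UTF-8');
var_dump($empty);                               // string(0) ""

// ENT_SUBSTITUTE replaces invalid sequences with U+FFFD
// instead of discarding the whole string.
$substituted = htmlentities($broken, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
var_dump($substituted);
```

ENT_SUBSTITUTE only hides the symptom: the accented characters are still lost, so fixing the lowercase step with mb_strtolower() remains the real cure.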
