简体   繁体   中英

Convert html entities to UTF-8, but keep existing UTF-8

I want to convert html entities to UTF-8, but mb_convert_encoding destroys already UTF-8 encoded characters. Whats the correct way?

$text = "äöü ä ö ü ß";
var_dump(mb_convert_encoding($text, 'UTF-8', 'HTML-ENTITIES'));
// string(24) "äöü ä ö ü ß"

mb_convert_encoding() isn't the correct function for what you're trying to achieve: you should really be using html_entity_decode() instead, because it will only convert the actual html entities to UTF-8, and won't affect the existing UTF-8 characters in the string.

$text = "äöü ä ö ü ß";
var_dump(html_entity_decode($text, ENT_COMPAT | ENT_HTML401, 'UTF-8'));

which gives

string(18) "äöü ä ö ü ß"

Demo

In my localhost I get string(18) "äöü ä ö ü ß" .

I think it's something related with your page encoding. Edit the file with Notepad++ and from the toolbar go to encoding and change to 'Encode in ANSI'. If it doesn't work then try with 'Encode in UTF-8 without BOM'.

and if that still isn't working try this

html_entity_decode($html, ENT_QUOTES, 'cp1252');

This is what was needed on a Windows IIS system for things to start working correctly. see source

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM