In PHP, I need to replace all non-UTF-8 characters (invalid UTF-8 byte sequences) in a string. However, I don't want to replace them with some equivalent (as the iconv function does with //TRANSLIT) but with a chosen character (like "_" or "*", for example).
Typically, I want the user to be able to see the positions where the invalid characters were found.
I didn't find any function that does this, so I was going to use iconv with //IGNORE.
Do you see a better way to do this? Are there functions in PHP that can be combined to get this behavior?
Thanks for your help.
Here are two preg_replace calls that get you close to what you want:
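// Replace with '?': most ASCII control characters (tab, LF and CR are kept),
// continuation bytes that follow an ASCII byte (together with that byte),
// over-long 2-byte sequences (lead byte 0xC0 or 0xC1), every sequence starting
// with 0xF0-0xFF (this includes all characters from U+10000 up, even valid ones),
// and 2- or 3-byte sequences with the wrong number of continuation bytes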
$some_string = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.
'|[\x00-\x7F][\x80-\xBF]+'.
'|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.
'|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.
'|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
'?', $some_string );
// Replace with '?': over-long 3-byte sequences (0xE0 followed by 0x80-0x9F)
// and encoded UTF-16 surrogates (0xED followed by 0xA0-0xBF)
$some_string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
'|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $some_string );
Note that you can change the replacement (currently '?') to anything else by changing the second argument of both preg_replace calls, i.e. the '?' in preg_replace('...', '?', $some_string).
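As a minimal sketch, the two calls above can be wrapped in one function so that the replacement character is a parameter instead of a hard-coded '?'. The function name replace_bad_utf8 is hypothetical, PHP 7.0 or later is assumed for the type declarations, and the patterns are copied unchanged from the answer.

```php
<?php
// Hypothetical wrapper around the two preg_replace calls from the answer.
// $replacement is used as a preg_replace replacement string, so it should be
// a plain character such as '_' or '*' ('$' and '\' have special meanings there).
function replace_bad_utf8(string $s, string $replacement = '?'): string
{
    // Most ASCII control characters, continuation bytes after an ASCII byte,
    // lead bytes 0xC0, 0xC1 and 0xF0-0xFF (so every 4-byte sequence), and
    // 2- or 3-byte lead bytes with the wrong number of continuation bytes.
    $s = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.
        '|[\x00-\x7F][\x80-\xBF]+'.
        '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.
        '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.
        '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
        $replacement, $s);

    // Over-long 3-byte sequences and encoded UTF-16 surrogates.
    return preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
        '|\xED[\xA0-\xBF][\x80-\xBF]/S', $replacement, $s);
}
```

For example, replace_bad_utf8("a\xFFb", '_') returns "a_b", and the encoded surrogate "\xED\xA0\x80" becomes a single '?'. This also shows why the result is only close to the requirement: a whole invalid run collapses into one replacement character, control characters such as "\x01" are replaced although they are valid UTF-8, and a valid 4-byte character such as the emoji "\xF0\x9F\x98\x80" is replaced as well.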
The original article: http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/
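If every invalid byte should show up as its own replacement character (so the positions stay visible) while valid characters such as tabs or emoji are kept, one alternative, not taken from the article, is to match well-formed UTF-8 sequences first and replace any other single byte. This is a sketch assuming PHP 7.0 or later; the function name replace_invalid_bytes is hypothetical, and the byte ranges follow the UTF-8 syntax given in RFC 3629.

```php
<?php
// Hypothetical helper (not from the article): keep well-formed UTF-8 and
// replace every byte that is not part of a well-formed sequence with
// $replacement, so each marker stands for exactly one bad byte.
function replace_invalid_bytes(string $s, string $replacement = '_'): string
{
    // Group 1: a run of well-formed sequences, using the byte ranges of
    // RFC 3629 (no over-long forms, no surrogates, nothing above U+10FFFF).
    // Group 2: any single byte that does not start a well-formed sequence.
    $pattern = '/((?:[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})++)|(.)/s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($replacement): string {
            // Group 2 is only present when no valid sequence starts here.
            return isset($m[2]) ? $replacement : $m[1];
        },
        $s
    );
    if ($result === null) {
        throw new RuntimeException('preg_replace_callback failed, error code ' . preg_last_error());
    }
    return $result;
}
```

For example, the Latin-1 bytes "\xE9t\xE9" become "_t_": three bytes in, three characters out. Because the callback's return value is inserted literally, the replacement may contain '$' or '\' here. One marker per byte is a deliberate choice: it keeps the mapping back to byte offsets simple, at the cost of a truncated sequence such as "\xE2\x82" producing two markers. If the mbstring extension is available, mb_substitute_character() combined with mb_scrub() (PHP 7.2+) is a built-in way to get a similar effect, although the number of substitute characters emitted for a truncated sequence may differ between PHP versions.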