
PHP: replace invalid characters in utf-8 string in

How do I replace invalid characters in a UTF-8 string with whitespace characters (using regular expressions in PHP 5)?

Use iconv:

$text = iconv("UTF-8", "UTF-8//IGNORE", $text);

See the manual.
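A minimal sketch of how the iconv call above might be wrapped, assuming you want to notice when it fails. Note that `//IGNORE` drops invalid bytes rather than replacing them with whitespace, and that on some iconv builds the call raises a notice and returns false instead of skipping the bytes. The function name `strip_invalid_utf8` is hypothetical, not part of PHP or of the answer above.

```php
<?php
// Hypothetical helper: drop bytes that are not valid UTF-8.
// Returns the cleaned string, or false if iconv refuses the input.
function strip_invalid_utf8($text)
{
    // @ silences the notice that some iconv builds raise on illegal input.
    return @iconv('UTF-8', 'UTF-8//IGNORE', $text);
}

$dirty = "abc\xFFdef"; // 0xFF can never appear in valid UTF-8
$clean = strip_invalid_utf8($dirty);

if ($clean === false) {
    // iconv gave up on this platform; try one of the other approaches.
    echo "iconv could not clean the string\n";
} else {
    echo $clean, "\n";
}
```

Checking for false matters because silently assigning the result back to `$text` could turn the string into an empty value on the platforms where iconv fails.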

Cheers

With mbstring you can do:

$text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

This works as you want (invalid characters are replaced by whitespace), but it doesn't seem to work if you want to substitute invalid characters with something else, such as ?.

See: Replacing invalid UTF-8 characters by question marks, mbstring.substitute_character seems ignored
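As a rough sketch, the substitution character can also be set at runtime with `mb_substitute_character()` instead of relying on the `mbstring.substitute_character` ini setting. Whether this behaves the same on every PHP version is an assumption worth testing on your own setup. The function name `replace_invalid_utf8` and its default of a space (0x20) are made up for this illustration.

```php
<?php
// Hypothetical helper: replace invalid UTF-8 sequences with a chosen
// code point (default 0x20, a space) using mbstring.
function replace_invalid_utf8($text, $substitute = 0x20)
{
    // Remember the current setting so it can be restored afterwards.
    $previous = mb_substitute_character();

    mb_substitute_character($substitute);
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

    mb_substitute_character($previous);
    return $clean;
}

$dirty = "abc\xFFdef"; // 0xFF can never appear in valid UTF-8

var_dump(replace_invalid_utf8($dirty));       // substitute a space
var_dump(replace_invalid_utf8($dirty, 0x3F)); // substitute "?"
```

Restoring the previous setting keeps the change local to the helper, since `mb_substitute_character()` otherwise affects every later mbstring conversion in the request.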

iconv was not working in my case (nor were the other solutions), so I found mine here, in the section on "Character validation":

http://webcollab.sourceforge.net/unicode.html

If you have come across the cursed 'Invalid Character' error while using PHP's XML or JSON parser then you may be interested in this.

Unfortunately, PHP's XML and JSON parsers do not ignore non-UTF-8 characters; instead they stop and throw a rather unhelpful error. I found the code below on the net and it works excellently for me.

//reject control characters, stray continuation bytes, overly long 2 byte sequences, and characters U+10000 and above (4 byte sequences), and replace with ?
$some_string = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.
 '|[\x00-\x7F][\x80-\xBF]+'.
 '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.
 '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.
 '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
 '?', $some_string );

//reject overly long 3 byte sequences and UTF-16 surrogates and replace with ?
$some_string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
 '|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $some_string );
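Before handing a string to an XML or JSON parser, it can help to check whether it is valid UTF-8 at all, and to re-check after cleaning. The sketch below relies on the fact that `preg_match()` with the `u` modifier fails on a subject that is not valid UTF-8. The function name `is_valid_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: true if $text is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (not 0) when the
// subject contains invalid UTF-8, and 1 when the empty pattern matches.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

$samples = array(
    "plain ascii",
    "caf\xC3\xA9",   // "café" encoded as UTF-8
    "abc\xFFdef",    // 0xFF can never appear in valid UTF-8
    "\xC0\xAF",      // overlong 2-byte sequence
);

foreach ($samples as $sample) {
    echo bin2hex($sample), ' => ', is_valid_utf8($sample) ? 'valid' : 'invalid', "\n";
}
```

Running the check only on strings that fail avoids touching input that is already well formed, which is usually the large majority.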

