
Can't reproduce “iconv(): Detected an illegal character in input string”, but I keep getting it on the server

For User Agents and image EXIF data, my system uses iconv() to convert any UTF-8 characters to ASCII.

However, sometimes I get the following error:

PHP Warning [8]: iconv(): Detected an illegal character in input string

For examples like these:

iconv('UTF-8', 'ASCII//TRANSLIT', 'Mozilla/5.0 (iPhone; CPU OS 10_15_5 (Ergänzendes Update) like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/14E304 Safari/605.1.15');

iconv('UTF-8', 'ASCII//TRANSLIT', 'Ïðîãðàììà öèôðîâîé îáðàáîòêè èçîáðàæåíèé êîìïàíèè ACD Systems');

And the result becomes an empty string.

However, when I copy the calls above and run them manually (on the same server), they work: I get no error, and the characters are converted to "?".

For years I've been trying many different things, such as different encodings, using "IGNORE" instead of "TRANSLIT", using mb_convert_encoding, etc.
But it's really hard to debug or fix this if I can't capture the real input that causes the issue, and I don't know what else I can do to fix it.

What can I do so that, whatever input is provided to iconv(), any non-ASCII characters are converted to a question mark without failing?

Illegal UTF-8 byte sequences can easily arise through mistakes. An example:

$currencies = '€$';
// substr() works on bytes: offset 1 is the second byte of '€' (\x82), not '$'
$str = "äöü|".substr($currencies,1,1)."|def";
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
//$ascii = false + Notice: iconv(): Detected an illegal character in input string

For UTF-8 strings, mb_substr() must be used instead of substr(), because substr() counts bytes and can cut a multi-byte character in half.
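To make the difference concrete, here is a minimal sketch comparing the two functions on the same string. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// '€' is three bytes in UTF-8 (E2 82 AC); '$' is one byte.
$currencies = '€$';

// substr() counts bytes: offset 1 lands in the middle of '€'.
$byte = substr($currencies, 1, 1);

// mb_substr() counts characters: offset 1 is the '$'.
$char = mb_substr($currencies, 1, 1, 'UTF-8');

var_dump(bin2hex($byte));                    // string(2) "82"
var_dump(mb_check_encoding($byte, 'UTF-8')); // bool(false)
var_dump($char);                             // string(1) "$"
var_dump(mb_check_encoding($char, 'UTF-8')); // bool(true)
```

The lone \x82 byte is a continuation byte without a lead byte, which is exactly the kind of input iconv() rejects.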

With iconv, //IGNORE can be appended to //TRANSLIT so that illegal characters are skipped.

$ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $str);
//$ascii: string(11) ""a"o"u||def"
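How //TRANSLIT and //IGNORE behave depends on the iconv implementation and locale on the server, so the flags alone may not guarantee a result everywhere. One possible way to get "always a string, never false" is to scrub invalid bytes with mbstring first and keep a regex fallback. The function name toAsciiSafe() is hypothetical and not part of the original answer; the sketch assumes the mbstring extension is loaded.

```php
<?php
/**
 * Hypothetical helper (an assumption, not from the original post):
 * returns a pure-ASCII string for any input and never returns false.
 */
function toAsciiSafe(string $input): string
{
    // Step 1: replace bytes that are not valid UTF-8 with '?'.
    // Converting UTF-8 to UTF-8 makes mbstring substitute invalid
    // sequences with the character set via mb_substitute_character().
    $previous = mb_substitute_character();
    mb_substitute_character(0x3F); // 0x3F is '?'
    $clean = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    // Step 2: try transliteration with the warning silenced. What comes
    // out for 'ä', 'ö', ... varies between iconv implementations.
    $ascii = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT', $clean)
        : false;

    // Step 3: fallback if iconv is missing or still fails: every
    // remaining non-ASCII character becomes a single '?'.
    if ($ascii === false) {
        $ascii = preg_replace('/[^\x00-\x7F]/u', '?', $clean) ?? '';
    }
    return $ascii;
}

$ascii = toAsciiSafe("\u{e4}\u{f6}\u{fc}|\x82|def");
var_dump($ascii); // ASCII only; the stray \x82 byte shows up as '?'
```

If every non-ASCII character should become "?" with no transliteration at all, steps 1 and 3 are enough and the iconv() call can be dropped.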

Finding such illegal characters in strings is not easy. The usual debug output alters these bytes or drops them. For such problems I use a special debug class that can display strings containing illegal UTF-8 in a reproducible way.

debug::writeUni($str);
//Output:\u{e4}\u{f6}\u{fc}|\x82|def

This output can be copied and pasted back into PHP source code:

$str2 = "\u{e4}\u{f6}\u{fc}|\x82|def";
var_dump($str === $str2); //bool(true)
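If no such debug class is at hand, a lossless alternative for capturing the failing input on the server is to log it as hex. This is a minimal sketch, not the class used above: the wrapper name toAsciiLogged() and the log message are assumptions, and error_log() writes to whatever log PHP is configured to use.

```php
<?php
/**
 * Hypothetical wrapper (an assumption, not from the original post):
 * logs the exact input bytes whenever iconv() fails.
 */
function toAsciiLogged(string $input)
{
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
    if ($result === false) {
        // bin2hex() is lossless, so the logged value survives terminals,
        // editors, and copy and paste without the bytes being altered.
        error_log('iconv failed for input (hex): ' . bin2hex($input));
    }
    return $result;
}

// Later, rebuild the exact failing string from the logged hex:
$str = hex2bin('c3a4c3b6c3bc7c827c646566');
var_dump($str === "\u{e4}\u{f6}\u{fc}|\x82|def"); // bool(true)
```

Copying a failing string out of a normal log or a browser usually replaces the invalid bytes with valid characters, which would explain why a pasted copy converts without error.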

Good morning. My problem persisted because some characters are not recognized by iconv. I tried several code options from various groups, but what actually worked is the following:

// Note: character converter to UTF-8

public function ConvertToUTF8($text)
{
    $encoding = mb_detect_encoding($text.'x', mb_detect_order(), false);
    if($encoding == "UTF-8")
    {
        // Convert character by character (substr() actually steps one byte at a time)
        $i    = 0;
        $conv = '';
        do 
        {
            $letra = substr($text,$i,1);
            $conv .= iconv(mb_detect_encoding($letra, mb_detect_order(), true), "UTF-8//IGNORE", $letra);
            $i ++;
        } while ($i < strlen($text) );
        $text = $conv;
    }
    else if ($encoding == 'ISO-8859-1')
    {
        // mb_convert_encoding() takes the target encoding first, then the source encoding
        $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }
    else if ($encoding == 'ASCII')
    {
        $text = mb_convert_encoding($text, "UTF-8");
    }
    $out = iconv(mb_detect_encoding($text.'x', mb_detect_order(), false), "UTF-8//TRANSLIT//IGNORE", $text);

    return $out;
}// End of module
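A shorter variant of the same idea is to validate the whole string once instead of detecting the encoding byte by byte. This is only a sketch under a strong assumption: that anything which is not valid UTF-8 is ISO-8859-1. The function name ensureUtf8() and the $fallback parameter are hypothetical, and the mbstring extension is required.

```php
<?php
/**
 * Hypothetical helper (an assumption, not from the original post):
 * returns $text unchanged if it is valid UTF-8, otherwise treats it as
 * $fallback and converts it to UTF-8.
 */
function ensureUtf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // mb_check_encoding() validates the byte sequence itself; unlike
    // mb_detect_encoding(), it does not guess.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

var_dump(bin2hex(ensureUtf8("\xE4")));     // string(4) "c3a4" (Latin-1 'ä' converted)
var_dump(bin2hex(ensureUtf8("\xC3\xA4"))); // string(4) "c3a4" (already UTF-8, unchanged)
```

The fallback encoding cannot be inferred reliably from the bytes alone, so it should be chosen from what is known about where the data comes from.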
