
Why is iconv generating an illegal character error?

I'm trying to iron out the warnings and notices from a script. The script includes the following:

$clean_string = iconv('UTF-8', 'UTF-8//IGNORE', $supplier.' => '.$product_name);

As I understand it, the original author of the script intended this line to remove non-UTF-8 characters from the string, but obviously any non-UTF-8 characters in the input will cause iconv to throw an illegal character warning.

To solve this, my idea was to do something like the following:

$clean_string = iconv(mb_detect_encoding($supplier.' => '.$product_name), 'UTF-8//IGNORE', $supplier.' => '.$product_name);

Oddly, however, mb_detect_encoding() is returning UTF-8 as the detected encoding!

The letter e with an accent (é) is an example of a character that causes this behaviour.

I realise I'm mixing multibyte libraries between detection and conversion, but I couldn't find an encoding detection function in the iconv library.

I've considered using the mb_convert_encoding() function to clean the string up into UTF-8, but the PHP documentation isn't clear about what happens to characters that cannot be represented.

I am using PHP 5.2.17 with the glibc iconv implementation, library version 2.5.

Can anyone offer any suggestions on how to clean the string into UTF-8, or insight into why this behaviour occurs?

Your example:

$string     = $supplier . ' => ' . $product_name;
$stringUtf8 = iconv('UTF-8', 'UTF-8//IGNORE', $string);

might work for you on PHP 5.2. In later PHP versions, if the input is not strictly valid UTF-8, iconv will drop the string (you will get an empty string). I mention this as a side note in case you are not aware of it.
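One way to avoid depending on how a particular PHP version treats invalid input is to check the string before converting it. The following is only a minimal sketch, assuming the mbstring extension is available; the helper name `isValidUtf8` is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper: returns true only if $string is well-formed UTF-8.
function isValidUtf8($string)
{
    return mb_check_encoding($string, 'UTF-8');
}

$latin1 = "caf\xE9";     // "café" in ISO-8859-1: a lone 0xE9 byte is not valid UTF-8
$utf8   = "caf\xC3\xA9"; // "café" in UTF-8

var_dump(isValidUtf8($latin1)); // bool(false)
var_dump(isValidUtf8($utf8));   // bool(true)
```

A string that passes this check needs no cleaning at all, so iconv only has to be considered for the strings that fail it.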

Then you try mb_detect_encoding to find out the original encoding:

$string     = $supplier . ' => ' . $product_name;
$encoding   = mb_detect_encoding($string);
$stringUtf8 = iconv($encoding, 'UTF-8//IGNORE', $string);

As I already linked in a comment, mb_detect_encoding relies on guesswork and cannot work reliably. It tries to help you, but it cannot detect the encoding very well; that is in the nature of the problem. You can try setting strict mode to true:

$order      = mb_detect_order();
$encoding   = mb_detect_encoding($string, $order, true);
if (FALSE === $encoding) {
    throw new UnexpectedValueException(
        sprintf(
            'Unable to detect input encoding with mb_detect_encoding, order was: %s'
            , print_r($order, true)
        )
    );
}
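Putting strict detection and conversion together might look like the sketch below. It assumes mbstring is available and that you can name the legacy encodings your data plausibly arrives in; the helper `toUtf8` and the candidate list are hypothetical. The conversion uses mb_convert_encoding rather than iconv, so the detected name does not have to be passed between the two libraries.

```php
<?php
// Hypothetical helper: return $string as UTF-8, or throw if no candidate matches.
function toUtf8($string, $candidates)
{
    // Already well-formed UTF-8: nothing to convert.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }

    // Strict mode rejects candidates in which the bytes are invalid.
    $encoding = mb_detect_encoding($string, $candidates, true);
    if (false === $encoding) {
        throw new UnexpectedValueException('No candidate encoding matched the input.');
    }

    return mb_convert_encoding($string, 'UTF-8', $encoding);
}

$latin1 = "caf\xE9"; // "café" in ISO-8859-1
$clean  = toUtf8($latin1, array('ISO-8859-1'));

var_dump($clean === "caf\xC3\xA9"); // bool(true)
```

Keep in mind that single-byte encodings such as ISO-8859-1 accept any byte sequence, so detection cannot tell them apart; the candidate list is really a statement of where you believe the data came from.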

In addition, you might need to translate the encoding names (and/or validate them against the supported encodings) between the two libraries (iconv and multibyte strings).
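The validation half of that can be sketched on the mbstring side as follows. This is only an illustration: the helper name `isKnownMbEncoding` is hypothetical, it checks primary names but not aliases, and it says nothing about whether your iconv implementation accepts the same name.

```php
<?php
// Hypothetical helper: true if $name matches (case-insensitively) one of the
// primary encoding names reported by mbstring. Aliases are not considered.
function isKnownMbEncoding($name)
{
    foreach (mb_list_encodings() as $known) {
        if (strcasecmp($known, $name) === 0) {
            return true;
        }
    }
    return false;
}

var_dump(isKnownMbEncoding('UTF-8'));            // bool(true)
var_dump(isKnownMbEncoding('no-such-encoding')); // bool(false)
```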

I hope this helps you at least better understand why some things might not work, and how you can better find the error cases and then filter the input with the standard PHP extensions.
