How to check if a string can safely be converted to another character set without loss?

Is it possible, before converting a string from one charset to another, to know whether the conversion will be lossless?

If I try to convert a UTF-8 string to latin1, for example, the characters that can't be converted are replaced by ?. Checking for ? in the result string to find out whether the conversion was lossless is obviously not an option, since the original string may itself contain a ?.

The only solution I can see right now is to convert back to the original charset and compare the result to the original string:

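function canBeSafelyConverted($string, $fromEncoding, $toEncoding)
{
    // Round trip: convert to the target charset, then back to the original one.
    $encoded = mb_convert_encoding($string, $toEncoding, $fromEncoding);
    $decoded = mb_convert_encoding($encoded, $fromEncoding, $toEncoding);

    // Strict comparison avoids PHP's loose numeric-string matching (e.g. "1e3" == "1000").
    return $decoded === $string;
}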

This is just a quick-and-dirty solution, though, which may behave unexpectedly at times, and I guess there might be a cleaner way to do this with mbstring, iconv, or some other library.

An alternative is to set up your own error handler with set_error_handler(). If you use iconv() on the string, it will raise a notice if the string cannot be fully converted; you can catch that notice in the handler and react to it in your code.
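The error-handler idea could look roughly like the sketch below. The function name convertsWithoutLoss() is hypothetical, and the sketch assumes an iconv implementation (such as glibc's) that raises a notice and returns false when a character has no equivalent in the target charset; other implementations may substitute a character silently instead.

```php
<?php
// Hypothetical helper; the name is illustrative and not from the original post.
// Assumes iconv() raises a notice for unconvertible characters (glibc behaviour).
function convertsWithoutLoss(string $string, string $fromEncoding, string $toEncoding): bool
{
    $lossless = true;

    // Temporarily install a handler that records any notice or warning from iconv().
    set_error_handler(function (int $errno, string $errstr) use (&$lossless): bool {
        $lossless = false;
        return true; // skip PHP's default error handling
    }, E_NOTICE | E_WARNING);

    try {
        $result = iconv($fromEncoding, $toEncoding, $string);
    } finally {
        // Always put the previous handler back, even if something throws.
        restore_error_handler();
    }

    return $lossless && $result !== false;
}

// "\u{...}" escapes keep the example independent of the source file's encoding.
var_dump(convertsWithoutLoss("caf\u{E9}", 'UTF-8', 'ISO-8859-1'));        // e-acute exists in latin1
var_dump(convertsWithoutLoss("\u{65E5}\u{672C}", 'UTF-8', 'ISO-8859-1')); // CJK characters do not
```

The handler is restricted to E_NOTICE and E_WARNING and is restored in a finally block, so it only affects the single iconv() call.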

Or you could just count the number of question marks before and after encoding. Or call iconv() with //IGNORE and count the number of characters.
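Both counting ideas can be sketched as follows. The helper names are hypothetical. The first sketch assumes ASCII-compatible encodings and mbstring's default substitute character (?). The second swaps iconv() with //IGNORE for mb_substitute_character('none'), which likewise drops unconvertible characters, because //IGNORE can still raise a notice and return false on some iconv builds.

```php
<?php
// Hypothetical helpers; names are illustrative and not from the original post.

// Compare the number of '?' bytes before and after conversion.
// Assumes both encodings are ASCII-compatible and that mbstring's
// substitute character is still the default '?'.
function losslessByQuestionMarks(string $string, string $from, string $to): bool
{
    $encoded = mb_convert_encoding($string, $to, $from);
    return substr_count($string, '?') === substr_count($encoded, '?');
}

// Drop unconvertible characters, then compare character counts.
function losslessByLength(string $string, string $from, string $to): bool
{
    $previous = mb_substitute_character(); // remember the current setting
    mb_substitute_character('none');       // unconvertible characters produce no output
    try {
        $encoded = mb_convert_encoding($string, $to, $from);
    } finally {
        mb_substitute_character($previous); // restore the global setting
    }
    return mb_strlen($string, $from) === mb_strlen($encoded, $to);
}

var_dump(losslessByQuestionMarks("caf\u{E9}?", 'UTF-8', 'ISO-8859-1'));
var_dump(losslessByLength("\u{65E5}\u{672C}", 'UTF-8', 'ISO-8859-1'));
```

Each variant converts the string only once, which is the point of the suggestion; neither detects a character that is silently mapped to a different one.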

None of these suggestions is much more elegant than yours, but they do get rid of the double processing.
