
Check if a UTF-8 string is valid in Qt

In Qt, is there a way to check whether a byte array is a valid UTF-8 sequence?

It seems that QString::fromUtf8() silently suppresses or replaces invalid sequences without notifying the caller that there were any. This is from its documentation:

However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed.

Try QTextCodec::toUnicode(), passing a ConverterState instance. ConverterState has members such as invalidChars. They are not documented via doxygen, but I assume they are public API, since they are mentioned in the QTextCodec documentation.

Sample code:

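QTextCodec::ConverterState state;
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
const QString text = codec->toUnicode(byteArray.constData(), byteArray.size(), &state);
// invalidChars counts malformed sequences; remainingChars is non-zero when the
// input ends in the middle of a multi-byte sequence.
if (state.invalidChars > 0 || state.remainingChars > 0) {
    qDebug() << "Not a valid UTF-8 sequence.";
}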

The ConverterState approach, already described here by Frank Osterfeld, works even if the text has no BOM (byte order mark) (*).

(*) Unlike QTextCodec::codecForUtfText(), which needs a BOM in the text in order to detect that it is UTF-8.
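
For comparison, here is a rough sketch of what "valid UTF-8" means at the byte level, written in standard C++ with no Qt dependency. It is not how Qt implements the check; the function name isValidUtf8 is hypothetical and chosen only for illustration. The sketch rejects stray or missing continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): returns true if `bytes` is well-formed UTF-8.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {          // plain ASCII, one byte
            ++i;
            continue;
        }

        std::size_t extra = 0;      // continuation bytes expected after the lead byte
        unsigned long minCp = 0;    // smallest code point allowed for this length
        unsigned long cp = 0;       // code point being decoded
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; minCp = 0x80; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; minCp = 0x800; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; minCp = 0x10000; cp = lead & 0x07;
        } else {
            return false;           // stray continuation byte or invalid lead byte
        }

        if (n - i <= extra)
            return false;           // input ends in the middle of a sequence

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;       // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp)
            return false;           // overlong encoding
        if (cp > 0x10FFFF)
            return false;           // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;           // UTF-16 surrogates are not valid in UTF-8

        i += extra + 1;
    }
    return true;
}
```

With a QByteArray, this could be called as isValidUtf8(std::string(byteArray.constData(), byteArray.size())). That copies the data, so for large inputs a pointer-and-length variant would be the more natural choice.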
