Check whether a UTF-8 string is valid in Qt

Question

In Qt, is there a way to check if a byte array is a valid UTF-8 sequence?

It seems that QString::fromUtf8() silently suppresses or replaces invalid sequences, without notifying the caller that there were any. This is from its documentation:

However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed.

Answer 1

Try QTextCodec::toUnicode, passing a ConverterState instance. ConverterState has members like invalidChars. They are not documented via doxygen, but I assume they are public API, as they are mentioned in the QTextCodec documentation.

Sample code:

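#include <QDebug>
#include <QTextCodec>

// byteArray holds the bytes to check (e.g. a QByteArray).
QTextCodec::ConverterState state;
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
const QString text = codec->toUnicode(byteArray.constData(), byteArray.size(), &state);
// invalidChars is non-zero if the codec met invalid input during the conversion.
if (state.invalidChars > 0) {
    qDebug() << "Not a valid UTF-8 sequence.";
}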
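
For comparison, here is a minimal Qt-free sketch of what "valid UTF-8" means at the byte level. It is not how QTextCodec works internally, and isValidUtf8 is a hypothetical helper name, not a Qt function. It assumes the RFC 3629 rules: correct lead and continuation bytes, no overlong encodings, no surrogates, and nothing above U+10FFFF. Its verdict may differ from invalidChars in edge cases, so treat it as an illustration rather than a replacement.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): returns true if `bytes` is
// well-formed UTF-8 under the RFC 3629 rules.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;    // total length of this encoded sequence
        unsigned long cp = 0;   // code point being decoded
        unsigned long min = 0;  // smallest code point allowed for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (len > n - i) return false;  // sequence is cut off at the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        i += len;
    }
    return true;
}
```

With a QByteArray, one way to call it would be isValidUtf8(byteArray.toStdString()). This version treats a truncated trailing sequence as invalid, which suits whole-buffer checks but not streaming input.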

Answer 2

The ConverterState approach, already reported here by Frank Osterfeld, works even if the text has no BOM (Byte Order Mark) (*).

(*) Unlike QTextCodec::codecForUtfText(), which needs a BOM in the text in order to know that it is UTF-8.

Answer 1: 19 votes, accepted, 2013-08-14 09:46:55

Answer 2: 2 votes, 2013-12-09 00:09:50
