
How to decode a QByteArray using UTF-8 with Latin-1 fallback

I have a socket connected to an IRC server, and I would like to convert the received data (a QByteArray) to a QString. Because not everyone on IRC uses UTF-8, I want to first try decoding the QByteArray as UTF-8:

QString s = QString::fromUtf8(array);

The problem is that Qt silently replaces "bad" characters and always returns a QString. I would like to "try" decoding, and if the data can't be decoded correctly, fall back to Latin-1 decoding.

How can I do that?

Unfortunately it doesn't look like Qt offers a decoding routine that allows its handling of invalid sequences to be configured.

Instead you should be able to do something like the following:

QString s = QString::fromUtf8(array);
if (s.toUtf8() != array) {
  s = QString::fromLatin1(array);
}

Direct conversion between UTF-8 and UTF-16 (i.e., with no normalization) should be lossless and perfectly reversible. If converting the resulting UTF-16 back to UTF-8 does not reproduce the original data, that is because the original data wasn't valid UTF-8.
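
If you would rather check the bytes directly instead of decoding and re-encoding, the same question ("is this well-formed UTF-8?") can be answered with a small validator. The following is only a sketch using the standard library; isValidUtf8 is a hypothetical helper name, not a Qt function, and it assumes you want to reject overlong encodings, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): returns true if `bytes` is
// well-formed UTF-8. Overlong encodings, UTF-16 surrogates and code
// points above U+10FFFF are treated as invalid.
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;       // total length of this sequence in bytes
        unsigned long cp = 0;      // code point being assembled
        unsigned long minCp = 0;   // smallest code point allowed for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        minCp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;            // stray continuation or invalid lead byte

        if (len > n - i) return false;  // sequence is cut off at the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < minCp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

With a QByteArray, one way to call it would be isValidUtf8(std::string(array.constData(), array.size())), choosing fromUtf8 when it returns true and fromLatin1 otherwise.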

It is possible, though unlikely in normal circumstances, that text in some other encoding happens to be valid UTF-8 but means something different when read as UTF-8 than in its actual encoding. Such text will be detected as UTF-8 by this check and will not display as intended. The only way to avoid this is to have prior knowledge of the correct encoding, e.g. via a protocol-level declaration of the encoding.
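
To make that ambiguity concrete, here is a small standard-library sketch. The helper name latin1ToUtf16 and the sample constant are made up for illustration. It decodes bytes as Latin-1, where each byte maps to the code point with the same value, and applies that to a byte pair that is also well-formed UTF-8.

```cpp
#include <string>

// Hypothetical helper: decode Latin-1 bytes. Every byte maps to the
// code point with the same numeric value, so decoding can never fail.
std::u16string latin1ToUtf16(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (const char c : bytes) {
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}

// The two bytes 0xC3 0xA9 are well-formed UTF-8 for the single code point
// U+00E9 (e with acute accent). A Latin-1 sender, however, would have meant
// the two characters U+00C3 U+00A9 (A with tilde, copyright sign).
const std::string ambiguousBytes = "\xC3\xA9";
```

A UTF-8-first strategy will always pick the single-character reading for such input, which is why the fallback can only be a heuristic.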


Another option is to use std::wstring_convert, part of the C++11 standard library (note that it was deprecated in C++17).

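#include <codecvt>   // for std::codecvt_utf8_utf16
#include <locale>    // for std::wstring_convert
#include <stdexcept> // for std::range_error
#include <string>    // for std::u16string

QByteArray array = ...

std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
QString s;
try {
  // from_bytes takes a [first, last) pointer pair, not a pointer and a size.
  const char *first = array.constData();
  std::u16string s16 = converter.from_bytes(first, first + array.size());

  // Pass the length explicitly so embedded NUL characters are not cut off.
  s = QString::fromUtf16(s16.data(), static_cast<int>(s16.size()));
} catch (const std::range_error &) {
  // The bytes were not valid UTF-8, so fall back to Latin-1.
  s = QString::fromLatin1(array);
}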

Note that using fromUtf16 with char16_t depends on a Qt change that added char16_t overloads, which may not be included in the version of Qt you're using. Presumably they'll also eventually add something like fromStdU16String() so you can say QString::fromStdU16String(s16), or maybe add implicit conversions so you can just say s = s16;.

Also note that, at the time of writing, libstdc++ (the default standard library implementation for gcc) didn't include this conversion facility yet (it was added in GCC 5). Visual Studio 2010 and later has it, and libc++ has it.
