
In Qt, how do I convert the Unicode code point U+1F64B to a QString holding its equivalent character “🙋”?

Background:

I am making a hash that lets you look up the description shown below by feeding it a QString containing the character.

[Image: character map example]

I got a full list of the relevant data, looking something like this:

QHash<QString, QString> lookupCharacterDescription;
...
lookupCharacterDescription.insert("003F","QUESTION MARK");
lookupCharacterDescription.insert("0040","COMMERCIAL AT");
lookupCharacterDescription.insert("0041","LATIN CAPITAL LETTER A");
lookupCharacterDescription.insert("0042","LATIN CAPITAL LETTER B");
...
lookupCharacterDescription.insert("1F648","SEE-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F649","HEAR-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F64A","SPEAK-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F64B","HAPPY PERSON RAISING ONE HAND");
...
lookupCharacterDescription.insert("FFFD","REPLACEMENT CHARACTER");
lookupCharacterDescription.insert("FFFE","<not a character>");
lookupCharacterDescription.insert("FFFF","<not a character>");
lookupCharacterDescription.insert("FFFFE","<not a character>");
lookupCharacterDescription.insert("FFFFF","<not a character>");

Now obviously "1F64B" needs to be wrapped in something here. I have tried things like passing 0x1F64B as a QChar, but I am honestly groping in the dark. I could make it work with the lower values like the Latin letters, but it fails with the five-digit code points.

Questions:

  • How do I classify 1F64B?
  • Is this considered UTF-32?
  • What can I wrap this value "1F64B" in to produce the QString("🙋")?
  • Will the wrappings also work for the lower values?

When you use QString(0x1F64B), it calls QString::QString(QChar ch). Since QChar is a 16-bit type, the value is truncated to 0xF64B, and you get U+F64B, a Private Use Area code point with no standard meaning, instead of the character you wanted. Your compiler may also warn about the out-of-range value on that line. You can spot the F64B by zooming in on the rendered glyph (many fonts draw an unsupported code point as a box containing its hex value) or by inspecting the string in a hex editor. Because 0x1F64B cannot fit in a single 16-bit QChar and must be represented by a surrogate pair, you can't initialize the string that way.
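A minimal standard-C++ sketch of that truncation (no Qt required, so it can be checked standalone). Here char16_t stands in for QChar's 16-bit storage; kRaisingHand and needsSurrogatePair are illustrative names, not Qt API.

```cpp
// U+1F64B HAPPY PERSON RAISING ONE HAND as a plain number.
constexpr char32_t kRaisingHand = 0x1F64B;

// Squeezing the code point into 16 bits (as QChar does) silently
// drops the upper bits, leaving U+F64B.
constexpr char16_t truncated = static_cast<char16_t>(kRaisingHand);
static_assert(truncated == 0xF64B, "only the low 16 bits survive");

// Any code point above U+FFFF lies outside the Basic Multilingual Plane
// and needs two UTF-16 code units (a surrogate pair).
constexpr bool needsSurrogatePair(char32_t cp) {
    return cp > 0xFFFF;
}

static_assert(needsSurrogatePair(kRaisingHand), "U+1F64B needs a pair");
static_assert(!needsSurrogatePair(U'A'), "U+0041 fits in one unit");
```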

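On the other hand, QString("🙋") works because it constructs the string from a whole string literal (which Qt 5 and later decode as UTF-8, so the source file must be saved as UTF-8), not from a single QChar. You must construct the string from a string like that, or manually from its UTF-8 or UTF-16 code units.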

Is this considered UTF-32?

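No. 1F64B is a Unicode code point, written U+1F64B; it lies outside the Basic Multilingual Plane (above U+FFFF). UTF-32 is one of the Unicode encodings, the one that uses a single 32-bit code unit per code point. You're working with a QString rather than a raw byte array, so you don't need to care about its underlying encoding (which is actually UTF-16).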
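To make the code point versus encoding distinction concrete, this plain C++11 sketch counts how many code units U+1F64B occupies in each encoding. The codeUnits helper is a hypothetical name; the \U escape keeps the example independent of the source file's encoding.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t codeUnits(const CharT (&)[N]) { return N - 1; }

// One code point, three encodings.
constexpr std::size_t utf32Units = codeUnits(U"\U0001F64B");  // 1 unit of 32 bits
constexpr std::size_t utf16Units = codeUnits(u"\U0001F64B");  // 2 units of 16 bits
// u8 literals are char arrays before C++20 and char8_t arrays since;
// either way each element is one 8-bit code unit.
constexpr std::size_t utf8Units = codeUnits(u8"\U0001F64B");  // 4 units of 8 bits
```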

What can I wrap this value "1F64B" in to produce the QString("🙋")?

You shouldn't keep the numeric values as strings. Store them as a numeric type instead:

QHash<uint, QString> lookupCharacterDescription;  // uint matches the element type of QString::toUcs4()
lookupCharacterDescription.insert(0x1F64B, "HAPPY PERSON RAISING ONE HAND");
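If the data already arrives as hex strings like "1F64B", the keys can be converted once while building the table. This sketch uses standard C++ (std::stoul with base 16 and std::unordered_map) instead of Qt; parseCodePoint and buildTable are hypothetical helpers. In Qt, QString::toUInt(&ok, 16) would play the role of std::stoul.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: parse a hex string such as "1F64B" or "0041"
// into a numeric code point. std::stoul throws std::invalid_argument
// if the string does not start with a hex digit.
char32_t parseCodePoint(const std::string& hex) {
    unsigned long value = std::stoul(hex, nullptr, 16);
    assert(value <= 0x10FFFF && "not a Unicode code point");
    return static_cast<char32_t>(value);
}

// Hypothetical helper: build a table keyed by the numeric code point
// rather than its spelling, so "0041" and "41" map to the same key.
std::unordered_map<char32_t, std::string> buildTable(
        const std::vector<std::pair<std::string, std::string>>& rows) {
    std::unordered_map<char32_t, std::string> table;
    for (const auto& row : rows)
        table[parseCodePoint(row.first)] = row.second;
    return table;
}
```

Keying by number also sidesteps leading-zero and upper/lower-case differences that a string key would treat as distinct.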

and then to make a string that contains the character at code point 0x1F64B use

char32_t cp = 0x1F64B;  // Qt 5 also has an overload taking const uint*
QString mystr = QString::fromUcs4(&cp, 1);
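For intuition about what fromUcs4() has to produce, here is a sketch of the UTF-16 encoding step in plain C++. toUtf16 is a hypothetical helper, not a Qt function; since QString stores UTF-16, its output is the sequence of code units the QString conceptually ends up holding.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-16 code units.
std::u16string toUtf16(char32_t cp) {
    // Surrogate values and anything above U+10FFFF are not valid input.
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp <= 0xFFFF)                                   // BMP: one code unit
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;                                      // 20 bits remain
    char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));    // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // bottom 10 bits
    return std::u16string{high, low};
}
```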

Will the wrappings also work for the lower values?

Yes. UCS-4, also known as UTF-32, can store every Unicode code point in a single unit, so fromUcs4() works the same way for U+0041 as for U+1F64B.
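The lookup direction also works uniformly across both ranges. This plain C++ sketch recovers the numeric key from UTF-16 text such as a QString's contents; firstCodePoint is a hypothetical helper, and its choice to report unpaired surrogates as U+FFFD is an assumption of this sketch. In Qt itself, QString::toUcs4() returns the code points directly.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the first code point of UTF-16 text.
// A BMP character like "A" is one code unit; U+1F64B arrives as a high
// surrogate followed by a low surrogate.
char32_t firstCodePoint(const std::u16string& s) {
    assert(!s.empty() && "need at least one code unit");
    char32_t hi = s[0];
    if (hi >= 0xD800 && hi <= 0xDBFF && s.size() >= 2) {
        char32_t lo = s[1];
        if (lo >= 0xDC00 && lo <= 0xDFFF)                // valid surrogate pair
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
    if (hi >= 0xD800 && hi <= 0xDFFF)                    // unpaired surrogate
        return 0xFFFD;                                   // REPLACEMENT CHARACTER
    return hi;                                           // BMP character
}
```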

Alternatively, you can construct the character from its UTF-16 or UTF-8 code units. U+1F64B is encoded in UTF-16 as D83D DE4B and in UTF-8 as F0 9F 99 8B, so you can use either of the following:

QChar utf16[2] = { 0xD83D, 0xDE4B };  // high surrogate, low surrogate
QString str1(utf16, 2);
const char utf8[4] = { '\xF0', '\x9F', '\x99', '\x8B' };
QString str2 = QString::fromUtf8(utf8, 4);
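The UTF-8 bytes above can be derived by hand. This sketch follows the bit layout from RFC 3629; toUtf8 is a hypothetical helper written in plain C++, not part of Qt.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as 1 to 4 UTF-8 bytes.
std::string toUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    std::string out;
    if (cp < 0x80) {                                  // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                          // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                        // 1110xxxx + 2 continuation bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                          // 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```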

If you want to write the code units directly in the source code as escape sequences, either of the following works:

str1 = QString::fromWCharArray(L"\xD83D\xDE4B");  // only where wchar_t is 16 bits (Windows)
str2 = QString::fromUtf8("\xF0\x9F\x99\x8B");
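The wchar_t variant is platform-dependent because the width of wchar_t is: 16 bits (UTF-16) on Windows, typically 32 bits (UTF-32) on Linux and macOS, where fromWCharArray() treats the data as UCS-4. This standard-C++ compile-time check, with the illustrative name wideUnits, shows that the portable spelling L"\U0001F64B" lets the compiler choose the right form on each platform.

```cpp
#include <cstddef>

// wchar_t units in a wide literal, excluding the terminating NUL.
constexpr std::size_t wideUnits = sizeof(L"\U0001F64B") / sizeof(wchar_t) - 1;

// A surrogate pair where wchar_t is 16 bits, a single unit where it is 32.
static_assert(wideUnits == (sizeof(wchar_t) == 2 ? 2 : 1),
              "wide encoding follows the width of wchar_t");
```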

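With C++11 you can also use the u8, u and U prefixes for UTF-8, UTF-16 and UTF-32 string literals respectively. They produce arrays of char (char8_t since C++20), char16_t and char32_t, which you can pass to QString::fromUtf8(), QString::fromUtf16() or QString::fromUcs4() (a C++20 char8_t literal may need a cast to const char* first):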

u8"🙋"
u"🙋"
U"🙋"
u8"\U0001F64B"
u"\U0001F64B"
u"\xD83D\xDE4B"  // surrogates must use \x escapes; u"\uD83D\uDE4B" is ill-formed
U"\U0001F64B" 
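As a compile-time check that these spellings agree, this standard C++11 sketch (illustrative variable names) compares the \U escape with explicit surrogates written as \x escapes, and confirms that the UTF-32 literal holds the code point itself. The raw "🙋" forms additionally rely on the compiler reading the source file in the encoding it was saved in, usually UTF-8.

```cpp
// The universal-character-name and the hand-written surrogates
// produce the same UTF-16 code units.
constexpr char16_t viaUcn[] = u"\U0001F64B";
constexpr char16_t viaHex[] = u"\xD83D\xDE4B";
static_assert(sizeof(viaUcn) == sizeof(viaHex), "same length");
static_assert(viaUcn[0] == viaHex[0] && viaUcn[1] == viaHex[1], "same code units");

// The UTF-32 literal stores the code point in a single unit, plus the NUL.
constexpr char32_t viaUtf32[] = U"\U0001F64B";
static_assert(viaUtf32[0] == 0x1F64B && viaUtf32[1] == 0, "one unit");

// Note: u"\uD83D\uDE4B" would not compile, because a \u escape
// may not name a surrogate code point.
```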

Mandatory article to understand text and encodings: There Ain't No Such Thing as Plain Text
