Trouble with string encoding and emoji

I'm having trouble retrieving text messages from my server, specifically with their encoding. Messages can be in many languages (so they can contain accented characters, Japanese, ...) and can include emoji.

I retrieve the messages as JSON, along with some other info. Here is an example from the logs:

(lldb) po dataMessages
<__NSCFArray 0x14ecc7f0>(
{
    author = "User 1";
    text = "Hier, c'\U00c3\U00a9tait incroyable";
},
{
...
}
)

(lldb) po [[dataMessages objectAtIndex:0] objectForKey:@"text"]
Hier, c'Ã©tait incroyable

I'm able to get the correct text with:

const char *c = [[[dataMessages objectAtIndex:indexPath.row] objectForKey:@"text"] cStringUsingEncoding:NSWindowsCP1252StringEncoding];
NSString *myMessage = [NSString stringWithCString:c encoding:NSUTF8StringEncoding];

However, if the message contains emoji, cStringUsingEncoding: returns NULL.
I don't have control over the server, so I can't change the encoding before the messages are sent to me.

The problem is determining the encoding correctly. Emoji are not part of NSWindowsCP1252StringEncoding so the conversion just fails.
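As a small illustration of that point, NSString can be asked whether a string is losslessly representable in a given encoding. The helper name CanEncodeAsCP1252 below is hypothetical and used only for this sketch:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): reports whether every
// character of the string exists in Windows-1252.
static BOOL CanEncodeAsCP1252(NSString *s) {
    return [s canBeConvertedToEncoding:NSWindowsCP1252StringEncoding];
}
```

An accented Latin letter such as é passes this check, while an emoji such as U+1F600 does not, which is when cStringUsingEncoding: gives back NULL.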

Moreover, you are passing through an unnecessary stage. Do not make an intermediate C string! Just call NSString's initWithData:encoding:.

In your case, using NSWindowsCP1252StringEncoding was always a mistake; I'm surprised that this worked for any string. The bytes C3 A9 are the UTF-8 encoding of é, so the server is sending UTF-8. Just call initWithData:encoding: on the raw response data with NSUTF8StringEncoding from the get-go and all will be well.
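A minimal sketch of that approach, assuming you still have access to the raw bytes of the response as an NSData before anything has turned them into strings. The helper name DecodeUTF8 is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): interpret raw bytes
// as UTF-8 text. Returns nil if the bytes are not valid UTF-8.
static NSString *DecodeUTF8(NSData *rawData) {
    return [[NSString alloc] initWithData:rawData
                                 encoding:NSUTF8StringEncoding];
}
```

Because UTF-8 covers all of Unicode, the same call handles accented characters, Japanese and emoji alike. It only helps when applied to the original bytes; a string that has already been decoded with the wrong encoding still contains the wrong characters.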