
How to determine whether a C string can be decoded into an NSString with a given encoding

I am trying to implement code that converts a const char * to an NSString. I would like to try multiple encodings in a specified order until I find one that works. Unfortunately, the documentation for the initWith... methods on NSString says that the results are undefined if the encoding doesn't match the data.

In particular, I sometimes want to try NSMacOSRomanStringEncoding first, but decoding with it never seems to fail; it just produces gobbledygook instead. Is there some kind of check I can perform ahead of time (like canBeConvertedToEncoding:, but in the other direction)?

Instead of trying encodings one by one until you find a match, consider asking NSString to help by using +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:]. Given string data and some options, this method may be able to detect the encoding for you and return it, along with the decoded string.

For your use case specifically, because you have a list of encodings you'd like to try, you can pass that list in the encodingOptions dictionary under the NSStringEncodingDetectionSuggestedEncodingsKey key.

So, given a C string and a list of candidate encodings, you might be able to do something like this:

```objc
#import <Foundation/Foundation.h>
#include <string.h>

NSString *decodeCString(const char *source, NSArray<NSNumber *> *encodings) {
    // Wrap the C string's bytes (excluding the NUL terminator) without copying them.
    NSData * const cStringData = [NSData dataWithBytesNoCopy:(void *)source length:strlen(source) freeWhenDone:NO];
    
    NSString *result = nil;
    BOOL usedLossyConversion = NO;
    NSStringEncoding determinedEncoding = [NSString stringEncodingForData:cStringData
                                                          encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
                                                                            NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES}
                                                          convertedString:&result
                                                      usedLossyConversion:&usedLossyConversion];
    
    /* Decide whether to do anything with `usedLossyConversion` and `determinedEncoding`
       (`determinedEncoding` is 0 if no encoding could be determined). */
    return result;
}
```

Example usage:

```objc
NSString *result = decodeCString("Hello, world!", @[@(NSShiftJISStringEncoding), @(NSMacOSRomanStringEncoding), @(NSASCIIStringEncoding)]);
NSLog(@"%@", result); // => "Hello, world!"
```

If you don't strictly need to limit detection to your list of encodings, you can drop the NSStringEncodingDetectionUseOnlySuggestedEncodingsKey option and let the detector consider other encodings as well.
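The comment in decodeCString leaves open what to do with usedLossyConversion and determinedEncoding. One possible policy is sketched below. It treats a return value of 0 (no encoding detected) or a lossy conversion as failure. It also passes NSStringEncodingDetectionAllowLossyKey: @NO, a Foundation detection option that the answer above doesn't use, so that the detector is asked not to fall back to lossy conversion at all. The name strictDecodeCString and its outEncoding parameter are hypothetical and exist only for illustration.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical helper (not a Foundation API): decodes `source` using only the
// suggested encodings and rejects detection failures and lossy results.
// On success, stores the detected encoding in *outEncoding (if non-NULL).
NSString *strictDecodeCString(const char *source,
                              NSArray<NSNumber *> *encodings,
                              NSStringEncoding *outEncoding) {
    if (source == NULL) {
        return nil;
    }
    // Wrap the bytes without copying; the caller keeps ownership of `source`.
    NSData *data = [NSData dataWithBytesNoCopy:(void *)source
                                        length:strlen(source)
                                  freeWhenDone:NO];

    NSString *converted = nil;
    BOOL usedLossy = NO;
    NSStringEncoding encoding =
        [NSString stringEncodingForData:data
                        encodingOptions:@{
                            NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
                            NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES,
                            // Ask the detector not to fall back to a lossy conversion.
                            NSStringEncodingDetectionAllowLossyKey: @NO
                        }
                        convertedString:&converted
                    usedLossyConversion:&usedLossy];

    // 0 means no encoding was detected; also refuse any lossy result.
    if (encoding == 0 || converted == nil || usedLossy) {
        return nil;
    }
    if (outEncoding != NULL) {
        *outEncoding = encoding;
    }
    return converted;
}
```

Returning nil rather than a lossy string lets the caller decide whether to retry with a broader set of encodings.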


One thing to note about the encoding array you pass in: the documentation doesn't promise that the suggested encodings are tried in order. However, spelunking through the disassembly of the current method implementation shows that the array is enumerated with fast enumeration (i.e., in order). This could change in the future, or may have been different in the past. If ordering is a hard requirement for you, you could work around it by calling +stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: repeatedly, one encoding at a time, in order. Given the complexity of this method, though, that approach would likely be very expensive.
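Here is a minimal sketch of that one-encoding-at-a-time workaround. Each call passes a single suggested encoding with the use-only key set, so the order of the caller's array is enforced explicitly rather than relying on how the method enumerates its input. The function name decodeCStringInOrder is hypothetical. The choice to skip lossy conversions (via NSStringEncodingDetectionAllowLossyKey and the usedLossyConversion flag) is an assumption about what counts as "works". The sketch also assumes the documented behavior that a return value of 0 means detection failed.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical helper: tries each encoding strictly in the given order by
// running the detector with exactly one suggested encoding per call.
// The first encoding that yields a non-lossy conversion wins.
// This runs the full detection machinery once per candidate, so it is likely
// much slower than a single call with all suggestions.
NSString *decodeCStringInOrder(const char *source,
                               NSArray<NSNumber *> *encodings,
                               NSStringEncoding *outEncoding) {
    if (source == NULL) {
        return nil;
    }
    NSData *data = [NSData dataWithBytesNoCopy:(void *)source
                                        length:strlen(source)
                                  freeWhenDone:NO];

    for (NSNumber *candidate in encodings) {
        NSString *converted = nil;
        BOOL usedLossy = NO;
        NSStringEncoding encoding =
            [NSString stringEncodingForData:data
                            encodingOptions:@{
                                NSStringEncodingDetectionSuggestedEncodingsKey: @[candidate],
                                NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES,
                                NSStringEncodingDetectionAllowLossyKey: @NO
                            }
                            convertedString:&converted
                        usedLossyConversion:&usedLossy];
        if (encoding != 0 && converted != nil && !usedLossy) {
            if (outEncoding != NULL) {
                *outEncoding = encoding;
            }
            return converted;
        }
    }
    return nil; // No candidate produced a clean conversion.
}
```

Ordering matters here. Mac OS Roman assigns a character to every byte value, so listing NSMacOSRomanStringEncoding ahead of stricter encodings such as NSUTF8StringEncoding will likely let it match almost any input, which is the gobbledygook problem from the question. Putting the strictest encodings first avoids that.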
