How do I split a string with special characters into an NSMutableArray

I'm trying to separate a string with Danish characters into an NSMutableArray, but something is not working. :(

My code:

NSString *danishString = @"æøå";

NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]]; 

for (int i=0; i < [danishString length]; i++) 
{ 
     NSString *ichar = [NSString stringWithFormat:@"%c", [danishString characterAtIndex:i ]]; 
     [characters addObject:ichar]; 
} 

If I NSLog danishString, it works (prints æøå).

But if I NSLog characters (the array), I get some very strange characters. What is wrong?

/Morten

First of all, your code is incorrect. characterAtIndex: returns a unichar, so you should use @"%C" (uppercase) as the format specifier. The lowercase %c formats an 8-bit char, so any code unit above 0x7F, such as æ, ø or å, comes out garbled.
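As a rough illustration of that minimal fix, the loop from the question could be written with %C as below. SplitByCodeUnit is a hypothetical helper name used only for this sketch; it yields one string per UTF-16 code unit, which is not necessarily one string per visible character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): one NSString per UTF-16 code unit.
static NSArray<NSString *> *SplitByCodeUnit(NSString *string) {
    NSMutableArray<NSString *> *units = [NSMutableArray arrayWithCapacity:string.length];
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar unit = [string characterAtIndex:i];
        // %C (uppercase) formats a 16-bit unichar; %c would only handle an 8-bit char.
        [units addObject:[NSString stringWithFormat:@"%C", unit]];
    }
    return units;
}
```

Calling SplitByCodeUnit(@"æøå") should give three one-character strings, since each of these letters fits in a single code unit.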

Even with the correct format specifier, your code is unsafe and, strictly speaking, still incorrect, because not all Unicode characters can be represented by a single unichar. You should always handle Unicode strings per substring:

It's common to think of a string as a sequence of characters, but when working with NSString objects, or with Unicode strings in general, in most cases it is better to deal with substrings rather than with individual characters. The reason for this is that what the user perceives as a character in text may in many cases be represented by multiple characters in the string.

You should definitely read the String Programming Guide.

Finally, the correct code for you:

NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]]; 
[danishString enumerateSubstringsInRange:NSMakeRange(0, danishString.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    [characters addObject:substring];
}];

If NSLog(@"%@", characters); shows "strange characters" of the form "\Uxxxx", that's correct. It's the default stringification behavior of NSArray's description method. You can print these Unicode characters one by one if you want to see the "normal characters":

for (NSString *c in characters) {
    NSLog(@"%@", c);
}
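If one log line is preferred over one line per character, joining the array into a single string avoids the escaped output of NSArray's description. This is only a sketch; ReadableDescription is a hypothetical helper name, not something Foundation provides.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a readable one-line listing of the array's strings.
static NSString *ReadableDescription(NSArray<NSString *> *strings) {
    // componentsJoinedByString: concatenates the elements without escaping them.
    NSString *joined = [strings componentsJoinedByString:@", "];
    return [NSString stringWithFormat:@"[%@]", joined];
}
```

With the array built above, NSLog(@"%@", ReadableDescription(characters)); would log the letters themselves rather than their \U escapes.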

In your example, characterAtIndex: returns a unichar, not an NSString, and %c does not format it correctly. If you want NSStrings, try getting a substring instead:

NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]]; 

for (int i=0; i < [danishString length]; i++) 
{ 
    NSRange r = NSMakeRange(i, 1);
    NSString *ichar = [danishString substringWithRange:r]; 
    [characters addObject:ichar]; 
}

You could do something like the following, which should be fine with Danish characters, but would break down if you have decomposed characters. I suggest reading the String Programming Guide for more information.

NSString *danishString = @"æøå";
NSMutableArray* characters = [NSMutableArray array];
for( int i = 0; i < [danishString length]; i++ ) {
  NSString* subchar = [danishString substringWithRange:NSMakeRange(i, 1)];
  if( subchar ) [characters addObject:subchar];
}

That would split the string into an array of individual characters, assuming that every character is precomposed, that is, stored as a single code unit.

It is printing the Unicode escapes of the characters. Anyhow, you can use the Unicode escape (with \u) anywhere.
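To make the decomposed-character caveat concrete, here is a hedged sketch that steps through a string by composed character sequence instead of by fixed ranges of length 1. SplitIntoComposedCharacters is a hypothetical name chosen for this example; it relies on NSString's rangeOfComposedCharacterSequenceAtIndex:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one NSString per composed character sequence.
static NSArray<NSString *> *SplitIntoComposedCharacters(NSString *string) {
    NSMutableArray<NSString *> *pieces = [NSMutableArray array];
    NSUInteger index = 0;
    while (index < string.length) {
        // The range covers the base character plus any combining marks that follow it.
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        [pieces addObject:[string substringWithRange:range]];
        index = NSMaxRange(range);
    }
    return pieces;
}
```

For the decomposed string @"e\u0301a" (an "e" followed by a combining acute accent, then "a"), length is 3, so a loop over ranges of length 1 would separate the accent from its letter, while this helper returns two pieces.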
