Fastest way to get array of NSRange objects for all uppercase letters in an NSString?

I need NSRange objects for the position of each uppercase letter in a given NSString, to pass to a method of a custom attributed string class.

There are of course quite a few ways to accomplish this, such as rangeOfString:options: with NSRegularExpressionSearch, or using RegexKitLite to get each match separately while walking the string.

What would be the fastest approach to accomplish this task?

The simplest way is probably to use -rangeOfCharacterFromSet:options:range: with [NSCharacterSet uppercaseLetterCharacterSet]. By narrowing the search range on each call, you can find all of the uppercase letters pretty easily. Something like the following will give you an NSArray of all ranges (encoded as NSValues):

- (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
    NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *results = [NSMutableArray array];
    NSRange searchRange = NSMakeRange(0, [str length]);
    NSRange range;
    while ((range = [str rangeOfCharacterFromSet:cs options:0 range:searchRange]).location != NSNotFound) {
        [results addObject:[NSValue valueWithRange:range]];
        searchRange = NSMakeRange(NSMaxRange(range), [str length] - NSMaxRange(range));
    }
    return results;
}

Note that this will not coalesce adjacent ranges into a single range, but that's easy enough to add.
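One way the coalescing step could look is sketched below. It is written in plain C (which also compiles as Objective-C) so that it stands alone; the Range struct and the coalesce_ranges name are hypothetical stand-ins, and with Foundation you would work on NSRange values directly. It assumes the ranges are sorted by location and do not overlap, which is what a left-to-right search produces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NSRange so the sketch compiles without Foundation. */
typedef struct {
    size_t location;
    size_t length;
} Range;

/* Merges adjacent ranges in place and returns the new number of ranges.
   `ranges` must be sorted by location and non-overlapping. */
static size_t coalesce_ranges(Range *ranges, size_t count) {
    assert(ranges != NULL || count == 0);
    if (count == 0) return 0;
    size_t last = 0; /* index of the output range currently being extended */
    for (size_t i = 1; i < count; ++i) {
        if (ranges[last].location + ranges[last].length == ranges[i].location) {
            /* This range starts exactly where the previous one ends: extend it. */
            ranges[last].length += ranges[i].length;
        } else {
            /* There is a gap: start a new output range. */
            ranges[++last] = ranges[i];
        }
    }
    return last + 1;
}
```

Because the merge happens in place, the first `coalesce_ranges(...)` entries of the array hold the result and the rest can be ignored.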

Here's an alternative solution based on NSScanner:

- (NSArray *)rangesOfUppercaseLettersInString:(NSString *)str {
    NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableArray *results = [NSMutableArray array];
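    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil]; // don't silently skip whitespace, so scanLocation always points at the match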
    while (![scanner isAtEnd]) {
        [scanner scanUpToCharactersFromSet:cs intoString:NULL]; // skip non-uppercase characters
        NSString *temp;
        NSUInteger location = [scanner scanLocation];
        if ([scanner scanCharactersFromSet:cs intoString:&temp]) {
            // found one (or more) uppercase characters
            NSRange range = NSMakeRange(location, [temp length]);
            [results addObject:[NSValue valueWithRange:range]];
        }
    }
    return results;
}

Unlike the previous one, this one does coalesce adjacent uppercase characters into a single range.

Edit: If you're looking for absolute speed, this one will likely be the fastest of the three presented here, while still preserving correct Unicode support (note, I have not tried compiling this):

// returns a pointer to an array of NSRanges, and fills in count with the number of ranges
// the array belongs to an autoreleased NSMutableData; copy the ranges if they must outlive the current autorelease pool
- (NSRange *)rangesOfUppercaseLettersInString:(NSString *)string count:(NSUInteger *)count {
    NSMutableData *data = [NSMutableData data];
    NSUInteger numRanges = 0;
    NSUInteger length = [string length];
    unichar *buffer = malloc(sizeof(unichar) * length);
    [string getCharacters:buffer range:NSMakeRange(0, length)];
    NSCharacterSet *cs = [NSCharacterSet uppercaseLetterCharacterSet];
    NSRange range = {NSNotFound, 0};
    for (NSUInteger i = 0; i < length; i++) {
        if ([cs characterIsMember:buffer[i]]) {
            if (range.location == NSNotFound) {
                range = (NSRange){i, 0};
            }
            range.length++;
        } else if (range.location != NSNotFound) {
            [data appendBytes:&range length:sizeof(range)];
            numRanges++;
            range = (NSRange){NSNotFound, 0};
        }
    }
    if (range.location != NSNotFound) {
        [data appendBytes:&range length:sizeof(range)];
        numRanges++;
    }
    free(buffer); // the unichar copy is no longer needed; without this it leaks
    if (count) *count = numRanges;
    return (NSRange *)[data mutableBytes];
}

Using RegexKitLite 4.0+ with a runtime that supports Blocks, this can be quite zippy:

NSString *string = @"A simple String to TEST for Upper Case Letters.";
NSString *regex = @"\\p{Lu}";

[string enumerateStringsMatchedByRegex:regex options:RKLNoOptions inRange:NSMakeRange(0UL, [string length]) error:NULL enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired usingBlock:^(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
  NSLog(@"Range: %@", NSStringFromRange(capturedRanges[0]));
}];

The regex \p{Lu} (written \\p{Lu} inside an Objective-C string literal) says "Match all characters with the Unicode property of 'Letter' that are also 'Upper Case'".

The option RKLRegexEnumerationCapturedStringsNotRequired tells RegexKitLite that it shouldn't create NSString objects and pass them via capturedStrings[]. This saves quite a bit of time and memory. The only things that get passed to the block are the NSRange values for the match, via capturedRanges[].

There are two main parts to this. The first is the RegexKitLite method:

[string enumerateStringsMatchedByRegex:regex
                               options:RKLNoOptions
                               inRange:NSMakeRange(0UL, [string length])
                                 error:NULL
                    enumerationOptions:RKLRegexEnumerationCapturedStringsNotRequired
                            usingBlock:/* ... */
];

... and the second is the Block that is passed as an argument to that method:

^(NSInteger captureCount,
  NSString * const capturedStrings[captureCount],
  const NSRange capturedRanges[captureCount],
  volatile BOOL * const stop) { /* ... */ }

It somewhat depends on the size of the string, but the absolute fastest way I can think of (note: internationalization safety not guaranteed, or even expected! Does the concept of uppercase even apply in, say, Japanese?) is:

1) Get a pointer to a raw C string of the string, preferably in a stack buffer if it's small enough. CFString has functions for this. Read the comments in CFString.h.

2) malloc() a buffer big enough to hold one NSRange per character in the string.

3) Something like this (completely untested, written into this text field, pardon mistakes and typos):

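NSRange *bufferCursor = rangeBuffer;
NSRange range = {NSNotFound, 0};
for (int idx = 0; idx < numBytes; ++idx) {
    if (isupper((unsigned char)buffer[idx])) { // cast: isupper is undefined for negative char values
        if (range.length > 0) { // extend a range, we found more than one uppercase letter in a row
            range.length++;
        } else { // begin a range
            range.location = idx;
            range.length = 1;
        }
    }
    else if (range.length > 0) { // end a range, we hit a non-uppercase character
        *bufferCursor = range;
        bufferCursor++;
        range.location = NSNotFound;
        range.length = 0; // reset so the next uppercase letter begins a new range
    }
}
if (range.length > 0) { // the string ended in the middle of a range
    *bufferCursor = range;
    bufferCursor++;
}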

4) realloc() the range buffer back down to the size you actually used (you may need to keep a count of the ranges you began in order to do that).
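Steps 2 to 4 could be put together roughly as follows. This is only a sketch in plain C under stated assumptions: ByteRange is a hypothetical stand-in for NSRange, uppercase_runs is an illustrative name, and the input is taken to be a NUL-terminated C string already obtained in step 1. As with the fragment in step 3, isupper only recognises A-Z in the default C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for NSRange so the sketch compiles without Foundation. */
typedef struct {
    size_t location;
    size_t length;
} ByteRange;

/* Returns a malloc'd array with one range per run of uppercase bytes in `text`
   and stores the number of ranges in *count. The caller frees the result.
   Returns NULL when there are no runs or when allocation fails. */
static ByteRange *uppercase_runs(const char *text, size_t *count) {
    assert(text != NULL && count != NULL);
    size_t numBytes = strlen(text);
    *count = 0;
    if (numBytes == 0) return NULL;

    /* Step 2: allocate for the worst case of one range per byte. */
    ByteRange *ranges = malloc(numBytes * sizeof *ranges);
    if (ranges == NULL) return NULL;

    /* Step 3: single pass that begins, extends, or closes the current run. */
    size_t used = 0;      /* number of completed ranges */
    size_t runLength = 0; /* length of the run in progress, 0 if none */
    for (size_t idx = 0; idx < numBytes; ++idx) {
        if (isupper((unsigned char)text[idx])) {
            if (runLength == 0) ranges[used].location = idx;
            ranges[used].length = ++runLength;
        } else if (runLength > 0) {
            ++used;
            runLength = 0;
        }
    }
    if (runLength > 0) ++used; /* the string ended inside a run */

    /* Step 4: shrink the buffer to the size actually used. */
    if (used == 0) {
        free(ranges);
        return NULL;
    }
    ByteRange *shrunk = realloc(ranges, used * sizeof *ranges);
    *count = used;
    return shrunk != NULL ? shrunk : ranges;
}
```

Keeping the count of finished ranges in `used` is what makes the final realloc() possible without a second pass over the buffer.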

A function such as isupper* in conjunction with -[NSString characterAtIndex:] will be plenty fast.

*isupper is an example; it may or may not be appropriate for your input.
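To illustrate the caveat: isupper is only defined for arguments that fit in an unsigned char (or EOF), while characterAtIndex: returns 16-bit code units, so anything outside ASCII has to be filtered out before the call. A minimal sketch in plain C, assuming the code units have already been read into an array of unsigned short (the underlying type of unichar); count_ascii_uppercase is an illustrative name, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts ASCII uppercase letters among UTF-16 code units, for example values
   read with -[NSString characterAtIndex:] or getCharacters:range:. */
static size_t count_ascii_uppercase(const unsigned short *units, size_t length) {
    assert(units != NULL || length == 0);
    size_t total = 0;
    for (size_t i = 0; i < length; ++i) {
        /* Reject everything outside ASCII first: passing a larger value to
           isupper would be undefined behaviour. */
        if (units[i] < 0x80 && isupper((int)units[i])) ++total;
    }
    return total;
}
```

With this check, accented or non-Latin uppercase letters such as U+00C9 are simply not counted, which is why the character-set and \p{Lu} approaches are the safer choice for arbitrary Unicode input.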
