
Limit text to a certain number of words in Swift

In a mobile app I use an API that can only handle about 300 words. How can I trim a string in Swift so that it doesn't contain more words than that?

The native .trimmingCharacters(in: CharacterSet) does not seem to be able to do this, as it is intended to trim specific characters from the ends of a string.
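For illustration, a minimal sketch of what trimmingCharacters(in:) does: it strips matching characters from both ends and leaves the words in between untouched, so it cannot shorten a text to a word limit. The sample string and constant names are made up for this example.

```swift
import Foundation

// A sample string with padding at both ends (made up for this sketch).
let padded = "  one two three  "

// Only the leading and trailing whitespace is removed; all three words remain.
let trimmed = padded.trimmingCharacters(in: .whitespaces)
print(trimmed)   // one two three
```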

There is no off-the-shelf way to limit the number of words in a string.

If you look at this post, it documents using the method enumerateSubstrings(in: Range) with the .byWords option. The method does not return anything itself; instead it calls a closure once for each word and passes in the Range of that word.
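As a rough sketch of how that enumeration behaves, the snippet below collects each word that the closure receives. The helper name wordList(in:) and the sample sentence are made up for this illustration; only enumerateSubstrings(in:options:) with .byWords comes from Foundation.

```swift
import Foundation

// Collects the words reported by enumerateSubstrings with the .byWords option.
// `wordList(in:)` is a hypothetical helper name used only for this sketch.
func wordList(in text: String) -> [String] {
    var words: [String] = []
    text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { word, _, _, _ in
        // The substring is optional in the closure signature, so unwrap it first.
        if let word = word {
            words.append(word)
        }
    }
    return words
}

print(wordList(in: "Hello, world: this is Swift."))
// ["Hello", "world", "this", "is", "Swift"]
```

Note that the punctuation is not part of any reported word; the second closure parameter (ignored here) is the range of each word within the original string.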

You could use that to create an extension on String that returns the first X words of a string:

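import Foundation  // provides enumerateSubstrings(in:options:_:)

extension String {
    /// Returns the text from the start of the string through the end of word number `wordCount`.
    func firstXWords(_ wordCount: Int) -> Substring {
        // Zero (or fewer) words requested: return an empty result rather than indexing ranges[-1].
        guard wordCount > 0 else { return self[startIndex..<startIndex] }
        // Collect the range of every word in the string.
        var ranges: [Range<String.Index>] = []
        self.enumerateSubstrings(in: self.startIndex..., options: .byWords) { _, range, _, _ in
            ranges.append(range)
        }
        if ranges.count > wordCount - 1 {
            // Cut just after the last requested word.
            return self[self.startIndex..<ranges[wordCount - 1].upperBound]
        } else {
            // The string has fewer words than requested: return all of it.
            return self[self.startIndex..<self.endIndex]
        }
    }
}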

If we then run the code:

let sentence = "I want to an algorithm that could help find out how many words are there in a string separated by space or comma or some character. And then append each word separated by a character to an array which could be added up later I'm making an average calculator so I want the total count of data and then add up all the words. By words I mean the numbers separated by a character, preferably space Thanks in advance"

print(sentence.firstXWords(10))

The output is:

I want to an algorithm that could help find out

Using enumerateSubstrings(in: Range) is going to give much better results than splitting your string on spaces, since normal text contains many more separators than just spaces (newlines, commas, colons, em spaces, etc.). It will also work for languages like Japanese and Chinese that often don't have spaces between words.
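To make the difference concrete, here is a small sketch that compares the two approaches on a sample containing a comma, a newline, and a tab. The function names wordsBySpaces and wordsByEnumeration and the sample text are made up for this comparison.

```swift
import Foundation

// Naive approach: split on the space character only.
// `wordsBySpaces` is a hypothetical name for this sketch.
func wordsBySpaces(_ text: String) -> [String] {
    return text.split(separator: " ").map { String($0) }
}

// Word enumeration: let Foundation decide where the word boundaries are.
// `wordsByEnumeration` is likewise a hypothetical name.
func wordsByEnumeration(_ text: String) -> [String] {
    var words: [String] = []
    text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { word, _, _, _ in
        if let word = word {
            words.append(word)
        }
    }
    return words
}

let sample = "apples, pears\noranges\tplums"
print(wordsBySpaces(sample).count)       // 2
print(wordsByEnumeration(sample).count)  // 4
```

Splitting on spaces finds only two pieces here, and the first still carries its comma, while the enumeration reports four clean words.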

You might be able to rewrite the function to stop enumerating the string as soon as it reaches the desired number of words. If you want only a small fraction of the words in a very long string, that would make it significantly faster. (The code above should have O(n) performance, although I haven't dug deeply enough to be sure of that. I also couldn't figure out how to terminate the enumerateSubstrings() function early, although I didn't try that hard.)

Leo Dabus provided an improved version of my function. It extends StringProtocol rather than String, which means it can also work on substrings. Plus, it stops once it hits your desired word count, so it will be much faster at finding the first few words of very long strings:

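import Foundation  // provides enumerateSubstrings(in:options:_:)

extension StringProtocol {
    func firstXWords(_ n: Int) -> SubSequence {
        // Zero (or fewer) words requested: return an empty subsequence.
        guard n > 0 else { return self[..<startIndex] }
        // Default to the whole string in case it has fewer than n words.
        var endIndex = self.endIndex
        var words = 0
        enumerateSubstrings(in: startIndex..., options: .byWords) { _, range, _, stop in
            words += 1
            if words == n {
                // Setting the inout `stop` flag ends the enumeration early.
                stop = true
                endIndex = range.upperBound
            }
        }
        return self[..<endIndex]
    }
}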
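The early exit in that version depends on the closure's fourth parameter, an inout Bool that ends the enumeration when set to true. A small sketch to check this is to count how often the closure runs; the helper name callbackCount(in:stoppingAfter:) and the generated test string are made up for this illustration.

```swift
import Foundation

// Counts how many times the closure is invoked when enumeration is stopped
// after `limit` words. `callbackCount(in:stoppingAfter:)` is a hypothetical helper.
func callbackCount(in text: String, stoppingAfter limit: Int) -> Int {
    var calls = 0
    text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { _, _, _, stop in
        calls += 1
        if calls >= limit {
            stop = true   // ask Foundation to end the enumeration here
        }
    }
    return calls
}

// A long string of 10,000 words, generated only for this check.
let longText = String(repeating: "word ", count: 10_000)
print(callbackCount(in: longText, stoppingAfter: 3))   // 3
```

If the closure runs only three times on a 10,000-word string, the rest of the text is never visited, which is where the speedup for long strings comes from.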


 