
NSLinguisticTagger: Filter out Specified Token Depending on Tag Type

I am trying to filter out specific tokens based on their tags. When I run my code, I get the output below. I only want to retrieve the adjectives and output those. Is there an easy way to do this?

Hello: NSLinguisticTag(_rawValue: Interjection)
World: NSLinguisticTag(_rawValue: Noun)
this: NSLinguisticTag(_rawValue: Determiner)
is: NSLinguisticTag(_rawValue: Verb)
my: NSLinguisticTag(_rawValue: Determiner)
main: NSLinguisticTag(_rawValue: Adjective)
goal: NSLinguisticTag(_rawValue: Noun)

tokenizeText(inputtedText: "Hello World this is my main goal, to take these words and figure out the adjectives, verbs and nouns")

You can simply check whether the tag is .adjective in the enumerateTags closure and only continue if it is:

import Foundation

let sentence = "The yellow cat hunts the little gray mouse around the block"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = sentence
// NSRange lengths are UTF-16 based, so use utf16.count rather than count.
let fullRange = NSRange(location: 0, length: sentence.utf16.count)
tagger.enumerateTags(in: fullRange, scheme: .nameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
    // Skip every token that is not tagged as an adjective.
    guard tag == .adjective, let adjectiveRange = Range(tokenRange, in: sentence) else { return }
    let adjectiveToken = sentence[adjectiveRange]
    print(adjectiveToken)
}

This prints out:

yellow
little
gray
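
One detail worth checking in code like this: NSRange measures length in UTF-16 code units, while String.count counts characters (grapheme clusters). The two agree for a plain ASCII sentence, but they diverge as soon as the text contains emoji or combining characters, and a range built from count would then stop short of the end of the string. A minimal sketch of the difference, using a made-up sample string:

```swift
import Foundation

// Hypothetical sample text: the emoji is one Character but two UTF-16 code units.
let text = "The happy 😀 cat"

let characterCount = text.count      // grapheme clusters
let utf16Count = text.utf16.count    // the unit NSRange works in

// Two equivalent ways to build a range that covers the whole string.
let fullRange = NSRange(location: 0, length: utf16Count)
let sameRange = NSRange(text.startIndex..<text.endIndex, in: text)

print(characterCount, utf16Count)    // the two counts differ for this string
print(fullRange == sameRange)
```

Building the range with NSRange(_:in:) avoids having to remember which count to use.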

EDIT

If you want the tokens of more than one tag type, you could store the tokens in a dictionary with the tags as keys:

import Foundation

let sentence = "The yellow cat hunts the little gray mouse around the block"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = sentence
// Maps each tag to the tokens that carry it, in sentence order.
var tokens: [NSLinguisticTag: [String]] = [:]
// NSRange lengths are UTF-16 based, so use utf16.count rather than count.
let fullRange = NSRange(location: 0, length: sentence.utf16.count)
tagger.enumerateTags(in: fullRange, scheme: .nameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
    guard let tag = tag, let range = Range(tokenRange, in: sentence) else { return }
    let token = String(sentence[range])
    // The default subscript creates the array the first time a tag is seen.
    tokens[tag, default: []].append(token)
}
print(tokens[.adjective])
print(tokens[.noun])

Which prints out:

Optional(["yellow", "little", "gray"])
Optional(["cat", "mouse", "block"])
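
The Optional(...) wrapper appears because a dictionary lookup returns an optional. If the grouping is needed in more than one place, it could be wrapped in a function and the lookups given an empty default. The following is only a sketch: tokensByTag(in:) is a hypothetical helper name, not part of NSLinguisticTagger, and the tags assigned to each word depend on the system's language model.

```swift
import Foundation

// Hypothetical helper: groups the tokens of `text` by their tag.
func tokensByTag(in text: String) -> [NSLinguisticTag: [String]] {
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass],
                                    options: Int(options.rawValue))
    tagger.string = text

    var grouped: [NSLinguisticTag: [String]] = [:]
    // NSRange(_:in:) converts the String range into UTF-16 offsets.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    tagger.enumerateTags(in: fullRange, scheme: .nameTypeOrLexicalClass, options: options) { tag, tokenRange, _, _ in
        guard let tag = tag, let range = Range(tokenRange, in: text) else { return }
        grouped[tag, default: []].append(String(text[range]))
    }
    return grouped
}

let sample = "The yellow cat hunts the little gray mouse around the block"
let grouped = tokensByTag(in: sample)
// `?? []` unwraps the lookup, so plain arrays are printed.
print(grouped[.adjective] ?? [])
print(grouped[.noun] ?? [])
```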

EDIT #2

If you want to be able to remove tokens with certain tags from a text, you could write an extension like this:

extension NSLinguisticTagger {
    func eliminate(unwantedTags: [NSLinguisticTag], from text: String, options: NSLinguisticTagger.Options) -> String {
        string = text
        var textWithoutUnwantedTags = ""
        enumerateTags(in: NSRange(location: 0, length: text.utf16.count), scheme: .nameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
            guard
                let tag = tag,
                !unwantedTags.contains(tag),
                let range = Range(tokenRange, in: text)
                else { return }
            let token = String(text[range])
            textWithoutUnwantedTags += " \(token)"
        }

        return textWithoutUnwantedTags.trimmingCharacters(in: .whitespaces)
    }
}

Then you can use it like this:

let sentence = "The yellow cat hunts the little gray mouse around the block"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))

let sentenceWithoutAdjectives = tagger.eliminate(unwantedTags: [.adjective], from: sentence, options: options)
print(sentenceWithoutAdjectives)

Which prints out:

The cat hunts the mouse around the block
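
On newer Apple platforms, NSLinguisticTagger is deprecated in favor of NLTagger from the Natural Language framework. A rough equivalent of the adjective filter is sketched below, assuming iOS 12 / macOS 10.14 or later. The variable names are illustrative, and the tags NLTagger assigns may not match NSLinguisticTagger's word for word.

```swift
import NaturalLanguage

let sentence = "The yellow cat hunts the little gray mouse around the block"
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = sentence

var adjectives: [String] = []
// NLTagger works with String.Index ranges, so no NSRange conversion is needed.
tagger.enumerateTags(in: sentence.startIndex..<sentence.endIndex,
                     unit: .word,
                     scheme: .lexicalClass,
                     options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange in
    if tag == .adjective {
        adjectives.append(String(sentence[tokenRange]))
    }
    return true // returning false would stop the enumeration early
}
print(adjectives)
```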

