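NSLinguisticTagger: Filter out Specified Token Depending on Tag Type
I am trying to filter out specific tokens based on their tag. When I run my code, I get the output below. I only want to retrieve the adjectives and print them. Is there a simple way to do this?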
Hello: NSLinguisticTag(_rawValue: Interjection)
World: NSLinguisticTag(_rawValue: Noun)
this: NSLinguisticTag(_rawValue: Determiner)
is: NSLinguisticTag(_rawValue: Verb)
my: NSLinguisticTag(_rawValue: Determiner)
main: NSLinguisticTag(_rawValue: Adjective)
goal: NSLinguisticTag(_rawValue: Noun)
tokenizeText(inputtedText: "Hello World, this is my main goal, to take these words and find the adjectives, verbs and nouns")
Inside the enumerateTags closure, you can simply check whether tag is .adjective and only continue if it is:
import Foundation
let sentence = "The yellow cat hunts the little gray mouse around the block"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = sentence
// NSRange counts UTF-16 code units, so measure the string with utf16.count rather than count.
tagger.enumerateTags(in: NSRange(location: 0, length: sentence.utf16.count), scheme: .nameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
    // Skip every token that is not tagged as an adjective.
    guard tag == .adjective, let adjectiveRange = Range(tokenRange, in: sentence) else { return }
    let adjectiveToken = sentence[adjectiveRange]
    print(adjectiveToken)
}
This prints:
yellow
little
gray
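If you need the same check in more than one place, it can be wrapped in a small function. The following is only a sketch: the name words(in:tagged:) and its parameters are hypothetical and not part of NSLinguisticTagger, and which words the tagger labels as adjectives depends on the language model shipped with the OS.

```swift
import Foundation

// Hypothetical helper (not part of NSLinguisticTagger): returns every token
// in `text` whose tag is contained in `wantedTags`.
func words(in text: String, tagged wantedTags: Set<NSLinguisticTag>) -> [String] {
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: Int(options.rawValue))
    tagger.string = text

    var matches: [String] = []
    // NSRange counts UTF-16 code units, so measure the string with utf16.count.
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: fullRange, scheme: .nameTypeOrLexicalClass, options: options) { tag, tokenRange, _, _ in
        // Keep the token only if it has a tag and that tag was asked for.
        guard let tag = tag, wantedTags.contains(tag),
              let range = Range(tokenRange, in: text) else { return }
        matches.append(String(text[range]))
    }
    return matches
}

let sentence = "The yellow cat hunts the little gray mouse around the block"
print(words(in: sentence, tagged: [.adjective]))
```

Passing a Set instead of a single tag means the same call also covers cases such as [.adjective, .verb, .noun].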
Edit
If you want to work with tokens of more than one tag type, you can store the tokens in a dictionary keyed by their tags:
import Foundation
let sentence = "The yellow cat hunts the little gray mouse around the block"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = sentence
var tokens: [NSLinguisticTag: [String]] = [:]
// NSRange counts UTF-16 code units, so measure the string with utf16.count rather than count.
tagger.enumerateTags(in: NSRange(location: 0, length: sentence.utf16.count), scheme: .nameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
    guard let tag = tag, let range = Range(tokenRange, in: sentence) else { return }
    let token = String(sentence[range])
    // Append to the array for this tag, creating the array on first use.
    tokens[tag, default: []].append(token)
}
print(tokens[.adjective])
print(tokens[.noun])
Which prints:
Optional(["yellow", "little", "gray"])
Optional(["cat", "mouse", "block"])
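The Optional(...) wrapper in that output appears because a plain dictionary subscript returns an optional. As a rough sketch, the grouping can be moved into a function and read back with a default value; the name tokensByTag(in:) is hypothetical and used only for illustration.

```swift
import Foundation

// Hypothetical helper: groups the tokens of `text` by the tag the tagger assigns.
func tokensByTag(in text: String) -> [NSLinguisticTag: [String]] {
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: Int(options.rawValue))
    tagger.string = text

    var grouped: [NSLinguisticTag: [String]] = [:]
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: fullRange, scheme: .nameTypeOrLexicalClass, options: options) { tag, tokenRange, _, _ in
        guard let tag = tag, let range = Range(tokenRange, in: text) else { return }
        grouped[tag, default: []].append(String(text[range]))
    }
    return grouped
}

let grouped = tokensByTag(in: "The yellow cat hunts the little gray mouse around the block")
for tag in [NSLinguisticTag.adjective, .verb, .noun] {
    // A missing key yields an empty array instead of nil.
    let words = grouped[tag, default: []]
    print("\(tag.rawValue): \(words.joined(separator: ", "))")
}
```

Reading with a default keeps the call site free of unwrapping, at the cost of not distinguishing "tag never seen" from "tag seen with no tokens", which cannot occur here anyway.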
Edit #2
If you want to be able to remove tokens with certain tags from a text, you can write an extension like this:
extension NSLinguisticTagger {
    func eliminate(unwantedTags: [NSLinguisticTag], from text: String, options: NSLinguisticTagger.Options) -> String {
        string = text
        var textWithoutUnwantedTags = ""
        enumerateTags(in: NSRange(location: 0, length: text.utf16.count), scheme: .nameTypeOrLexicalClass, options: options) { (tag, tokenRange, _, _) in
            // Keep only tokens that have a tag and whose tag is not unwanted.
            guard
                let tag = tag,
                !unwantedTags.contains(tag),
                let range = Range(tokenRange, in: text)
            else { return }
            let token = String(text[range])
            textWithoutUnwantedTags += " \(token)"
        }
        return textWithoutUnwantedTags.trimmingCharacters(in: .whitespaces)
    }
}
Then you can use it like this:
let sentence = "The yellow cat hunts the little gray mouse around the block"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
let sentenceWithoutAdjectives = tagger.eliminate(unwantedTags: [.adjective], from: sentence, options: options)
print(sentenceWithoutAdjectives)
Which prints:
The cat hunts the mouse around the block
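NSLinguisticTagger is marked as deprecated in recent Apple SDKs in favour of NLTagger from the NaturalLanguage framework. This is a different API from the one used in this answer, so treat the following as a sketch of how the same removal could look there. The free function eliminate(unwantedTags:from:) is a hypothetical name, and the tags assigned may differ from those of NSLinguisticTagger.

```swift
import NaturalLanguage

// Hypothetical helper: rebuilds `text` from the tokens whose tag is not in `unwantedTags`.
func eliminate(unwantedTags: Set<NLTag>, from text: String) -> String {
    let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
    tagger.string = text
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]

    var kept: [String] = []
    // NLTagger works with Range<String.Index>, so no NSRange conversion is needed.
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameTypeOrLexicalClass,
                         options: options) { tag, tokenRange in
        if let tag = tag, !unwantedTags.contains(tag) {
            kept.append(String(text[tokenRange]))
        }
        return true // continue with the next token
    }
    return kept.joined(separator: " ")
}

print(eliminate(unwantedTags: [.adjective], from: "The yellow cat hunts the little gray mouse around the block"))
```

As with the extension, punctuation and the original spacing are not preserved, because the tokens are rejoined with single spaces.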