
Identifying person names with NSLinguisticTagger

I am tinkering with NSLinguisticTagger.

Identifying basic word types like nouns, verbs and prepositions works really well.

However, the recognition of person names (NSLinguisticTagPersonalName) hardly works in my tests (iOS 8). Places (NSLinguisticTagPlaceName) seem to work pretty well, but most of the time person names are also categorised as places.

Here's my basic setup (using NSLinguisticTagSchemeNameTypeOrLexicalClass):

    let tagger = NSLinguisticTagger(tagSchemes: NSLinguisticTagger.availableTagSchemesForLanguage("en"), options: 3)
    tagger.string = entryString
    tagger.enumerateTagsInRange(NSMakeRange(0, entryString.length), scheme: NSLinguisticTagSchemeNameTypeOrLexicalClass, options: (NSLinguisticTaggerOptions.OmitWhitespace | NSLinguisticTaggerOptions.JoinNames), usingBlock: {
        tag, tokenRange, sentenceRange, _ in
        // Print the tag, the token text and its range for each token
        let token = entryString.substringWithRange(tokenRange)
        println("[\(tag)] \(token) \(tokenRange)")
    })

Example 1

 "Meeting with John in Paris"

  Evaluation

 [Verb] Meeting
 [Preposition] with
 [Noun] John
 [Preposition] in
 [PlaceName] Paris

Example 2

 "Meeting with John"

  Evaluation

 [Verb] Meeting (0,7)
 [Preposition] with (8,4)
 [PlaceName] John (13,4)

Any idea how I could improve the matching for person names?

I'd also be interested to know how a name would need to appear in order to be recognized. (I assumed that, e.g., a preposition like "with" would be a good indicator, but apparently this isn't enough.) I'd appreciate any ideas or additional insights on this. It's an exciting field.

Apparently the correct answer is: "wait a few years for Apple to improve NSLinguisticTagger in Swift 4".

Here is the Swift 4 code written and executed in Xcode 9 (beta):

import Foundation

let entryString = "Meeting with John"

let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let options: NSLinguisticTagger.Options = [
    .omitWhitespace, .omitPunctuation, .joinNames
]

let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = entryString

let rangeOfEntireEntryString = NSRange(location: 0, length: entryString.utf16.count)

tagger.enumerateTags(
    in: rangeOfEntireEntryString,
    scheme: .nameTypeOrLexicalClass,
    options: options)
{ (tag, tokenRange, sentenceRange, _) in
    guard let tag = tag?.rawValue else { return }
    let token = (entryString as NSString).substring(with: tokenRange)
    print("[\(tag)] \(token) \(tokenRange)")
}

Here are the results with your first example string:

let entryString = "Meeting with John in Paris"

[Noun] Meeting {0, 7}
[Preposition] with {8, 4}
[PersonalName] John {13, 4}
[Preposition] in {18, 2}
[PlaceName] Paris {21, 5}

And with your second example string:

let entryString = "Meeting with John"

[Noun] Meeting {0, 7}
[Preposition] with {8, 4}
[PersonalName] John {13, 4}
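
If only the person names are of interest, the same enumeration can be narrowed to the name-type scheme and filtered on the personal-name tag. The following is a minimal sketch rather than a definitive implementation: the helper name `personalNames(in:)` is hypothetical (it does not appear in the code above), and what the tagger actually recognizes depends on the OS version and its language model, so no particular output is promised here.

```swift
import Foundation

// Hypothetical helper (not part of the original answer): collects the tokens
// that NSLinguisticTagger labels as personal names in `text`.
func personalNames(in text: String) -> [String] {
    // Only the name-type scheme is needed when lexical classes are not of interest.
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = text

    let options: NSLinguisticTagger.Options = [
        .omitWhitespace, .omitPunctuation, .joinNames
    ]
    // NSRange counts UTF-16 code units, hence utf16.count.
    let fullRange = NSRange(location: 0, length: text.utf16.count)

    var names: [String] = []
    tagger.enumerateTags(in: fullRange, scheme: .nameType, options: options) {
        (tag, tokenRange, _, _) in
        // Keep only tokens tagged as personal names; skip everything else.
        if let tag = tag, tag == .personalName {
            names.append((text as NSString).substring(with: tokenRange))
        }
    }
    return names
}

print(personalNames(in: "Meeting with John in Paris"))
```

Because `.joinNames` is set, a multi-word name should be reported as a single token instead of one token per word.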

