Agreement feature extraction from a text

I'm working on a task where I have to extract the agreement features of the nouns in a text. The agreement features are:

number = singular, plural
person = first, second, third
gender = male, female, neuter
animacy = animate, inanimate

Is there any way to extract these features from the text?

If your data is English, as your comments suggest, then the nouns will never have person information, so we can discount that.

Number is easy, as has been mentioned by others: many part-of-speech taggers differentiate between singular and plural nouns.
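As a rough sketch of the number case, assume the tagger emits Penn Treebank tags, where NN and NNP mark singular nouns and NNS and NNPS mark plural ones. The function name noun_numbers and the hand-tagged sample sentence are made up for illustration; in practice the (word, tag) pairs would come from whichever tagger you use.

```python
# Penn Treebank noun tags and the grammatical number each one encodes.
NUMBER_BY_TAG = {
    "NN": "singular",    # common noun, singular or mass
    "NNP": "singular",   # proper noun, singular
    "NNS": "plural",     # common noun, plural
    "NNPS": "plural",    # proper noun, plural
}


def noun_numbers(tagged_tokens):
    """Return (word, number) for every noun in a list of (word, tag) pairs."""
    return [
        (word, NUMBER_BY_TAG[tag])
        for word, tag in tagged_tokens
        if tag in NUMBER_BY_TAG
    ]


# Hand-tagged sample; a real pipeline would get these pairs from a POS tagger.
tagged = [
    ("The", "DT"), ("princess", "NN"), ("feeds", "VBZ"),
    ("the", "DT"), ("horses", "NNS"), (".", "."),
]
print(noun_numbers(tagged))
```

Note that mass nouns are also tagged NN, so "singular" here really means "not marked as plural".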

Gender and animacy are more interesting. In English, these are semantic rather than syntactic properties of nouns. For example, take the sentence "The princess is in the tower." We know that "princess" is feminine and animate not because of inflectional information but because we know the word's meaning. It's feasible to build up an ontology by getting a big old corpus of data and analysing the pronouns and anaphors in it. Your algorithm would look for examples like these:

The princess looks at herself in the mirror.

The princess is in the tower. She is sad.

It would work out (somehow) that "princess" is the antecedent of "herself" and "she", and infer the properties of the noun from the known properties of the pronouns. Of course, now the problem becomes reference resolution, which isn't trivial. Here are some references from a recent Edinburgh University lecture course on the subject:

  • Denis, Pascal and Baldridge, Jason, 2008. 'Specialized Models and Reranking for Coreference Resolution.' In Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, 650-69.
  • Haghighi, Aria and Klein, Dan, 2010. 'Coreference Resolution in a Modular, Entity-Centred Model.' In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles CA, 385-93.
  • Lappin, Shalom and Leass, Herbert, 1994. 'An Algorithm for Pronominal Anaphora Resolution.' Computational Linguistics 20:535-61.
  • Ng, Vincent, 2010. 'Supervised Noun Phrase Coreference Research: The first 15 years.' In ACL '10: Proceedings of the 48th Meeting of the Association for Computational Linguistics. 1396-411.
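To make the pronoun idea concrete, here is a toy sketch. It does not do real coreference resolution: as a loudly labelled stand-in, it assumes every pronoun refers to the first noun of its own sentence or, failing that, the first noun of the most recent sentence that had one. The names PRONOUN_FEATURES, collect_votes and infer_features are invented for this example, and treating "it" as inanimate is a simplification.

```python
from collections import Counter, defaultdict

# Known (gender, animacy) of English third-person singular pronouns.
# Mapping "it" to inanimate is a simplifying assumption.
PRONOUN_FEATURES = {
    "he": ("male", "animate"), "him": ("male", "animate"),
    "his": ("male", "animate"), "himself": ("male", "animate"),
    "she": ("female", "animate"), "her": ("female", "animate"),
    "hers": ("female", "animate"), "herself": ("female", "animate"),
    "it": ("neuter", "inanimate"), "its": ("neuter", "inanimate"),
    "itself": ("neuter", "inanimate"),
}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def collect_votes(sentences):
    """Count pronoun features per noun using a crude first-noun heuristic."""
    votes = defaultdict(Counter)
    previous_subject = None
    for sentence in sentences:
        subject = None  # first noun seen in this sentence
        for word, tag in sentence:
            lower = word.lower()
            if tag in NOUN_TAGS and subject is None:
                subject = lower
            elif lower in PRONOUN_FEATURES:
                # Stand-in for real antecedent resolution.
                antecedent = subject or previous_subject
                if antecedent is not None:
                    votes[antecedent][PRONOUN_FEATURES[lower]] += 1
        if subject is not None:
            previous_subject = subject
    return votes


def infer_features(votes):
    """Pick the most frequently observed (gender, animacy) pair per noun."""
    return {noun: counts.most_common(1)[0][0] for noun, counts in votes.items()}


# The two examples from above, hand-tagged with Penn Treebank tags.
sentences = [
    [("The", "DT"), ("princess", "NN"), ("looks", "VBZ"), ("at", "IN"),
     ("herself", "PRP"), ("in", "IN"), ("the", "DT"), ("mirror", "NN"), (".", ".")],
    [("The", "DT"), ("princess", "NN"), ("is", "VBZ"), ("in", "IN"),
     ("the", "DT"), ("tower", "NN"), (".", ".")],
    [("She", "PRP"), ("is", "VBZ"), ("sad", "JJ"), (".", ".")],
]
print(infer_features(collect_votes(sentences)))
```

The first-noun rule happens to work on these sentences, but a "most recent noun" rule would have linked "She" to "tower", which is exactly why the papers above treat antecedent selection as the hard part.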

