Agreement feature extraction from a text

I'm working on a task where I have to extract the agreement features of the nouns in a text. The agreement features are:

number = singular, plural
person = first, second, third
gender = male, female, neuter
animacy = animate, inanimate

Is there any way to extract these features from the text?

If your data is English, as your comments suggest, then the nouns will never carry person information (English nouns are always third person), so we can discount that.

Number is easy, as others have mentioned: many part-of-speech taggers distinguish between singular and plural nouns.
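As a minimal sketch, assuming your tagger uses the Penn Treebank tagset (where `NN`/`NNP` mark singular or mass nouns and `NNS`/`NNPS` mark plural ones), number can be read straight off the tags. The `(word, tag)` pairs below are hand-written stand-ins for real tagger output, and `noun_numbers` is a hypothetical helper name.

```python
# Penn Treebank noun tags: NN/NNP = singular (or mass) noun,
# NNS/NNPS = plural noun.
NUMBER_BY_TAG = {
    "NN": "singular",
    "NNP": "singular",
    "NNS": "plural",
    "NNPS": "plural",
}


def noun_numbers(tagged):
    """Return (word, number) for every noun in a list of (word, tag) pairs.

    Tokens whose tag is not a noun tag are skipped.
    """
    return [(word, NUMBER_BY_TAG[tag]) for word, tag in tagged if tag in NUMBER_BY_TAG]


# Hand-written tags standing in for the output of a real POS tagger.
tagged = [("The", "DT"), ("princesses", "NNS"), ("are", "VBP"),
          ("in", "IN"), ("the", "DT"), ("tower", "NN"), (".", ".")]
print(noun_numbers(tagged))  # [('princesses', 'plural'), ('tower', 'singular')]
```

Note that mass nouns such as "water" are also tagged `NN`, so "singular" here means grammatical number only.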

Gender and animacy are more interesting. In English, these are semantic rather than syntactic properties of nouns. For example, take the sentence "The princess is in the tower." We know that "princess" is feminine and animate not because of any inflectional information but because we know the word's meaning. It's feasible to build up an ontology by taking a big old corpus of data and analysing the pronouns and anaphors in it. Your algorithm would look for examples like these:

The princess looks at herself in the mirror.

The princess is in the tower. She is sad.
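Once nouns have been linked to pronouns like these, reading a noun's gender and animacy off the pronouns can be as simple as a majority vote over many corpus examples. This sketch assumes the links have already been found by some reference-resolution step and arrive as `(noun, pronoun)` pairs; the pronoun table and the name `infer_features` are hypothetical, and the table is deliberately small (it treats "it" as inanimate even though "it" can refer to animals).

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately small table: third-person singular pronoun
# forms and the (gender, animacy) they signal.
_PRONOUN_GROUPS = [
    (("he", "him", "himself", "his"), ("male", "animate")),
    (("she", "her", "herself", "hers"), ("female", "animate")),
    (("it", "itself", "its"), ("neuter", "inanimate")),
]
PRONOUN_FEATURES = {form: feats for forms, feats in _PRONOUN_GROUPS for form in forms}


def infer_features(pairs):
    """Majority vote on (gender, animacy) for each noun.

    `pairs` is an iterable of (noun, pronoun) links, assumed to come from a
    reference-resolution step. Pronouns missing from the table are ignored.
    """
    votes = defaultdict(Counter)
    for noun, pronoun in pairs:
        feats = PRONOUN_FEATURES.get(pronoun.lower())
        if feats is not None:
            votes[noun.lower()][feats] += 1
    # Keep the most frequent feature bundle for each noun.
    return {noun: counts.most_common(1)[0][0] for noun, counts in votes.items()}


# The ("princess", "it") link stands in for an occasional resolver error.
links = [("princess", "herself"), ("princess", "She"),
         ("tower", "it"), ("princess", "it")]
print(infer_features(links))
# {'princess': ('female', 'animate'), 'tower': ('neuter', 'inanimate')}
```

Voting is the design choice that makes a big corpus pay off: a single wrong link is outvoted as long as most links for a noun are right.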

It would work out (somehow) that "princess" is the antecedent of "herself" and "She", and infer the properties of the noun from the known properties of the pronouns. Of course, now the problem becomes reference resolution, which isn't trivial. Here are some references from a recent Edinburgh University lecture course on the subject:

  • Denis, Pascal and Baldridge, Jason, 2008. 'Specialized Models and Reranking for Coreference Resolution.' In Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, 650-69.
  • Haghighi, Aria and Klein, Dan, 2010. 'Coreference Resolution in a Modular, Entity-Centred Model.' In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles CA, 385-93.
  • Lappin, Shalom and Leass, Herbert, 1994. 'An Algorithm for Pronominal Anaphora Resolution.' Computational Linguistics 20:535-61.
  • Ng, Vincent, 2010. 'Supervised Noun Phrase Coreference Research: The First 15 Years.' In ACL '10: Proceedings of the 48th Meeting of the Association for Computational Linguistics, 1396-411.
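To see why the linking step isn't trivial, here is a deliberately naive baseline that is not one of the algorithms in the papers above: link each pronoun to the closest preceding noun with the same grammatical number. The tags are hand-written, and `link_pronouns` and its pronoun table are hypothetical. It gets the mirror example right but links "She" to "tower" in the second example, which is why real resolvers use much richer information than recency.

```python
# Hypothetical pronoun table: form -> grammatical number.
PRONOUN_NUMBER = {
    "he": "singular", "him": "singular", "himself": "singular",
    "she": "singular", "her": "singular", "herself": "singular",
    "it": "singular", "itself": "singular",
    "they": "plural", "them": "plural", "themselves": "plural",
}
# Penn Treebank noun tags -> grammatical number.
NOUN_NUMBER = {"NN": "singular", "NNP": "singular", "NNS": "plural", "NNPS": "plural"}


def link_pronouns(tagged):
    """Naive recency baseline over (word, tag) pairs.

    Each pronoun is linked to the closest preceding noun whose number
    agrees with it. Returns a list of (noun, pronoun) pairs.
    """
    links = []
    nouns_seen = []  # (word, number) in text order
    for word, tag in tagged:
        if tag in NOUN_NUMBER:
            nouns_seen.append((word, NOUN_NUMBER[tag]))
        elif word.lower() in PRONOUN_NUMBER:
            number = PRONOUN_NUMBER[word.lower()]
            # Walk backwards from the pronoun to find the closest agreeing noun.
            for noun, noun_number in reversed(nouns_seen):
                if noun_number == number:
                    links.append((noun, word))
                    break
    return links


sentence1 = [("The", "DT"), ("princess", "NN"), ("looks", "VBZ"), ("at", "IN"),
             ("herself", "PRP"), ("in", "IN"), ("the", "DT"), ("mirror", "NN"), (".", ".")]
sentence2 = [("The", "DT"), ("princess", "NN"), ("is", "VBZ"), ("in", "IN"),
             ("the", "DT"), ("tower", "NN"), (".", "."), ("She", "PRP"),
             ("is", "VBZ"), ("sad", "JJ"), (".", ".")]
print(link_pronouns(sentence1))  # [('princess', 'herself')]
print(link_pronouns(sentence2))  # [('tower', 'She')]  (wrong antecedent)
```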
