
New to NLP, Question about annotation

Original post: 2010-11-30 03:35:59 · Tags: java, annotations, nlp

I am new to NLP and am looking for a starting point: tutorials, documentation, or example code. I have been told to research the possibilities of processing natural text to extract structured data from it. For example, I want to extract (annotate) height and weight from statements such as "He is 6 feet tall and weighs 200 pounds" or "His height is 6 feet and weight is 200". I have looked into UIMA, but it seems like a self-created regex dictionary with no training capabilities.

So, in a nutshell: what Java framework can I use to create an annotation engine that can also be trained? Any help or pointers would be greatly appreciated. Thanks.
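For reference, the hand-written-pattern approach that the question contrasts with a trainable annotator can be sketched in plain Java. This is only a minimal illustration using `java.util.regex`; it is not UIMA code, and the class `MeasurementAnnotator`, the `Annotation` type, and both patterns are hypothetical names and rules made up to cover the two example sentences.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MeasurementAnnotator {

    /** A typed span of the input text (hypothetical type, not a UIMA or GATE class). */
    public static final class Annotation {
        public final String type;
        public final String text;
        public final int start;
        public final int end;

        Annotation(String type, String text, int start, int end) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: a height is a number followed by a length unit in feet.
    private static final Pattern HEIGHT = Pattern.compile(
            "(\\d+(?:\\.\\d+)?\\s*(?:feet|foot|ft)\\b)", Pattern.CASE_INSENSITIVE);

    // Assumption: a weight is a number after "weighs" or "weight is", unit optional.
    private static final Pattern WEIGHT = Pattern.compile(
            "(?:weighs|weight\\s+is)\\s+(\\d+(?:\\.\\d+)?(?:\\s*(?:pounds|lbs?)\\b)?)",
            Pattern.CASE_INSENSITIVE);

    /** Returns all height and weight spans found in the text, ordered by position. */
    public static List<Annotation> annotate(String text) {
        List<Annotation> found = new ArrayList<>();
        collect(HEIGHT, "height", text, found);
        collect(WEIGHT, "weight", text, found);
        found.sort(Comparator.comparingInt(a -> a.start));
        return found;
    }

    // Adds one annotation per match, using capture group 1 as the annotated span.
    private static void collect(Pattern p, String type, String text, List<Annotation> out) {
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(new Annotation(type, m.group(1), m.start(1), m.end(1)));
        }
    }
}
```

Calling `MeasurementAnnotator.annotate("He is 6 feet tall and weighs 200 pounds")` yields a height annotation covering "6 feet" and a weight annotation covering "200 pounds". Every new phrasing needs another hand-written rule, which is the limitation that motivates looking for a trainable framework.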

3 answers

Since you asked for pointers: LingPipe (already mentioned above), OpenNLP, and the Stanford NLP distributions.

Note: if Python is an option, you can use the Natural Language Toolkit.

If you really want to use machine learning to train your annotator, then GATE is probably your best bet. Take a look at the chapter on machine learning in their guide.