

New to NLP, Question about annotation

I am new to NLP and I am looking for a starting point in terms of tutorials, documentation or example code. I have been told to research the possibilities of processing natural-language text to extract structured data from it. For example, I want to extract (annotate) height and weight from statements such as "He is 6 feet tall and weighs 200 pounds" or "His height is 6 feet and weight is 200".

I have looked into UIMA, but it seems to be a self-created regex dictionary with no training capabilities. So, in a nutshell: what Java framework can I use to create an annotation engine that can also be trained? Any help (pointers) on this would be greatly appreciated. Thanks
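For contrast with a trainable approach, here is a minimal sketch of the kind of hand-written, rule-based extractor the question describes, using only `java.util.regex`. It is not UIMA's API; the class name `RuleBasedExtractor` and both patterns are illustrative assumptions. It handles the two example sentences, but every new phrasing needs a new pattern:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written extractor: each phrasing must be anticipated by a
// pattern, and nothing is learned from data.
class RuleBasedExtractor {

    // A number followed by a length unit, e.g. "6 feet" or "6 ft".
    private static final Pattern HEIGHT =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*(?:feet|foot|ft)\\b", Pattern.CASE_INSENSITIVE);

    // A weight cue word ("weighs", "weight"), then the first number after it.
    private static final Pattern WEIGHT =
            Pattern.compile("\\bweigh(?:s|t)?\\b\\D*?(\\d+(?:\\.\\d+)?)", Pattern.CASE_INSENSITIVE);

    static Map<String, String> extract(String sentence) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher h = HEIGHT.matcher(sentence);
        if (h.find()) {
            result.put("height", h.group(1));
        }
        Matcher w = WEIGHT.matcher(sentence);
        if (w.find()) {
            result.put("weight", w.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("He is 6 feet tall and weighs 200 pounds"));
        System.out.println(extract("His height is 6 feet and weight is 200"));
    }
}
```

Both example sentences yield `{height=6, weight=200}`, while "He stands six foot two" yields an empty map because the patterns only match digits. That brittleness is what a trainable annotator tries to avoid.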

Since you asked for pointers: LingPipe (already mentioned above), OpenNLP, and the Stanford NLP distributions.

Note: if Python is an option, you can use the Natural Language Toolkit (NLTK).

If you really want to use machine learning to train your annotator, then GATE is probably your best bet. Take a look at the chapter on machine learning in their user guide.
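To make the idea of a *trained* annotator concrete, here is a toy sketch in plain Java; it is not GATE's API or its machine-learning support. It swaps in a naive Bayes classifier (chosen only for illustration) that labels each number as `HEIGHT`, `WEIGHT` or `OTHER` from the words within three tokens of it. The class name, label names and every training sentence other than the two from the question are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy trainable annotator (illustrative, not GATE's API): a naive Bayes model that
// labels each number in a sentence from the words within WINDOW tokens of it.
class ContextNumberClassifier {
    private static final int WINDOW = 3;

    private final Map<String, Integer> labelCounts = new HashMap<>();                // examples per label
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>(); // word counts per label
    private final Map<String, Integer> featureTotals = new HashMap<>();              // total words per label
    private final Set<String> vocabulary = new HashSet<>();

    private static String[] tokenize(String sentence) {
        return sentence.toLowerCase().split("\\W+");
    }

    // Bag-of-words context: every token within WINDOW positions of the number, except the number itself.
    private static List<String> features(String[] tokens, int index) {
        List<String> feats = new ArrayList<>();
        int from = Math.max(0, index - WINDOW);
        int to = Math.min(tokens.length - 1, index + WINDOW);
        for (int i = from; i <= to; i++) {
            if (i != index && !tokens[i].isEmpty()) {
                feats.add(tokens[i]);
            }
        }
        return feats;
    }

    // Adds one labeled example: the sentence, the number as written in it, and its gold label.
    void train(String sentence, String number, String label) {
        String[] tokens = tokenize(sentence);
        int index = Arrays.asList(tokens).indexOf(number);
        if (index < 0) {
            throw new IllegalArgumentException(number + " not found in: " + sentence);
        }
        labelCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features(tokens, index)) {
            counts.merge(f, 1, Integer::sum);
            featureTotals.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    // Picks the label with the highest log prior plus summed log word likelihoods.
    private String classify(String[] tokens, int index) {
        int examples = labelCounts.values().stream().mapToInt(Integer::intValue).sum();
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : labelCounts.entrySet()) {
            String label = e.getKey();
            Map<String, Integer> counts = featureCounts.get(label);
            int total = featureTotals.getOrDefault(label, 0);
            double score = Math.log((double) e.getValue() / examples);
            for (String f : features(tokens, index)) {
                // Add-one (Laplace) smoothing: an unseen word lowers a label's score but never zeroes it.
                score += Math.log((counts.getOrDefault(f, 0) + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Maps label -> number for each all-digit token (first number per label wins).
    Map<String, String> annotate(String sentence) {
        String[] tokens = tokenize(sentence);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].matches("\\d+")) {
                result.putIfAbsent(classify(tokens, i), tokens[i]);
            }
        }
        return result;
    }

    // Hypothetical toy training set: the question's two sentences plus three invented ones.
    static ContextNumberClassifier toyModel() {
        ContextNumberClassifier m = new ContextNumberClassifier();
        m.train("He is 6 feet tall and weighs 200 pounds", "6", "HEIGHT");
        m.train("He is 6 feet tall and weighs 200 pounds", "200", "WEIGHT");
        m.train("His height is 6 feet and weight is 200", "6", "HEIGHT");
        m.train("His height is 6 feet and weight is 200", "200", "WEIGHT");
        m.train("She is 5 feet tall", "5", "HEIGHT");
        m.train("The patient weighs 150 pounds", "150", "WEIGHT");
        m.train("He is 40 years old", "40", "OTHER");
        return m;
    }
}
```

With this toy training set, `toyModel().annotate("She weighs 130 pounds and is 5 feet tall")` returns `{WEIGHT=130, HEIGHT=5}` and `annotate("He is 35 years old")` returns `{OTHER=35}`, even though neither sentence was seen in training. The model is extended by adding labeled examples rather than writing new patterns; real frameworks use richer features and far more data.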

I'd use NER. Here is the output I see for your input text:

[image: NER output for the input text]

You can try it here: http://deagol.cs.illinois.edu:8080
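Many NER tools produce one tag per token, often in the BIO scheme (`B-X` begins an entity of type X, `I-X` continues it, `O` is outside any entity), and a small decoding step turns those tags into spans. The sketch below assumes that scheme and hypothetical `HEIGHT`/`WEIGHT` labels; it does not reproduce the actual output format or label set of the demo linked above:

```java
import java.util.ArrayList;
import java.util.List;

// Turns per-token BIO tags into "TYPE=text" spans.
// B-X starts an entity of type X, I-X continues it, O is outside any entity.
class BioDecoder {

    static List<String> decode(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> entities = new ArrayList<>();
        String type = null;                       // type of the currently open entity, if any
        StringBuilder text = new StringBuilder(); // its text so far
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            boolean continues = type != null && tag.equals("I-" + type);
            if (type != null && !continues) {
                // Any tag other than I-<open type> closes the open entity.
                entities.add(type + "=" + text);
                type = null;
                text.setLength(0);
            }
            if (continues) {
                text.append(' ').append(tokens[i]);
            } else if (tag.startsWith("B-") || tag.startsWith("I-")) {
                // A stray I-X with no matching open entity is treated like B-X.
                type = tag.substring(2);
                text.append(tokens[i]);
            }
        }
        if (type != null) {
            entities.add(type + "=" + text);
        }
        return entities;
    }

    public static void main(String[] args) {
        String[] tokens = "He is 6 feet tall and weighs 200 pounds".split(" ");
        String[] tags = {"O", "O", "B-HEIGHT", "I-HEIGHT", "O", "O", "O", "B-WEIGHT", "I-WEIGHT"};
        System.out.println(decode(tokens, tags)); // [HEIGHT=6 feet, WEIGHT=200 pounds]
    }
}
```

Keeping decoding separate from tagging means the same step works whether the tags come from a trained model or from hand-labeled data.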

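Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.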

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM