
How to make use of SharpNlp in my C# application

I require POS tagging for the files in my corpus. I have successfully followed the installation instructions of SharpNlp.
I am using the binary version.

I created a new c# project in:       E:\sharp\sharpapp
location of Models Folder is:        E:\sharp\sharpapp\bin\Models
location of my SharpNlp Binary is:   E:\sharp\SharpNLP-1.0.2529-Bin

I have also followed the instructions to modify both .config files, "ParseTree.Exe" and "ToolsExamples.Exe".

Now in my C# project I have a class called tagging.cs, in which I have to read my corpus text files and do POS tagging on them. Can anybody help me with how to use SharpNlp to do so?

Please provide the steps to do so.

In a nutshell, SharpNLP is:

  • a port to C# of OpenNLP Tools and OpenNLP MaxEnt
  • a connector to WordNet
  • a set of pre-computed models, mostly for the English language
  • utility modules such as integration with SQLite

It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics, and no apparent connection with the original Java projects' lifecycle. Over time, this will likely make the OpenNLP portion of SharpNLP more akin to a distant cousin than a twin sister...

Nevertheless, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available with SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things and adapt accordingly.

A loose guide could be the study of this particular source file, which makes use of OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP: for example, the POSTaggerME class becomes MaximumEntropyPosTagger, and the Parse() method and its overload turn into TagSentence() and such.

A more general hint is to understand the sequence of steps typically necessary to perform POS tagging.
This is a very high-level, approximate description but, I think, a useful one.

  • get the text to be tagged = string(s) of text
  • initialize a text parser
  • parse the text = an "array" (or other container) of individual tokens, i.e. words and punctuation characters
  • initialize the POS tagger, in particular tell it which model it should use
  • feed the [ordered] sequence of tokens to the POS tagger
  • Ta dah! Use the POS tags for the eventual purpose of your NLP application.
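The steps above can be sketched roughly as follows. This is only a minimal sketch, not SharpNLP's own code: CorpusTagger, TagSentence and TagDirectory are hypothetical names, the tokenizer and tagger are passed in as delegates so that the snippet compiles without any SharpNLP reference, and it assumes each non-blank line of a .txt file is one sentence. The SharpNLP class and model file names in the closing comment are recalled from the ToolsExample project and are an assumption to verify against the source you have.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class; none of these names come from SharpNLP.
public static class CorpusTagger
{
    // Splits one sentence into tokens, tags them, and returns
    // "token/TAG" items joined by single spaces.
    public static string TagSentence(
        string sentence,
        Func<string, string[]> tokenize,
        Func<string[], string[]> tag)
    {
        string[] tokens = tokenize(sentence);
        string[] tags = tag(tokens);
        if (tokens.Length != tags.Length)
            throw new InvalidOperationException("Expected one tag per token.");

        var parts = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
            parts[i] = tokens[i] + "/" + tags[i];
        return string.Join(" ", parts);
    }

    // Reads every *.txt file in corpusDir and tags each non-blank line.
    // Returns a map from file path to the tagged lines of that file.
    public static Dictionary<string, List<string>> TagDirectory(
        string corpusDir,
        Func<string, string[]> tokenize,
        Func<string[], string[]> tag)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (string path in Directory.GetFiles(corpusDir, "*.txt"))
        {
            var taggedLines = new List<string>();
            foreach (string line in File.ReadAllLines(path))
            {
                if (line.Trim().Length == 0) continue; // skip blank lines
                taggedLines.Add(TagSentence(line, tokenize, tag));
            }
            result[path] = taggedLines;
        }
        return result;
    }
}

// Assumed SharpNLP wiring (check class names and model paths in the source):
//   var tokenizer = new OpenNLP.Tools.Tokenize.EnglishMaximumEntropyTokenizer(
//       modelDir + "EnglishTok.nbin");
//   var tagger = new OpenNLP.Tools.PosTagger.EnglishMaximumEntropyPosTagger(
//       modelDir + "EnglishPOS.nbin", modelDir + @"Parser\tagdict");
//   var tagged = CorpusTagger.TagDirectory(corpusDir, tokenizer.Tokenize, tagger.Tag);
```

Passing the tokenizer and tagger as delegates keeps the file handling separate from the library calls, so you can first try the loop with a trivial whitespace tokenizer and swap in the real SharpNLP objects once the model paths are sorted out.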

Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained by training the tagger on a set of text that has already been tagged.
SharpNLP comes with a model for generic English, but in order to tag other languages, or if the specific corpus to be tagged belongs to a particular domain (say medical reports or Tweets or...), it may be preferable to re-train the tagger to improve its precision.
Open/SharpNLP, like most POS taggers, whether stand-alone or used through their API, typically includes features to train the tagger (= to produce a model from a sample set of already-tagged text) and also to verify the quality of the model/tagger so produced (= to compare the tags produced on a test set with the tags expected for this set).
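To make the "verify the quality" part concrete, here is a minimal sketch of the comparison itself: per-token accuracy of predicted tags against expected tags. TaggerEvaluation and Accuracy are hypothetical names for illustration; this is not the evaluation tool that ships with OpenNLP or SharpNLP, and it assumes both arrays are aligned token by token.

```csharp
using System;

// Hypothetical helper; not part of SharpNLP or OpenNLP.
public static class TaggerEvaluation
{
    // Returns the fraction of positions where the predicted tag equals
    // the expected tag. Both arrays must describe the same token sequence.
    public static double Accuracy(string[] expected, string[] predicted)
    {
        if (expected.Length != predicted.Length)
            throw new ArgumentException("Tag sequences must have the same length.");
        if (expected.Length == 0)
            return 0.0; // nothing to compare

        int correct = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            if (string.Equals(expected[i], predicted[i], StringComparison.Ordinal))
                correct++;
        }
        return (double)correct / expected.Length;
    }
}
```

For example, with expected tags DT, NN, VBZ, JJ and predicted tags DT, NN, VBD, JJ, three of four positions match, so the accuracy is 0.75. Running this over a held-out test set before and after re-training gives a rough idea of whether a domain-specific model is worth the effort.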

Kindly read through the article that I have written on this. It gives a detailed step-by-step method with sample code snippets.

Easy way of Integrating SharpNLP into your project in Visual Studio

I hope this was useful.


 