
How to use the marginal and probability methods in pycrfsuite.Tagger()

The documentation is not helpful to me at all.


First, I tried using set(), but I don't understand what the documentation means by:

set an instance for future calls

I could successfully feed my data to the tagger using my dataset's structure described below, so I am not sure why I need to use set() for that, as the documentation mentions.

Here is my feature sequence, derived from a scipy.sparse matrix after I called its nonzero() method:

[['66=1', '240=1', '286=1', '347=10', '348=1'],...]

where ... stands for further elements with the same structure as the first one.

The second problem I encountered is with Tagger.probability() and Tagger.marginal().


For Tagger.probability(), I used the same input as for Tagger.tag() (a list of lists), and I got an error.


If my input is just a list instead of a list of lists, I get the following error.

Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 49, in main
    train.main()
  File "C:\Users\Anak\PycharmProjects\CliNER\code\train.py", line 157, in main
    train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\train.py", line 189, in train
    model.train(train_docs, val=val_docs, test=test_docs)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 200, in train
    test_sents=test_sents, test_labels=test_labels)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 231, in train_fit
    dev_split=dev_split     )
  File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 653, in generic_train
    test_X=test_X, test_Y=test_Y)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\machine_learning\crf.py", line 220, in train
    train_pred = predict(model,     X) # ANAK
  File "C:\Users\Anak\PycharmProjects\CliNER\code\machine_learning\crf.py", line 291, in predict
    print(tagger.probability(xseq[0]))
  File "pycrfsuite/_pycrfsuite.pyx", line 650, in pycrfsuite._pycrfsuite.Tagger.probability
ValueError: The numbers of items and labels differ: |x| = 12, |y| = 73

For Tagger.marginal(), I can only produce an error similar to the first error from Tagger.probability().


Any clue on how to use these 3 methods? Please give me short examples of use cases for each of them.

I feel like there must be some examples of these 3 methods, but I couldn't find any. Am I looking in the right place? This is the website I am reading the documentation from:

https://python-crfsuite.readthedocs.io/en/latest/pycrfsuite.html

Additional info: I am using CliNER, in case any of you are familiar with it.

I know this question is over a year old, but I just had to figure out the same thing, and I am also leveraging some of the CliNER framework. For the CliNER-specific solution, I forked the repo and rewrote the predict method in the ./code/machine_learning/crf.py file.

To obtain the marginal probabilities, add the following line to the for loop that iterates over pycrf_instances, after yseq is created (see line 196 here):

y_probs = [tagger.marginal(y, ii) for ii, y in enumerate(yseq)]

You can then return that list of marginal probabilities from the predict method. You will in turn need to rewrite other functions to accommodate this change.


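Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.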
