
How to use the marginal and probability methods of pycrfsuite.Tagger()

The documentation is not helpful to me at all.


First, I tried using set(), but I don't understand what the documentation means by

set an instance for future calls

I can already feed my data successfully using the dataset structure described below, so I am not sure why I would need set() for that, as the documentation suggests.

Here is my feature sequence, built from a scipy.sparse matrix after calling its nonzero() method.

[['66=1', '240=1', '286=1', '347=10', '348=1'],...]

where ... stands for further elements with the same structure.

The second problem I encountered is with Tagger.probability() and Tagger.marginal().


For Tagger.probability(), I used the same input as for Tagger.tag(), and I get the following error.

[image: error screenshot]

And if my input is just a list instead of a list of lists, I get the following error.

Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 49, in main
    train.main()
  File "C:\Users\Anak\PycharmProjects\CliNER\code\train.py", line 157, in main
    train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\train.py", line 189, in train
    model.train(train_docs, val=val_docs, test=test_docs)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 200, in train
    test_sents=test_sents, test_labels=test_labels)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 231, in train_fit
    dev_split=dev_split     )
  File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 653, in generic_train
    test_X=test_X, test_Y=test_Y)
  File "C:\Users\Anak\PycharmProjects\CliNER\code\machine_learning\crf.py", line 220, in train
    train_pred = predict(model,     X) # ANAK
  File "C:\Users\Anak\PycharmProjects\CliNER\code\machine_learning\crf.py", line 291, in predict
    print(tagger.probability(xseq[0]))
  File "pycrfsuite/_pycrfsuite.pyx", line 650, in pycrfsuite._pycrfsuite.Tagger.probability
ValueError: The numbers of items and labels differ: |x| = 12, |y| = 73

For Tagger.marginal(), I can only produce an error similar to the first error shown for Tagger.probability().

[image: error screenshot]

Any clue on how to use these three methods? Please give me short examples of use cases for each of them.

I feel like there must be some examples of these three methods, but I couldn't find any. Am I looking in the right place? This is the website I am reading the documentation from:

https://python-crfsuite.readthedocs.io/en/latest/pycrfsuite.html

Additional info: I am using CliNER, in case any of you are familiar with it.

I know this question is over a year old, but I just had to figure out the same thing, since I am also leveraging some of the CliNER framework. For a CliNER-specific solution, I forked the repo and rewrote the predict method in the ./code/machine_learning/crf.py file.

To obtain the marginal probabilities, add the following line to the for loop that iterates over pycrf_instances, after yseq is created (see line 196 here):

y_probs = [tagger.marginal(y, ii) for ii, y in enumerate(yseq)]

You can then return that list of marginal probabilities from the predict method. Note that you will in turn need to rewrite additional functions to accommodate this change.
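For readers who are not using CliNER, here is a minimal sketch of how the three methods from the question seem to fit together. It assumes `tagger` is a `pycrfsuite.Tagger` on which `open()` has already been called with a trained model, and `tag_with_confidence` is just a hypothetical helper name, not part of the library. The key point is that `set()` (or `tag(xseq)`) stores the feature sequence in the tagger, while `probability()` and `marginal()` take labels, not features. That would also explain the ValueError in the question: the 73 features passed to `probability()` were read as 73 labels and compared against the 12 items already stored.

```python
def tag_with_confidence(tagger, xseq):
    """Tag one item sequence and report how confident the model is.

    Assumptions: `tagger` is an already opened pycrfsuite.Tagger, and
    `xseq` is ONE sequence of per-token feature lists, e.g.
    [['66=1', '240=1'], ['286=1', '347=10'], ...].
    """
    # set() stores xseq inside the tagger ("an instance for future calls"),
    # so the calls below do not need the features again.
    tagger.set(xseq)

    # tag() without arguments labels the sequence stored by set().
    yseq = tagger.tag()

    # probability() takes a LABEL sequence with one label per item and
    # returns the probability of that whole labelling for the stored xseq.
    seq_prob = tagger.probability(yseq)

    # marginal(y, t) is the probability of label y at position t,
    # summed over all possible labellings of the other positions.
    marginals = [tagger.marginal(y, t) for t, y in enumerate(yseq)]

    return yseq, seq_prob, marginals
```

Calling `tagger.tag(xseq)` directly has the same effect as `set(xseq)` followed by `tag()`, which is probably why tagging worked in the question without ever calling `set()`. To get the full distribution at one position rather than only the predicted label's probability, something like `{y: tagger.marginal(y, t) for y in tagger.labels()}` should work.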
