
How can I use the entity class of the previous token as a feature for NER with CRFsuite?

I am using the python-crfsuite package, a Python binding to CRFsuite, which was developed by Naoaki Okazaki (http://www.chokkan.org/software/crfsuite/).

I want to use the entity class of the previous token as a feature, which should help in identifying multi-word named entities. An example from my training data:

[(Raheja,B-builder),(vista,I-builder),(is,O),(very,O),(famous,O)]

Here I can use the previous token's class as a feature during training, because the labels are known. During prediction, however, we pass a list of features to the tagger object, and at that point the previous token's class is not known.

Can anyone tell me whether this is possible in python-crfsuite at all? Given the way features are passed to the tagger object, I suspect it is not.

I believe this is not possible with CRFsuite (and python-crfsuite), based on this sentence in the tutorial:

Features conditioned with attributes and label bigrams are not supported.

The class of the previous token is used as a feature by default in CRFsuite. CRFsuite uses two kinds of features:

  1. "state features" - I(current_label=A and f(sequence, current_position)) ;
  2. "transition features" - I(current_label=A and previous_label=B)

The features you define are in fact the f functions in (1); a state feature is generated for every possible value of the label. To use transition features you don't have to do anything; they are generated by default.
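To see why the previous label never has to be passed in at prediction time, here is a rough pure-Python sketch of how state and transition weights combine during Viterbi decoding. It does not call CRFsuite; the attribute names (is_title, is_lower), the weights, and the viterbi function are all invented for illustration, whereas a real model would learn the weights from training data.

```python
# Toy weights, invented for illustration only.
LABELS = ["B-builder", "I-builder", "O"]

# State feature weights: (attribute, current_label) -> weight
STATE_W = {
    ("is_title", "B-builder"): 2.0,
    ("is_title", "O"): -0.5,
    ("is_lower", "O"): 1.0,
    ("is_lower", "I-builder"): 0.8,
    ("is_lower", "B-builder"): -1.0,
}

# Transition feature weights: (previous_label, current_label) -> weight
TRANS_W = {
    ("B-builder", "I-builder"): 2.0,
    ("O", "I-builder"): -5.0,
    ("O", "O"): 0.5,
    ("I-builder", "O"): 0.5,
}


def viterbi(xseq, labels, state_w, trans_w):
    """Return the highest-scoring label sequence for xseq."""

    def state_score(attrs, label):
        return sum(state_w.get((a, label), 0.0) for a in attrs)

    # best[t][y]: score of the best path that ends with label y at position t
    best = [{y: state_score(xseq[0], y) for y in labels}]
    back = []  # back[t-1][y]: best previous label for y at position t
    for attrs in xseq[1:]:
        scores, pointers = {}, {}
        for y in labels:
            # Every candidate previous label is tried here, so the decoder
            # itself supplies the "previous class"; the caller never does.
            prev = max(labels, key=lambda p: best[-1][p] + trans_w.get((p, y), 0.0))
            scores[y] = best[-1][prev] + trans_w.get((prev, y), 0.0) + state_score(attrs, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Observation-only attributes for: Raheja vista is very famous
xseq = [["is_title"], ["is_lower"], ["is_lower"], ["is_lower"], ["is_lower"]]

with_transitions = viterbi(xseq, LABELS, STATE_W, TRANS_W)
without_transitions = viterbi(xseq, LABELS, STATE_W, {})
```

With these toy numbers, "vista" is tagged I-builder only when the transition weights are present; with state weights alone it falls back to O, which is the multi-word effect the question is after.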

What is not implemented in CRFsuite is a third kind of feature: I(current_label=A and previous_label=B and f(sequence, current_position)); this is what the tutorial means by "Features conditioned with attributes and label bigrams".
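In practice this means the feature lists passed to the trainer and the tagger only need to describe the observed tokens. A minimal sketch, assuming string-valued attributes in the style commonly used with python-crfsuite; the helper name token_features and the attribute names are hypothetical:

```python
def token_features(tokens, i):
    """Attributes for the token at position i, built from observed words only."""
    word = tokens[i]
    feats = ["bias", "word.lower=" + word.lower()]
    if word.istitle():
        feats.append("word.istitle")
    if i > 0:
        # The previous *word* is observable; the previous *label* is not needed.
        feats.append("-1:word.lower=" + tokens[i - 1].lower())
    else:
        feats.append("BOS")  # beginning of sentence
    if i == len(tokens) - 1:
        feats.append("EOS")  # end of sentence
    return feats


train_sent = [("Raheja", "B-builder"), ("vista", "I-builder"),
              ("is", "O"), ("very", "O"), ("famous", "O")]

tokens = [token for token, _ in train_sent]
xseq = [token_features(tokens, i) for i in range(len(tokens))]
yseq = [label for _, label in train_sent]

# With python-crfsuite installed, these would be used roughly as:
#   trainer = pycrfsuite.Trainer(); trainer.append(xseq, yseq)
#   tagger = pycrfsuite.Tagger(); tagger.open(model_path); tagger.tag(xseq)
# where model_path is whatever file trainer.train(...) wrote.
```

The same xseq construction serves both training and prediction, because no gold label ever enters the attributes; the label-to-label dependency is carried by the transition features.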
