Mallet CRF SimpleTagger Performance Tuning

A question for anyone who has used the SimpleTagger class from the Java library Mallet for Conditional Random Fields (CRFs). Assume that I'm already using the multi-thread option with as many threads as I have CPUs available (this is the case): where should I start, and what kinds of things should I try, if I need it to run faster?

A related question: is there a way to do something similar to stochastic gradient descent, which would speed up the training process?

The type of training I want to do is simple:

Input:
Feature1 ... FeatureN SequenceLabel
...

Test Data:
Feature1 ... FeatureN
...

Output:

Feature1 ... FeatureN SequenceLabel
...

(Where features are the output of processing I have done on the data in my own code.)

I've had trouble getting any CRF implementation other than Mallet to work even approximately, but I may have to backtrack and revisit one of the other implementations, or try a new one.

Yes, stochastic gradient descent (SGD) is usually much faster than the L-BFGS optimizer used in Mallet. I would suggest you try CRFSuite, which can be trained with either SGD or L-BFGS. You could also give Léon Bottou's SGD-based implementation a try, but it is more difficult to set up.

Otherwise, I believe that CRF++ is the most widely used CRF software around. It is based on L-BFGS, though, so it might not be fast enough for you.

Both CRFSuite and CRF++ should be easy to get started with.
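
If you switch from Mallet to CRFSuite, the main practical step is rewriting the data files. Here is a minimal Java sketch, assuming your SimpleTagger files have whitespace-separated features with the label as the last token, and that CRFSuite expects the label first followed by tab-separated attributes, with `:` and `\` escaped in attribute names (as I understand its data format; please check the CRFSuite manual for your version). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: rewrite "f1 f2 ... label" lines as "label<TAB>f1<TAB>f2 ..." lines. */
class SimpleTaggerToCrfsuite {

    // Assumption: CRFSuite reads ':' as the name/weight separator and '\' as
    // the escape character, so both are escaped inside attribute names.
    static String escape(String attribute) {
        return attribute.replace("\\", "\\\\").replace(":", "\\:");
    }

    // Converts one token line. Blank lines (sequence boundaries) stay blank.
    static String convertLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        // The last token is the label; it moves to the front unchanged.
        StringBuilder out = new StringBuilder(tokens[tokens.length - 1]);
        for (int i = 0; i < tokens.length - 1; i++) {
            out.append('\t').append(escape(tokens[i]));
        }
        return out.toString();
    }

    // Converts a whole file that has already been read into lines.
    static List<String> convert(List<String> lines) {
        List<String> converted = new ArrayList<>();
        for (String line : lines) {
            converted.add(convertLine(line));
        }
        return converted;
    }

    public static void main(String[] args) {
        // Hypothetical training lines; the empty string separates two sequences.
        List<String> input = Arrays.asList(
                "cap suffix=ing B-NP", "lower w:the I-NP", "", "cap O");
        for (String line : convert(input)) {
            System.out.println(line);
        }
    }
}
```

This only handles labeled training data. Unlabeled test lines have no trailing label, so they would need a placeholder label added before conversion. In the CRFSuite versions I am aware of, the SGD trainer is then selected with the `-a l2sgd` option of `crfsuite learn`.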

Note that all of these will be slow if you have a large number of labels. CRFSuite, at least, can be configured to take into account only the label n-grams observed in the training data (in an (n-1)th-order model), which will typically make training and prediction much faster.
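
To get a rough idea of how much such a restriction might save on your data, you can compare the number of label bigrams that actually occur with the number that are possible (the square of the label count, for a first-order model). The following is only a sketch with hypothetical class and method names and made-up labels. It says nothing about how any particular toolkit implements the restriction.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: count distinct labels and distinct adjacent label pairs. */
class LabelBigramStats {

    // Number of distinct labels across all sequences.
    static int countLabels(List<List<String>> sequences) {
        Set<String> labels = new HashSet<>();
        for (List<String> sequence : sequences) {
            labels.addAll(sequence);
        }
        return labels.size();
    }

    // Number of distinct (previous label, current label) pairs that occur.
    static int countObservedBigrams(List<List<String>> sequences) {
        Set<String> bigrams = new HashSet<>();
        for (List<String> sequence : sequences) {
            for (int i = 1; i < sequence.size(); i++) {
                // A tab joins the pair; labels are assumed not to contain tabs.
                bigrams.add(sequence.get(i - 1) + "\t" + sequence.get(i));
            }
        }
        return bigrams.size();
    }

    public static void main(String[] args) {
        // Hypothetical label sequences, one inner list per training sequence.
        List<List<String>> sequences = Arrays.asList(
                Arrays.asList("B-NP", "I-NP", "O"),
                Arrays.asList("O", "B-NP", "I-NP", "I-NP"));
        int labels = countLabels(sequences);
        int observed = countObservedBigrams(sequences);
        System.out.println("possible bigrams: " + (labels * labels));
        System.out.println("observed bigrams: " + observed);
    }
}
```

If the observed count is far below the possible count, restricting the model to observed label n-grams is more likely to pay off.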

Please have a look at this paper: http://www.stanford.edu/~acoates/papers/LeNgiCoaLahProNg11.pdf

It seems stochastic gradient descent methods are difficult to tune and parallelize.
