
Can I use numerical features in a CRF model?

Is it possible/good to add numerical features to CRF models, e.g. position in the sequence?

I'm using CRFsuite. It seems all the features will be converted to strings, e.g. 'pos=0', 'pos=1', which then lose their meaning in terms of Euclidean distance.

Or should I use them to train another model, e.g. an SVM, and then ensemble it with the CRF model?

I figured out that CRFsuite does handle numerical features, at least according to this documentation:

  • {"string_key": float_weight, ...} dict, where keys are observed features and values are their weights;
  • {"string_key": bool, ...} dict; True is converted to a 1.0 weight, False to 0.0;
  • {"string_key": "string_value", ...} dict; that's the same as {"string_key=string_value": 1.0, ...};
  • ["string_key1", "string_key2", ...] list; that's the same as {"string_key1": 1.0, "string_key2": 1.0, ...};
  • {"string_prefix": {...}} dicts: the nested dict is processed and "string_prefix" is prepended to each key;
  • {"string_prefix": [...]} dicts: the nested list is processed and "string_prefix" is prepended to each key;
  • {"string_prefix": set([...])} dicts: the nested set is processed and "string_prefix" is prepended to each key.

As long as:

  1. I keep the input properly formatted;
  2. I pass an actual float rather than the string representation of a float;
  3. I normalize it (see the sketch below).
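
For concreteness, here is a minimal sketch of mixing string and numerical features, assuming the sklearn-crfsuite wrapper around CRFsuite; the feature names and toy data are invented for illustration:

    # A minimal sketch, assuming sklearn-crfsuite (a thin wrapper around CRFsuite).
    # Feature names and the toy data below are invented for illustration.
    import sklearn_crfsuite

    def token_features(tokens, i):
        return {
            'word.lower': tokens[i].lower(),      # string value -> expanded to 'word.lower=<value>': 1.0
            'word.isupper': tokens[i].isupper(),  # bool -> converted to 1.0 / 0.0
            'pos': i / len(tokens),               # float -> kept as a real-valued feature, normalized to [0, 1)
        }

    sentences = [['John', 'lives', 'in', 'Berlin'], ['Mary', 'visited', 'Paris']]
    labels    = [['B-PER', 'O', 'O', 'B-LOC'], ['B-PER', 'O', 'B-LOC']]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    y = labels

    crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
    crf.fit(X, y)
    print(crf.predict(X))

The key point is the type of the value: 'pos': 0.25 is treated as a continuous feature, while 'pos': '0.25' (a string) would be expanded to the binary feature 'pos=0.25'.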

A CRF itself can use numerical features, and you should use them, but if your implementation converts them to strings (i.e. encodes them in binary form via one-hot encoding) then they might be of reduced significance. I suggest looking for a more "pure" CRF implementation that allows continuous variables.

A fun fact is that a CRF at its core is just structured MaxEnt (logistic regression), which works in the continuous domain. The string encoding is actually a way to map categorical values into that continuous domain, so your problem is really a result of "overdesign" in CRFsuite, which overlooks the actual capabilities of the CRF model.
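
To make the one-hot point concrete, a tiny illustration with a hypothetical position feature:

    # Categorical encoding: each observed position expands into its own binary
    # indicator, so any notion of distance between positions is lost.
    categorical = {'pos=2': 1.0}   # what the string feature 'pos': '2' becomes internally

    # Continuous encoding: the model sees an ordered real value instead, so
    # position 2 lies between positions 1 and 3 rather than being an unrelated indicator.
    continuous = {'pos': 2.0}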

Just to clarify the answer by Lishu a bit (it is correct, but might confuse other readers as it did me until I tried it). This:

{"string_key": float_weight, ...} dict where keys are observed features and values are their weights

could have been written as

{"feature_template_name": feature_value, ...} dict where keys are feature names and values are their values

i.e. with this you're not setting the CRF weight corresponding to this feature_template, but the value of this feature. I prefer to call them feature templates that take feature values, so that everything is clearer than just "features". The CRF will then learn a weight associated with each of the possible feature_values of a string-valued feature_template; for a float-valued template it learns a single weight that is multiplied by the value.
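
A short sketch of that distinction, using a hypothetical 'pos' feature template:

    # The float you pass is the VALUE of the feature template, not a pre-set weight.
    x_i = {'pos': 0.75}   # template 'pos' takes the value 0.75 at this token

    # Training then learns a weight w('pos', label) for each label; this token's score
    # contribution is w('pos', label) * 0.75.  With a string value such as
    # {'pos': 'third'} (equivalent to {'pos=third': 1.0}) a separate weight is learned
    # for each observed value of the template.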
