
What if a categorical column has multiple values in the train set but only one in test data? Would such a feature be useful in model training at all?

Original post 2018-05-17 07:15:26 4 1 python/ machine-learning/ regression/ data-science/ feature-selection

I am trying to solve a regression problem in which one of my features takes two values ('1', '0') in the train set but only the value '1' in the test data. Intuitively, including this feature seems wrong to me, but I am unable to find concrete reasoning to support my assumption.

1 answer

It depends on how many features you have in total. If there are very few (say, fewer than five), that single feature will most likely play an important role in your model. In that case, I would say you have a "data mismatch" problem, meaning that your training and test data come from different distributions. One simple way to solve it is to put the two sets together, shuffle the whole set, and split your data again.
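The pool, shuffle, and re-split step could look roughly like the sketch below. It uses only the Python standard library. The helper name `reshuffle_split`, the toy rows, the 20% test fraction, and the seed are all assumptions made for illustration, not part of the original answer. It also assumes the target values of the test rows are available, since pooling is not possible otherwise.

```python
import random
from collections import Counter


def reshuffle_split(train_rows, test_rows, test_fraction=0.2, seed=0):
    """Pool both sets, shuffle them, and return a new (train, test) pair."""
    pooled = list(train_rows) + list(test_rows)
    # A seeded generator keeps the shuffle reproducible.
    random.Random(seed).shuffle(pooled)
    n_test = int(round(len(pooled) * test_fraction))
    return pooled[n_test:], pooled[:n_test]


# Hypothetical toy data: each row is (binary_feature, target).
old_train = [(i % 2, float(i)) for i in range(80)]  # feature takes 0 and 1
old_test = [(1, float(i)) for i in range(80, 100)]  # feature is always 1

new_train, new_test = reshuffle_split(old_train, old_test)

# Compare how the feature is distributed in the test set before and after.
print("old test:", Counter(feature for feature, _ in old_test))
print("new test:", Counter(feature for feature, _ in new_test))
```

After re-splitting, both sets are drawn from the same pooled distribution, so the feature should usually vary in the new test set as well. If the original test set is meant to mirror the data the model will see in production, re-splitting may hide a real distribution shift rather than fix it.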






solution1 0 2018-05-17 07:40:14
