Automatic feature selection - Sklearn.feature_selection

Question

I have two datasets, train and test, with train.shape = (307511, 122) and test.shape = (48744, 121). Both datasets contain columns of dtype int32, float64 and object.

I applied one-hot encoding to convert the object columns to a numeric dtype.

train = pd.get_dummies(train)
test = pd.get_dummies(test)
print('Train dummies shape: {}'.format(train.shape))
print('Test dummies shape: {}'.format(test.shape))

I got these results from the code above:

Train dummies shape: (307511, 246)
Test dummies shape: (48744, 242)
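Note that the two encoded frames end up with different column counts (246 vs 242). One extra column in train is presumably the label (122 vs 121 columns before encoding); the rest would be categories that occur in only one of the frames. A minimal sketch of one way to line the columns up, using toy data and hypothetical column names (`income`, `gender`, `TARGET`) that are not taken from the real dataset:

```python
import pandas as pd

# Toy frames standing in for the real train/test data (hypothetical columns).
train = pd.DataFrame({"income": [1.0, 2.0, 3.0],
                      "gender": ["F", "M", "XNA"],
                      "TARGET": [0, 1, 0]})
test = pd.DataFrame({"income": [4.0, 5.0],
                     "gender": ["M", "F"]})

train = pd.get_dummies(train)
test = pd.get_dummies(test)

# Set the label aside so the alignment does not drop it.
labels = train.pop("TARGET")

# Keep only the columns present in both frames, in the same order.
train, test = train.align(test, join="inner", axis=1)

print(train.shape, test.shape)
```

With `join="inner"`, a dummy column that exists in only one frame (here `gender_XNA`) is dropped from both, so a model fitted on train can be applied to test.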

The shapes have changed, so the one-hot encoding succeeded. But now, when I try to train and test on my data, I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

These are my imports:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel 
from sklearn.ensemble import ExtraTreesClassifier

Please help.

Answer 1

Try this:

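import numpy as np
# as_matrix() and np.float no longer exist in current pandas/NumPy; use to_numpy() and np.float64
train_values = train.to_numpy().astype(np.float64)
test_values = test.to_numpy().astype(np.float64)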
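Casting to float changes the dtype but would not by itself remove missing or infinite values, which is what the ValueError complains about. A minimal sketch of locating and filling them before feature selection, run on synthetic data with made-up column names; median imputation is an assumption here, one common choice rather than a recommendation specific to this dataset:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer

# Synthetic stand-in for the encoded training frame (hypothetical columns).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
y = (X["a"] + X["b"] > 0).astype(int)
X.loc[::10, "c"] = np.nan      # inject missing values
X.loc[5, "d"] = np.inf         # inject an infinite value

# 1. Treat infinities as missing, then see which columns are affected.
X = X.replace([np.inf, -np.inf], np.nan)
nan_counts = X.isna().sum()
print(nan_counts[nan_counts > 0])

# 2. Fill the gaps with each column's median.
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

# 3. Tree-based feature selection can now be fitted on the cleaned array.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))
X_selected = selector.fit_transform(X_filled, y)
print(X_selected.shape)
```

For the test set, the already fitted objects would be reused (`imputer.transform(...)` followed by `selector.transform(...)`) so that both sets are filled and reduced in the same way.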
