Python: Input contains NaN, infinity or a value too large for dtype float32

Question

I'm trying to classify using sklearn's decision tree classifier. I've stored my training and testing datasets in two separate pandas dataframes. I'm calling the classifier like so:

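from sklearn.tree import DecisionTreeClassifier

# Fit on the training frame, then predict labels for the testing frame
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(features_in_training_set, class_labels_in_training_set)
predictions = classifier.predict(features_in_testing_set)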

However, I'm receiving this error, which seems to be common when classifying with the tree.

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I know that there are no missing values in either dataset; I am replacing them using the imputer method. The printout of my dataframes shows this, but to double-check I've also tried df.isna(), and the outputs are all False. I don't think I have infinity values, as the frames consist of binary values. I don't want to remove rows or columns, as I don't want to reduce my dataset. I also don't want to replace them based on any other criteria.

I'm not quite sure how to find which columns are too large for dtype float32, or how to change them if they are. I have a feeling that it could be my timestamp column. Here's a snippet of the training dataframe, as it's quite large:

             time      A     B 
0       1.518999e+09   1     1
1       1.518999e+09   1     0 
2       1.518999e+09   0     1
3       1.518999e+09   0     0

Answer 1

Try checking the summary of the dataframe. For example, if your dataframe shows missing samples for any feature, it means there are null values for some of the observations.
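One way to read that summary is sketched below. The helper name missing_summary and the small example frame are made up for illustration (the frame copies the shape of the question's snippet, with one gap added on purpose); only df.info() and df.isna() are real pandas calls.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count null entries per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Hypothetical frame shaped like the question's snippet, with one NaN added.
df = pd.DataFrame({
    "time": [1.518999e9, 1.518999e9, np.nan],
    "A": [1, 1, 0],
    "B": [1, 0, 1],
})

df.info()                   # non-null counts per column; a short count means nulls
print(missing_summary(df))  # only the columns that contain nulls
```

If the non-null count reported for a column is lower than the number of rows, that column still holds nulls and is worth imputing again before fitting.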

It is also possible that a division by a null value is occurring somewhere in the data preparation.
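A division like that would typically leave inf in the data, and df.isna() does not flag inf by default, so an all-False isna() result does not rule it out. Below is a minimal sketch of a check that reports which numeric columns hold NaN, infinity, or a magnitude beyond the float32 maximum (about 3.4e38). The helper name problem_columns and the example frame are hypothetical. Note that timestamps around 1.5e9, as in the question's snippet, are far below that limit.

```python
import numpy as np
import pandas as pd


def problem_columns(df: pd.DataFrame) -> list:
    """Return names of numeric columns with NaN, inf, or values beyond float32 range."""
    numeric = df.select_dtypes(include=[np.number])
    values = numeric.to_numpy(dtype=np.float64)
    float32_max = np.finfo(np.float32).max
    # A cell is bad if it is not finite, or too large in magnitude for float32.
    bad = ~np.isfinite(values) | (np.abs(values) > float32_max)
    return list(numeric.columns[bad.any(axis=0)])


# Hypothetical data: "A" contains inf, "B" contains a value too large for float32.
df = pd.DataFrame({
    "time": [1.518999e9, 1.518999e9],
    "A": [1.0, np.inf],
    "B": [1.0, 1e40],
    "C": [0, 1],
})

print(problem_columns(df))
```

Running the same check on both the training and the testing features should narrow the error down to specific columns, without dropping any rows.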

Solution 1
0 2020-02-20 05:18:49
