
ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). Why?

I have gone through all the similar questions, but none of them answers mine. I am using a random forest classifier as follows:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X_train, y_train)
clf.predict(X_test)

It gives me this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

However, when I run X_train.describe() I don't see any missing values. In fact, I had already handled the missing values before splitting my data.

When I run the following:

np.where(X_train.values >= np.finfo(np.float32).max)

I get:

(array([], dtype=int64), array([], dtype=int64))

And for these commands:

np.any(np.isnan(X_train))  # True
np.all(np.isfinite(X_train))  # False

After getting the above results, I also tried this:

X_train.fillna(X_train.mean())

but I still get the same error.

Please tell me where I'm going wrong. Thank you!

Solution
X_train = X_train.fillna(X_train.mean())

Explanation
np.any(np.isnan(X_train)) evaluates to True, so X_train contains some NaN values. Per the pandas fillna() docs, DataFrame.fillna() returns a copy of the DataFrame with the missing values filled; it does not modify the original by default. You must therefore reassign X_train to the return value of fillna(), as in X_train = X_train.fillna(X_train.mean()).

Example

>>> import pandas as pd
>>> import numpy as np
>>> 
>>> a = pd.DataFrame(np.arange(25).reshape(5, 5))
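>>> a[2] = a[2].astype(float)  # make column 2 float so it can hold NaN
>>> a.loc[2, 2] = np.nan       # .loc avoids chained assignment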
>>> 
>>> a
    0   1     2   3   4
0   0   1   2.0   3   4
1   5   6   7.0   8   9
2  10  11   NaN  13  14
3  15  16  17.0  18  19
4  20  21  22.0  23  24
>>> 
>>> a.fillna(1)
    0   1     2   3   4
0   0   1   2.0   3   4
1   5   6   7.0   8   9
2  10  11   1.0  13  14
3  15  16  17.0  18  19
4  20  21  22.0  23  24
>>> 
>>> a
    0   1     2   3   4
0   0   1   2.0   3   4
1   5   6   7.0   8   9
2  10  11   NaN  13  14
3  15  16  17.0  18  19
4  20  21  22.0  23  24
>>> 
>>> a = a.fillna(1)
>>> a
    0   1     2   3   4
0   0   1   2.0   3   4
1   5   6   7.0   8   9
2  10  11   1.0  13  14
3  15  16  17.0  18  19
4  20  21  22.0  23  24
>>>
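
The error message also mentions infinity, which fillna() does not touch, so it may help to check for both NaN and inf before fitting. The following is a minimal sketch of that check under some assumptions: the helper names report_non_finite and clean_features are hypothetical (they are not part of pandas or scikit-learn), the toy DataFrame stands in for the real X_train, and filling with the column mean is simply the strategy used in the question.

```python
import numpy as np
import pandas as pd


def report_non_finite(df: pd.DataFrame) -> pd.DataFrame:
    """Count NaN and +/-inf entries per numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include=[np.number])
    return pd.DataFrame({
        "nan": numeric.isna().sum(),
        "inf": np.isinf(numeric).sum(),
    })


def clean_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with +/-inf turned into NaN, then NaN filled with column means."""
    cleaned = df.replace([np.inf, -np.inf], np.nan)
    return cleaned.fillna(cleaned.mean(numeric_only=True))


# Toy stand-in for X_train (assumed data, not from the question).
X_train = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.inf, 2.0, 4.0],
})

print(report_non_finite(X_train))

# Reassign the result: neither replace() nor fillna() modifies X_train in place here.
X_train = clean_features(X_train)

# Every value should now be finite.
assert np.isfinite(X_train.to_numpy()).all()
```

This sketch does not cover values that are finite but exceed the float32 range; the np.finfo(np.float32).max comparison from the question handles that case.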
