Error in fit_transform: Input contains NaN, infinity or a value too large for dtype('float64')
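I have a DataFrame of shape (14407, 2564). I am trying to remove low-variance features with VarianceThreshold, but when I call fit_transform I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Before using VarianceThreshold, I replaced all missing values in df with the following code: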
df.replace('null',np.NaN, inplace=True)
df.replace(r'^\s*$', np.NaN, regex=True, inplace=True)
df.fillna(value=df.median(), inplace=True)
After that, I checked whether the DataFrame contained any null or infinite values using:
m = df.isnull().any()
print "========= COLUMNS WITH NULL VALUES ================="
print m[m]
print "========= COLUMNS WITH INFINITE VALUES ================="
m = np.isfinite(df.select_dtypes(include=['float64'])).any()
print m[m]
I get an empty Series as output for both checks, which means none of my columns have missing values. The output is:
========= COLUMNS WITH NULL VALUES =================
Series([], dtype: bool)
========= COLUMNS WITH INFINITE VALUES =================
Series([], dtype: bool)
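A note on the second check: `np.isfinite(...).any()` flags columns that contain at least one finite value, not columns that contain an infinite one, and it only inspects `float64` columns. An empty result from it therefore suggests that `select_dtypes` found no `float64` columns at all. The following is a minimal sketch of a check that does report infinite values and also lists the columns it skipped; the toy DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "a" holds an infinity, "b" is an object column
# of numeric-looking strings, "c" is a clean float column.
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0],
    "b": ["1", "2", "3"],
    "c": [0.5, 1.5, 2.5],
})

float_cols = df.select_dtypes(include=["float64"])

# np.isinf is True for +inf and -inf, so .any() marks columns that
# really contain an infinite value.
has_inf = np.isinf(float_cols).any()
print(has_inf[has_inf])

# Columns that were never inspected because they are not float64.
skipped = df.columns.difference(float_cols.columns)
print(list(skipped))
```

If every column ends up in `skipped`, both checks pass trivially and say nothing about whether the data can be converted to `float64`.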
Full error trace:
Traceback (most recent call last):
File "/home/users/MyUsername/MyProject/src/main/python/Main.py", line 222, in <module>
main()
File "/home/users/MyUsername/MyProject/src/main/python/Main.py", line 218, in main
getAllData()
File "/home/users/MyUsername/MyProject/src/main/python/Main.py", line 95, in getAllData
predictors, labels, dropped_features = fselector.process(variance=True, corr=True, bestf=True, bestfk=200)
File "/home/users/MyUsername/MyProject/src/main/python/classes/featureselector.py", line 54, in process
self.getVariance(threshold=(.95 * (1 - .95)))
File "/home/users/MyUsername/MyProject/src/main/python/classes/featureselector.py", line 136, in getVariance
self.removeLowVarianceColumns(df=self.X, thresh=threshold)
File "/home/users/MyUsername/MyProject/src/main/python/classes/featureselector.py", line 213, in removeLowVarianceColumns
selector.fit_transform(df)
File "/usr/lib64/python2.7/site-packages/sklearn/base.py", line 494, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/usr/lib64/python2.7/site-packages/sklearn/feature_selection/variance_threshold.py", line 64, in fit
X = check_array(X, ('csr', 'csc'), dtype=np.float64)
File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 407, in check_array
_assert_all_finite(array)
File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
" or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
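So I am not sure what else to check. This does not seem to be a missing-value problem, yet I also cannot find the column or value that causes the error.
I have seen several threads here that all turned out to be about missing values, but that does not seem to be the issue in my case.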
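I solved this by converting my data to numeric. It turned out that, although the error message mentions 'float64', all of my columns were of object dtype, and object columns do not work well with fit_transform.
Using df = df.apply(lambda x: pd.to_numeric(x,errors='ignore'))
converted my data to float.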
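For readers who want to locate the offending columns before converting, here is a minimal sketch under a few assumptions: the toy DataFrame is made up, and `errors='coerce'` is swapped in for the `errors='ignore'` used above. With `'ignore'`, a column that cannot be parsed is returned unchanged (still object dtype), and newer pandas releases deprecate that option; with `'coerce'`, unparseable cells become NaN and can be found and filled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every column is object dtype, and "x" holds
# one string that cannot be parsed as a number.
df = pd.DataFrame({
    "x": ["1.5", "2.5", "oops"],
    "y": ["10", "20", "30"],
}, dtype=object)

print(df.dtypes)  # both columns report object

# Convert column by column; unparseable cells become NaN.
converted = df.apply(lambda col: pd.to_numeric(col, errors="coerce"))

# Cells that had a value before conversion but are NaN afterwards
# are the ones that failed to parse.
bad = converted.isna() & df.notna()
bad_columns = bad.any()
print(bad_columns[bad_columns])

# Fill the new gaps with the column median, as in the question,
# then confirm that every value is finite.
cleaned = converted.fillna(converted.median())
all_finite = bool(np.isfinite(cleaned.to_numpy()).all())
print(all_finite)
```

Once `all_finite` is true and `cleaned.dtypes` shows only numeric columns, the frame should pass the finiteness validation that raised the error in the traceback.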