
Error in fit_transform: Input contains NaN, infinity or a value too large for dtype('float64')

I have a dataframe of shape (14407, 2564). I am trying to remove low-variance features using scikit-learn's VarianceThreshold. However, when I call fit_transform, I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Before using VarianceThreshold, I replaced all the missing values in my df using the code below:

    df.replace('null',np.NaN, inplace=True)
    df.replace(r'^\s*$', np.NaN, regex=True, inplace=True)
    df.fillna(value=df.median(), inplace=True)

Afterwards, I checked my dataframe for any empty or infinite values using:

    m = df.isnull().any()
    print "========= COLUMNS WITH NULL VALUES ================="
    print m[m]
    print "========= COLUMNS WITH INFINITE VALUES ================="
    m = np.isfinite(df.select_dtypes(include=['float64'])).any()
    print m[m]

I got an empty Series for both checks, which I took to mean that none of my columns contains missing or infinite values. The output was:

    ========= COLUMNS WITH NULL VALUES =================
    Series([], dtype: bool)
    ========= COLUMNS WITH INFINITE VALUES =================
    Series([], dtype: bool)

Full error trace:

    Traceback (most recent call last):
      File "/home/users/MyUsername/MyProject/src/main/python/Main.py", line 222, in <module>
        main()
      File "/home/users/MyUsername/MyProject/src/main/python/Main.py", line 218, in main
        getAllData()
      File "/home/users/MyUsername/MyProject/src/main/python/Main.py", line 95, in getAllData
        predictors, labels, dropped_features = fselector.process(variance=True, corr=True, bestf=True, bestfk=200)
      File "/home/users/MyUsername/MyProject/src/main/python/classes/featureselector.py", line 54, in process
        self.getVariance(threshold=(.95 * (1 - .95)))
      File "/home/users/MyUsername/MyProject/src/main/python/classes/featureselector.py", line 136, in getVariance
        self.removeLowVarianceColumns(df=self.X, thresh=threshold)
      File "/home/users/MyUsername/MyProject/src/main/python/classes/featureselector.py", line 213, in removeLowVarianceColumns
        selector.fit_transform(df)
      File "/usr/lib64/python2.7/site-packages/sklearn/base.py", line 494, in fit_transform
        return self.fit(X, **fit_params).transform(X)
      File "/usr/lib64/python2.7/site-packages/sklearn/feature_selection/variance_threshold.py", line 64, in fit
        X = check_array(X, ('csr', 'csc'), dtype=np.float64)
      File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 407, in check_array
        _assert_all_finite(array)
      File "/usr/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
        " or a value too large for %r." % X.dtype)
    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

So I am not sure what to check. This does not look like a missing-value issue, but I also cannot work out which columns or values are causing the problem.

I've seen several threads here where the cause turned out to be a missing value, but that does not seem to be the problem here.

I solved this by casting my data to numeric. It appears that, although the error message mentions 'float64', all of my columns had dtype object, and object columns did not work well with fit_transform.
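
One way to see why the checks in the question came back empty is the sketch below. It uses a made-up three-row frame (the column names and values are hypothetical, not the original data). When every column has dtype object, `select_dtypes(include=['float64'])` selects no columns, so the infinity check has nothing to inspect. Note also that `np.isfinite(...).any()` flags columns that contain finite values, so the sketch negates it to find the non-finite ones.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every column is forced to dtype object,
# mimicking a frame that was loaded without numeric parsing.
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.inf], "b": ["3", "4", "5"]},
    dtype=object,
)

print(df.dtypes)  # both columns report dtype object

# The float64 filter matches no columns, so any check on it is vacuous.
float_cols = df.select_dtypes(include=["float64"])
print(float_cols.shape)  # (3, 0)

# Coerce to float first, then flag entries that are NaN or infinite.
numeric = df.apply(pd.to_numeric, errors="coerce").astype("float64")
bad = ~np.isfinite(numeric)

# Columns that contain at least one non-finite entry.
bad_columns = bad.any()
print(bad_columns[bad_columns])
```

Printing `df.dtypes` before running any finiteness check is probably the quickest way to spot this situation.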

Changing my data to float using df = df.apply(lambda x: pd.to_numeric(x, errors='ignore')) solved the issue.
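
A minimal sketch of this fix end to end, on a hypothetical toy frame (the column names and values are invented for illustration). It swaps in `errors='coerce'` for `errors='ignore'`, which is an assumption rather than the original code: with `'coerce'`, unparseable entries become NaN and can be filled, whereas `'ignore'` leaves such a column as object and is deprecated in recent pandas releases.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical object-dtype frame with numbers stored as strings.
df = pd.DataFrame(
    {
        "constant": ["1", "1", "1", "1"],
        "varying": ["0", "1", "0", "1"],
        "with_gap": ["2.5", "null", "3.5", "4.5"],
    },
    dtype=object,
)

# Convert every column to a numeric dtype; unparseable strings become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Fill the NaN values introduced by coercion with each column's median.
df = df.fillna(df.median())

# Same threshold as in the traceback: .95 * (1 - .95)
selector = VarianceThreshold(threshold=0.95 * (1 - 0.95))
reduced = selector.fit_transform(df)

# Names of the columns that survived the variance filter.
kept = df.columns[selector.get_support()]
print(list(kept))
```

Filling after the conversion matters here: a median computed on object columns would not be meaningful, so the order is convert first, then fill, then select.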


