StandardScaler - ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I have the following code:

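from sklearn import preprocessing

# as_matrix() was removed in pandas 1.0; select the columns and convert them instead
X = df_X[header[1:col_num]].to_numpy()
scaler = preprocessing.StandardScaler().fit(X)
X_nor = scaler.transform(X)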

And got the following error:

  File "/Users/edamame/Library/python_virenv/lib/python2.7/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I used:

print(np.isinf(X))
print(np.isnan(X))

This gives me the output below, which doesn't really tell me which element has the issue, since I have millions of rows.

[[False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]]

Is there a way to identify which value in the matrix X actually causes the problem? How do people avoid it in general?

NumPy contains various element-wise logical tests for this sort of thing.

In your particular case, you will want to use isinf and isnan.
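With millions of rows, NumPy truncates the printed boolean mask, so it is more useful to reduce the masks from isnan and isinf to counts. A minimal sketch, assuming X is a 2-D float array like the one passed to StandardScaler; the small matrix below is made-up stand-in data:

```python
import numpy as np

# Made-up stand-in for the real X: 4 rows, 3 columns, one NaN and one inf
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, np.inf],
    [10.0, 11.0, 12.0],
])

# Total number of problem values in the whole matrix
n_nan = int(np.isnan(X).sum())
n_inf = int(np.isinf(X).sum())
print("NaN count:", n_nan, "inf count:", n_inf)

# Summing down axis 0 gives per-column counts, showing which features are affected
nan_per_col = np.isnan(X).sum(axis=0)
inf_per_col = np.isinf(X).sum(axis=0)
print("NaN per column:", nan_per_col)
print("inf per column:", inf_per_col)
```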

In response to your edit:

You can pass the result of np.isinf() or np.isnan() to np.where(), which returns the indices where the condition is True. Here's a quick example:

import numpy as np

test = np.array([0.1, 0.3, float("Inf"), 0.2])

bad_indices = np.where(np.isinf(test))

print(bad_indices)
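The example above uses a 1-D array; for a 2-D matrix like X, np.where returns one index array per axis. A minimal sketch with made-up data that uses ~np.isfinite to catch NaN and both infinities in a single mask, and np.argwhere to list (row, column) pairs:

```python
import numpy as np

# Made-up 2-D stand-in for X
X = np.array([
    [0.1, 0.3, 0.5],
    [0.2, np.nan, 0.4],
    [float("inf"), 0.6, 0.7],
])

# np.isfinite is False for NaN, +inf and -inf, so one mask covers all cases
bad_mask = ~np.isfinite(X)

# For a 2-D array np.where returns a (row_indices, col_indices) tuple
rows, cols = np.where(bad_mask)
print(rows, cols)

# np.argwhere gives the same positions as one (row, col) pair per line
bad_positions = np.argwhere(bad_mask)
print(bad_positions)

# Indices of rows containing at least one bad value
bad_rows = np.where(bad_mask.any(axis=1))[0]
print(bad_rows)
```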

You can then use those indices to replace the contents of the array:

test[bad_indices] = -1
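Replacing bad entries with a sentinel such as -1 is one option; two other common ones are dropping the affected rows or substituting finite values. A minimal NumPy-only sketch of both, with made-up data; which approach fits depends on your dataset and model:

```python
import numpy as np

# Made-up matrix with a NaN and a -inf
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, -np.inf],
    [5.0, 6.0],
])

# Option 1: keep only rows in which every value is finite
finite_rows = np.isfinite(X).all(axis=1)
X_dropped = X[finite_rows]
print(X_dropped)

# Option 2: replace NaN, +inf and -inf with explicit finite values (0.0 here)
X_filled = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
print(X_filled)

# Both results now contain only finite values, so the finiteness check passes
assert np.isfinite(X_dropped).all()
assert np.isfinite(X_filled).all()
```

Dropping rows keeps the remaining statistics honest but loses data; filling keeps every row but the chosen fill value shifts the mean and variance that StandardScaler computes.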

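Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.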
