StandardScaler -ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I have the following code:

X = df_X.as_matrix(header[1:col_num])
scaler = preprocessing.StandardScaler().fit(X)
X_nor = scaler.transform(X) 

and get the following error:

  File "/Users/edamame/Library/python_virenv/lib/python2.7/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I used:

print(np.isinf(X))
print(np.isnan(X))

which gives me the output below. This doesn't really tell me which element has the issue, since I have millions of rows.

[[False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]]

Is there a way to identify which value in the matrix X actually causes the problem? How do people avoid this in general?

NumPy provides various element-wise logical tests for this sort of thing.

In your particular case, you will want to use isinf and isnan.

In response to your edit:

You can pass the result of np.isinf() or np.isnan() to np.where(), which will return the indices where the condition is true. Here's a quick example:

import numpy as np

test = np.array([0.1, 0.3, float("Inf"), 0.2])

bad_indices = np.where(np.isinf(test))

print(bad_indices)

You can then use those indices to replace the content of the array:

test[bad_indices] = -1
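For a 2-D matrix like the X in the question, np.argwhere can be more convenient than np.where: it returns one (row, column) coordinate per bad entry, which is easier to scan when the matrix has millions of rows. The sketch below uses a small made-up array for illustration, and also shows np.nan_to_num as one common way to clean the data before scaling:

```python
import numpy as np

# Toy 2-D matrix with one NaN and one infinity planted in it.
X = np.array([[0.1, 0.2, 0.3],
              [0.4, np.nan, 0.6],
              [np.inf, 0.8, 0.9]])

# ~np.isfinite(X) is True where an entry is NaN, +inf, or -inf,
# so it catches everything StandardScaler complains about in one test.
bad_coords = np.argwhere(~np.isfinite(X))
print(bad_coords)  # each row is a (row, column) pair of a bad entry

# One common fix: replace non-finite values with finite ones before scaling.
# By default, NaN becomes 0.0 and +/-inf become very large finite numbers.
X_clean = np.nan_to_num(X)
```

Whether replacing with zeros (or dropping the offending rows with X[np.isfinite(X).all(axis=1)]) is appropriate depends on what the values mean in your data.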
