How to change all string values in a multidimensional numpy array to NaN?

Question

Imagine that you have some old tables of data that are not curated and contain string values at arbitrary places.

To perform data analysis with numpy, you need to handle these string values somehow, e.g. by changing all of them to NaN.

How can this be done?

Answer 1

import numpy as np

data = [
    [["text", 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]]
]

# a list mixing strings and numbers becomes an array of strings
a = np.array(data)

def str2nan(s):
    try:
        return float(s)  # if we can convert to float do it
    except ValueError:
        return np.nan    # else return NaN

vectorized_str2nan = np.vectorize(str2nan)

a = vectorized_str2nan(a)

After this, a holds floats, with NaN in place of the string:

[
    [[nan, 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]]
]

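Now a is suitable for numerical processing. Note that plain reductions such as np.average(a) return nan as long as a NaN is present, so use NaN-aware functions like np.nanmean(a).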
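
As a minimal sketch of such follow-up processing, assuming the converted array from above (written out directly here so the example is self-contained; the variable names are illustrative), the difference between a plain and a NaN-aware reduction looks like this:

```python
import numpy as np

# The converted array from the answer, written out directly
a = np.array([
    [[np.nan, 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]],
])

plain_mean = np.mean(a)             # NaN propagates through the reduction
nan_aware_mean = np.nanmean(a)      # skips the NaN entry
n_missing = int(np.isnan(a).sum())  # how many values were replaced by NaN

print(plain_mean, nan_aware_mean, n_missing)
```

Counting the NaN entries is a cheap way to check how much of the original table was unusable.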

The point is to use np.vectorize, which wraps a scalar function so that it can be applied element-wise to (multidimensional) numpy arrays.
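
To illustrate, here is a small sketch showing that the wrapped function keeps the shape of its input. The sample array raw is hypothetical, and otypes=[float] is an optional addition that is not part of the answer above: it pins the output dtype instead of letting np.vectorize infer it from the first element. Keep in mind that np.vectorize is a convenience wrapper around a Python loop, not a performance tool.

```python
import numpy as np

def str2nan(s):
    """Return s as a float, or NaN if it cannot be parsed."""
    try:
        return float(s)
    except ValueError:
        return np.nan

# otypes=[float] fixes the result dtype (an optional addition)
to_float = np.vectorize(str2nan, otypes=[float])

raw = np.array([["1.5", "n/a"], ["text", "4"]])  # hypothetical sample data
clean = to_float(raw)

print(clean.shape, clean.dtype)
print(clean)
```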

solution1
0 ACCEPTED 2022-10-04 07:58:42