
How to change all string values in a multidimensional numpy array to NaN?

Imagine that you have some old tables of data that were never curated and contain string values in arbitrary places.

To analyze such data with numpy, you need to handle these string values somehow, e.g. by changing all of them to NaN.

How to do this?

import numpy as np

data = [
    [["text", 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]]
]

a = np.array(data)

def str2nan(s):
    try:
        return float(s)  # if we can convert to float do it
    except ValueError:
        return np.nan    # else return NaN

vectorized_str2nan = np.vectorize(str2nan, otypes=[float])  # force a float result

a = vectorized_str2nan(a)

After this, a is a float array in which the string has been replaced by NaN:

[
    [[nan, 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]]
]

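a is now a float array suitable for numerical processing. Note that ordinary reductions such as np.average(a) return NaN as soon as one element is NaN, so use the NaN-aware variants such as np.nanmean(a) to skip the replaced values.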
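A minimal sketch of how the NaN affects reductions on the converted array, using the same data and str2nan function as above. The variable names plain_mean, nan_mean, and nan_count are illustrative only, and otypes=[float] is an assumption added here to pin the output dtype.

```python
import numpy as np

def str2nan(s):
    try:
        return float(s)      # numeric strings become floats
    except ValueError:
        return np.nan        # anything else becomes NaN

data = [
    [["text", 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]],
]

# otypes=[float] is an assumption: it fixes the output dtype explicitly
a = np.vectorize(str2nan, otypes=[float])(np.array(data))

plain_mean = np.average(a)          # NaN propagates through the reduction
nan_mean = np.nanmean(a)            # mean over the non-NaN entries only
nan_count = int(np.isnan(a).sum())  # how many values were replaced

print(plain_mean, nan_mean, nan_count)
```

Counting the NaN entries first is a cheap way to check how much of the table was actually non-numeric before trusting any statistic computed from it.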

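The point is to use np.vectorize to wrap a scalar function so that it is applied elementwise to (multidimensional) numpy arrays. Keep in mind that np.vectorize is a convenience rather than a performance optimization: it is essentially a Python-level loop over the elements.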
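To make concrete what np.vectorize does here, the following sketch compares it with an explicit loop over all indices. The helper to_float_array is hypothetical (it is not part of numpy or the original answer) and exists only to show that both approaches give the same result for an array of any shape.

```python
import numpy as np

def str2nan(s):
    try:
        return float(s)      # numeric strings become floats
    except ValueError:
        return np.nan        # anything else becomes NaN

def to_float_array(arr):
    """Hypothetical helper: explicit-loop equivalent of np.vectorize(str2nan)."""
    arr = np.asarray(arr)
    out = np.empty(arr.shape, dtype=float)
    for idx in np.ndindex(arr.shape):   # visits every index tuple, any number of dimensions
        out[idx] = str2nan(arr[idx])
    return out

raw = np.array([["text", 2], [1, 3]])   # numpy stores this as an array of strings
looped = to_float_array(raw)
vectorized = np.vectorize(str2nan, otypes=[float])(raw)

# equal_nan=True treats NaN entries at the same position as equal
print(np.array_equal(looped, vectorized, equal_nan=True))
```

The vectorized form is shorter and handles broadcasting for you, which is the main reason to prefer it over the hand-written loop.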
