I have an array as follows:
In [1]: x = array(['1.2', '2.3', '1.2.3'])
I want to test whether each element in the array can be converted to a numerical value. That is, a function is_numeric(x) should return a True/False array as follows:
In [2]: is_numeric(x)
Out[2]: array([True, True, False])
How can I do this?
import numpy as np

def is_float(val):
    try:
        float(val)
    except ValueError:
        return False
    else:
        return True

a = np.array(['1.2', '2.3', '1.2.3'])

# Python 3: map() returns an iterator, so it is wrapped in list()
is_numeric_1 = lambda x: list(map(is_float, x))  # return python list
is_numeric_2 = lambda x: np.array(list(map(is_float, x)))  # return numpy array
is_numeric_3 = np.vectorize(is_float, otypes=[bool])  # return numpy array
Depending on the size of the array a and the type of the returned values, these functions run at different speeds.
In [26]: %timeit is_numeric_1(a)
100000 loops, best of 3: 2.34 µs per loop
In [27]: %timeit is_numeric_2(a)
100000 loops, best of 3: 3.13 µs per loop
In [28]: %timeit is_numeric_3(a)
100000 loops, best of 3: 6.7 µs per loop
In [29]: a = np.array(['1.2', '2.3', '1.2.3']*1000)
In [30]: %timeit is_numeric_1(a)
1000 loops, best of 3: 1.53 ms per loop
In [31]: %timeit is_numeric_2(a)
1000 loops, best of 3: 1.6 ms per loop
In [32]: %timeit is_numeric_3(a)
1000 loops, best of 3: 1.58 ms per loop
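Outside IPython, a comparable benchmark can be run with the standard timeit module. This is a sketch, not the setup used for the numbers above: it assumes Python 3 (hence list(map(...))), and the number/repeat values are arbitrary choices, so absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def is_float(val):
    """Return True if float(val) succeeds, False on ValueError."""
    try:
        float(val)
    except ValueError:
        return False
    return True


# The same three variants as above, written for Python 3.
def is_numeric_1(x):
    return list(map(is_float, x))


def is_numeric_2(x):
    return np.array(list(map(is_float, x)))


is_numeric_3 = np.vectorize(is_float, otypes=[bool])

a = np.array(['1.2', '2.3', '1.2.3'] * 1000)

for name, func in [('is_numeric_1', is_numeric_1),
                   ('is_numeric_2', is_numeric_2),
                   ('is_numeric_3', is_numeric_3)]:
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(a), number=10, repeat=3)) / 10
    print(f'{name}: {best * 1e3:.3f} ms per call')
```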
If a list is okay, use is_numeric_1.
If you want a numpy array and a is small, use is_numeric_2.
Otherwise, use is_numeric_3.
In [23]: x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])
Trying to convert the whole array to float results in an error if one or more strings can't be converted:
In [24]: x.astype(float)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-a68fda2cafea> in <module>()
----> 1 x.astype(float)
ValueError: could not convert string to float: '1.2.3'
In [25]: x[:2].astype(float)
Out[25]: array([ 1.2, 2.3])
But to find which ones can be converted, and which can't, we probably have to apply a test to each element. That requires some sort of iteration, and some sort of test.
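Once a per-element test exists, its boolean result can be used as a mask so that astype(float) only sees convertible strings, which avoids the ValueError above. A minimal sketch, assuming a NaN placeholder is acceptable for the bad entries (that choice, and the names mask, values and filled, are illustrative):

```python
import numpy as np


def is_numeric(s):
    """True if float(s) succeeds."""
    try:
        float(s)
    except ValueError:
        return False
    return True


x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])

# One Python-level test per element gives a boolean mask...
mask = np.array([is_numeric(s) for s in x])

# ...which then lets astype(float) run only on the convertible elements.
values = x[mask].astype(float)

# Alternatively, keep the original shape and fill the bad slots with NaN.
filled = np.full(x.shape, np.nan)
filled[mask] = x[mask].astype(float)
```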
Most of these answers have wrapped float in a try/except block. But look at "How do I check if a string is a number (float) in Python?" for alternatives. One answer there found that the float wrap was fast for valid inputs, but a regex test was faster for invalid ones ( https://stackoverflow.com/a/25299619/901925 ).
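For comparison, a regex-based test along those lines might look like the sketch below. The pattern FLOAT_RE is a hypothetical one written for this example, not the one from the linked answer, and it is stricter than float(): strings such as 'nan', 'inf' or '1_000' are accepted by float() but rejected here.

```python
import re

import numpy as np

# Hypothetical pattern: optional sign, digits with an optional decimal point
# (or a leading '.'), optional exponent, optional surrounding whitespace.
FLOAT_RE = re.compile(r'\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\s*')


def is_numeric_re(s):
    """True if the whole string matches FLOAT_RE."""
    return FLOAT_RE.fullmatch(s) is not None


x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])
mask = np.array([is_numeric_re(s) for s in x])
```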
In [30]: def isnumeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
In [31]: [isnumeric(s) for s in x]
Out[31]: [True, True, False, True, False]
In [32]: np.array([isnumeric(s) for s in x]) # for array
Out[32]: array([ True, True, False, True, False], dtype=bool)
I like list comprehensions because they are common and clear (and preferred in Py3). For speed, I have found that frompyfunc has a modest advantage over other iterators (and it handles multidimensional arrays):
In [34]: np.frompyfunc(isnumeric, 1,1)(x)
Out[34]: array([True, True, False, True, False], dtype=object)
In [35]: np.frompyfunc(isnumeric, 1,1)(x).astype(bool)
Out[35]: array([ True, True, False, True, False], dtype=bool)
It requires a bit more boilerplate than vectorize, but is usually faster. But if the array or list is small, a list comprehension is usually faster (it avoids the numpy overhead).
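To illustrate the multidimensional point, the sketch below applies the same isnumeric through frompyfunc to a 2-D array; the sample array x2 is made up for the example. frompyfunc always returns object-dtype output, hence the astype(bool).

```python
import numpy as np


def isnumeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


# frompyfunc(func, nin, nout) wraps a Python function as a ufunc; the
# output array has dtype=object, so it is cast to bool afterwards.
isnumeric_ufunc = np.frompyfunc(isnumeric, 1, 1)

x2 = np.array([['1.2', '2.3', '1.2.3'],
               ['foo', '4', '-5e2']])
result = isnumeric_ufunc(x2).astype(bool)
```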
======================
(edited) np.char has a set of functions that apply string methods to the elements of an array. But the closest function is np.char.isnumeric, which just tests for numeric characters, not a full float conversion.
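A quick sketch of why np.char.isnumeric is not a substitute: it mirrors str.isnumeric element by element, so any string containing '.', '-' or an exponent is reported as non-numeric. It expects a str (unicode) array; the sample values are illustrative.

```python
import numpy as np

x = np.array(['1.2', '2.3', '1.2.3', '12', 'foo'])

# True only when every character of the element is numeric, so the '.'
# in '1.2' makes it False even though float('1.2') succeeds.
char_test = np.char.isnumeric(x)
```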
import numpy as np

# method to check whether a string is a float
def is_numeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

# method to return a list of booleans that dictate whether a string can be parsed into a number
def is_numeric_array(arr):
    return_array = []
    # ndenumerate yields (index, value) pairs; only the value is tested
    for _, val in np.ndenumerate(arr):
        return_array.append(is_numeric(val))
    return return_array
This also relies on try/except to get the per-element result, but using fromiter preallocates the boolean result array:
from numpy import array, fromiter

def is_numeric(x):
    def try_float(xx):
        try:
            float(xx)
        except ValueError:
            return False
        else:
            return True
    # count=x.size lets fromiter allocate the output array once
    return fromiter((try_float(xx) for xx in x.flat),
                    dtype=bool, count=x.size)

x = array(['1.2', '2.3', '1.2.3'])
print(is_numeric(x))
Gives:
[ True True False]
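Because x.flat iterates over all elements, the fromiter result is always 1-D. For a multidimensional input it can be reshaped back to x.shape; a hedged sketch with a hypothetical is_numeric_nd wrapper and a made-up sample array:

```python
import numpy as np


def is_numeric_nd(x):
    """Hypothetical wrapper: fromiter approach, reshaped to match x.shape."""
    def try_float(xx):
        try:
            float(xx)
        except ValueError:
            return False
        return True
    # x.flat walks the array in C (row-major) order, so reshape(x.shape)
    # puts each result back at the position of the element it came from.
    flat = np.fromiter((try_float(xx) for xx in x.flat),
                       dtype=bool, count=x.size)
    return flat.reshape(x.shape)


x2 = np.array([['1.2', 'foo'],
               ['3', '1.2.3']])
mask2 = is_numeric_nd(x2)
```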
I find the following works well for my purpose.
First, save the isNumeric function from https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#C in a file called ctest.h, then create a .pyx file as follows:
from numpy cimport ndarray, uint8_t
import numpy as np
cimport numpy as np

cdef extern from "ctest.h":
    int isNumeric(const char * s)

def is_numeric_elementwise(ndarray x):
    # Assumes a 1-D array of bytes (dtype 'S'); under Python 3 a str ('U')
    # array cannot be coerced to const char * and must be encoded first.
    cdef Py_ssize_t i
    cdef ndarray[uint8_t, mode='c', cast=True] y = np.empty_like(x, dtype=np.uint8)
    for i in range(x.size):
        y[i] = isNumeric(x[i])
    return y > 0
The above Cython function runs quite fast.
In [4]: is_numeric_elementwise(array(['1.2', '2.3', '1.2.3']))
Out[4]: array([ True, True, False], dtype=bool)
In [5]: %timeit is_numeric_elementwise(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 695 ms per loop
Compared with the is_numeric_3 method in https://stackoverflow.com/a/37997673/4909242 , it is ~5 times faster.
In [6]: %timeit is_numeric_3(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 3.45 s per loop
There may still be some room for improvement, I guess.
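One caveat if you try this under Python 3: the Cython function passes each element to a const char * parameter, and Cython's automatic coercion to char * accepts bytes but not str, so a regular str ('U') array would need converting first. A sketch of that preparation step, assuming ASCII-only input (the name xb is illustrative):

```python
import numpy as np

x = np.array(['1.2', '2.3', '1.2.3'])  # dtype '<U5' under Python 3

# astype('S') converts to a fixed-width bytes array; each element is then
# np.bytes_ (a bytes subclass). The cast only handles ASCII, so non-ASCII
# strings would need np.char.encode(x, 'utf-8') instead.
xb = x.astype('S')
```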