
Numpy: Check if float array contains whole numbers

In Python, you can check whether a float holds an integer value with n.is_integer(), as described in this Q&A: How to check if a float value is a whole number.

Does numpy have a similar operation that can be applied to arrays? Something that would allow the following:

>>> x = np.array([1.0, 2.1, 3.0, 3.9])
>>> mask = np.is_integer(x)
>>> mask
array([True, False, True, False], dtype=bool)

It is possible to do something like

>>> mask = (x == np.floor(x))

or

>>> mask = (x == np.round(x))

but each of these calls an extra function and allocates a temporary float array the same size as x, which it should be possible to avoid.

Does numpy have a vectorized function that checks for fractional parts of floats in a way similar to Python's float.is_integer?

From what I can tell, there is no such function that returns a boolean array indicating whether floats have a fractional part or not. The closest I can find is np.modf, which returns the fractional and integral parts, but that creates two float arrays (at least temporarily), so it might not be the best choice memory-wise.
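As a minimal sketch of the np.modf route mentioned above (the variable names frac and whole are just illustrative), the mask comes from comparing the fractional part with zero:

```python
import numpy as np

x = np.array([1.0, 2.1, 3.0, 3.9])

# np.modf returns two float arrays: (fractional parts, integral parts)
frac, whole = np.modf(x)

# An element is a whole number exactly when its fractional part is zero
mask = frac == 0
print(mask.tolist())  # [True, False, True, False]
```

Both frac and whole are full-size float arrays, which is the memory cost described above.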

If you're happy working in place, you can try something like:

>>> np.mod(x, 1, out=x)
>>> mask = (x == 0)

This should save memory versus using round or floor (where you have to keep x around), but of course you lose the original x.
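If losing x is not acceptable, a middle ground is to write the fractional parts into a separate scratch buffer that can be reused between calls. This is only a sketch: whole_number_mask and its scratch parameter are hypothetical names, not part of numpy.

```python
import numpy as np

def whole_number_mask(x, scratch=None):
    """Return a boolean mask of whole-number entries without modifying x.

    `scratch` is an optional preallocated float array shaped like x
    (hypothetical parameter, useful when checking many arrays in a loop).
    """
    if scratch is None:
        scratch = np.empty_like(x)
    np.mod(x, 1, out=scratch)  # fractional parts go into the scratch buffer
    return scratch == 0

x = np.array([1.0, 2.1, 3.0, 3.9])
mask = whole_number_mask(x)
print(mask.tolist())  # [True, False, True, False]
print(x.tolist())     # [1.0, 2.1, 3.0, 3.9], x is left untouched
```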

The other option is to ask for it to be implemented in Numpy, or implement it yourself.

I needed an answer to this question for a slightly different reason: checking when I can convert an entire array of floating point numbers to integers without losing data.

Hunse's answer (the np.mod approach above) almost works for me, except that I obviously can't use the in-place trick, since I need to be able to undo the operation:

if np.all(np.mod(x, 1) == 0):
    x = x.astype(int)

From there, I thought of the following option which probably is faster in many situations:

x_int = x.astype(int)
if np.all((x - x_int) == 0):
    x = x_int

The reason is that the modulo operation is slower than subtraction. However, now we do the casting to integers up-front - I don't know how fast that operation is, relatively speaking. But if most of your arrays are integers (they are in my case), the latter version is almost certainly faster.
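Since the relative speed depends on the machine, the numpy version, and the data, it is worth measuring. The sketch below times both checks with timeit; the array size, the repeat count, and the function names check_mod and check_cast are arbitrary choices, and no particular result is implied.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Test data: floats that all happen to hold whole numbers
x = rng.integers(0, 100, size=100_000).astype(float)

def check_mod(a):
    # Modulo-based check
    return bool(np.all(np.mod(a, 1) == 0))

def check_cast(a):
    # Cast-and-subtract check
    return bool(np.all((a - a.astype(int)) == 0))

t_mod = timeit.timeit(lambda: check_mod(x), number=20)
t_cast = timeit.timeit(lambda: check_cast(x), number=20)
print(f"mod: {t_mod:.4f}s  cast: {t_cast:.4f}s")
```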

Another benefit is that you could replace the subtraction with something like np.isclose to check within a certain tolerance (of course you should be careful here, since truncation is not proper rounding!).

x_int = x.astype(int)
if np.all(np.isclose(x, x_int, rtol=0.0001)):  # third positional argument of np.isclose is rtol
    x = x_int
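To make the truncation caveat concrete, here is a small sketch comparing astype(int), which truncates toward zero, with rounding to the nearest integer first via np.rint. The sample values and the absolute tolerance of 1e-4 are arbitrary assumptions.

```python
import numpy as np

x = np.array([1.0, 2.9999999, 4.0000001])

# Truncation: 2.9999999 becomes 2, which is nowhere near the original value
x_trunc = x.astype(int)
print(bool(np.all(np.isclose(x, x_trunc, rtol=0, atol=1e-4))))  # False

# Rounding to nearest first: 2.9999999 becomes 3
x_round = np.rint(x).astype(int)
print(bool(np.all(np.isclose(x, x_round, rtol=0, atol=1e-4))))  # True
```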

EDIT: Slower, but perhaps worth it depending on your use-case, is also converting integers individually if present.

x_int = x.astype(int)
safe_conversion = (x - x_int) == 0
# if we can convert the whole array to integers, do that
if np.all(safe_conversion):
    x = x_int.tolist()
else:
    x = x.tolist()
    # if there are _some_ integers, convert them
    if np.any(safe_conversion):
        for i in range(len(x)):
            if safe_conversion[i]:
                x[i] = int(x[i])

As an example of where this matters: this works out for me, because I have sparse data (which means mostly zeros) which I then convert to JSON, once, and reuse later on a server. For floats, ujson writes these as [...,0.0,0.0,0.0,...], whereas for ints the result is [...,0,0,0,...], saving up to half the number of characters in the string. This reduces overhead on both the server (shorter strings) and the client (shorter strings, presumably slightly faster JSON parsing).
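The size difference can be illustrated with the standard library json module (used here instead of ujson so that the sketch has no third-party dependency besides numpy; the sample data is made up):

```python
import json
import numpy as np

x = np.array([0.0, 0.0, 1.5, 0.0, 2.0])

as_floats = x.tolist()
# Convert only the whole-number entries to Python ints
as_mixed = [int(v) if v.is_integer() else v for v in as_floats]

# Compact separators, so that only the number formatting differs
s_floats = json.dumps(as_floats, separators=(",", ":"))
s_mixed = json.dumps(as_mixed, separators=(",", ":"))

print(s_floats)  # [0.0,0.0,1.5,0.0,2.0]
print(s_mixed)   # [0,0,1.5,0,2]
```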

You can also just use the Python method in a list comprehension.

>>> x = np.array([1.0, 2.1, 3.0, 3.9])
>>> mask = np.array([val.is_integer() for val in x])
>>> mask
array([ True, False,  True, False])

Compared to the answer using mod 1, this was slightly faster for the given example with 4 values (5.66 µs vs 8.03 µs) and over 3x faster for an array of 1000 values.
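Timings like these vary with array size, dtype, and numpy version, so it may be worth re-measuring on your own data. A rough sketch follows; the sizes, the repeat count, and the function names are arbitrary, and no particular outcome is implied.

```python
import timeit
import numpy as np

def mask_comprehension(a):
    # Python-level loop calling float.is_integer on each element
    return np.array([v.is_integer() for v in a])

def mask_mod(a):
    # Vectorized modulo check
    return np.mod(a, 1) == 0

for n in (4, 1_000, 100_000):
    a = np.linspace(0.0, 10.0, n)
    # Both approaches should agree on finite inputs
    assert np.array_equal(mask_comprehension(a), mask_mod(a))
    t_comp = timeit.timeit(lambda: mask_comprehension(a), number=10)
    t_mod = timeit.timeit(lambda: mask_mod(a), number=10)
    print(f"n={n}: comprehension {t_comp:.5f}s, mod {t_mod:.5f}s")
```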

Inspired by the accepted answer, here's a non-inplace version using the % operator:

modulus = x % 1
mask = modulus == 0

or more succinctly

mask = (x % 1) == 0
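One detail worth knowing is how this expression treats special values. In the sketch below (sample values chosen for illustration), NaN and the infinities come out as False, which matches what float.is_integer returns for them; np.errstate silences the invalid-value warning that inf % 1 can raise.

```python
import numpy as np

x = np.array([2.0, -3.0, 0.5, np.nan, np.inf, -np.inf, 1e300])

# inf % 1 evaluates to nan, and nan == 0 is False
with np.errstate(invalid="ignore"):
    mask = (x % 1) == 0

print(mask.tolist())  # [True, True, False, False, False, False, True]
```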

While the accepted method of (x % 1) == 0 is quite adequate, it bothers me that there is no way to accomplish this natively in numpy, especially given the existence of float.is_integer in vanilla Python.

I therefore did a bit of research on the floating point formats supported by numpy (float16, float32, float64, float128 (actually extended precision)), and on how to write a ufunc.

The result is that for floats small enough to fit into a corresponding unsigned integer type (pretty much everything up to float64 on a normal machine), you can do the checks with some simple bit twiddling. For example, here is a C99 function that very quickly tells you if your float32 contains an integer value:

#include <stdint.h>

int isint_float(float n)
{
    uint32_t k = ((union { float n; uint32_t k; }){n}).k;

    // Zero when everything except sign bit is zero
    if((k & 0x7FFFFFFF) == 0) return 1;

    uint32_t exponent = k & 0x7F800000;

    // NaN or Inf when the exponent bits are all ones
    // Guaranteed fraction when exponent < 0
    if(exponent == 0x7F800000 || exponent < 0x3F800000) return 0;
    // Guaranteed integer when exponent >= FLT_MANT_DIG - 1
    if(exponent >= 0x4B000000) return 1;
    // Otherwise, check that the significand bits past the exponent are zeros
    return (k & (0x7FFFFF >> ((exponent >> 23) - 0x7F))) == 0;
}
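For readers who want to experiment with the bit logic without compiling anything, the same checks can be mirrored in plain numpy by viewing the float32 data as uint32. This is only an illustration of the idea: isint_float32 is a hypothetical name, it is not the ufunc described here, and it will not have the same speed, since it allocates several temporaries.

```python
import numpy as np

def isint_float32(a):
    """Mask of float32 entries holding integer values, via bit inspection."""
    a = np.ascontiguousarray(a, dtype=np.float32)
    k = a.view(np.uint32)                     # reinterpret the raw bits

    magnitude = k & np.uint32(0x7FFFFFFF)     # everything except the sign bit
    exponent = k & np.uint32(0x7F800000)      # biased exponent, still in place

    is_zero = magnitude == 0                      # +0.0 and -0.0
    finite = exponent != np.uint32(0x7F800000)    # excludes NaN and Inf
    at_least_one = exponent >= np.uint32(0x3F800000)  # |value| >= 1.0

    # Unbiased exponent, clipped to 0..23. At 23 and above every
    # significand bit is part of the integer, so the mask becomes zero.
    biased = (exponent >> np.uint32(23)).astype(np.int64)
    shift = np.clip(biased - 127, 0, 23).astype(np.uint32)
    fraction_bits = np.uint32(0x7FFFFF) >> shift
    no_fraction = (k & fraction_bits) == 0

    return is_zero | (finite & at_least_one & no_fraction)

values = np.array(
    [0.0, -0.0, 1.0, 1.5, -3.0, 0.25, 8388607.5, 16777216.0, np.nan, np.inf],
    dtype=np.float32,
)
print(isint_float32(values).tolist())
# [True, True, True, False, True, False, False, True, False, False]
```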

I went ahead and wrapped this function and its siblings in a ufunc, which can be found here: https://github.com/madphysicist/isint_ufunc. One nice feature is that this ufunc returns True for all integer types instead of raising an error. Another is that it runs anywhere from 5x to 15x faster than (x % 1) == 0.

Based on the linked tutorial, you can install with python setup.py {build_ext --inplace, build, install}, depending on how badly you want it. Perhaps I should see if the numpy community is interested in including this ufunc.
