
Remove common elements in numpy array

I am trying to take the union of two numpy arrays as follows:

np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  )

The output reads:

array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
    0.46,  0.47,  0.48,  0.49,  0.5 ,  0.5 ,  0.51,  0.52,  0.53,
    0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.6 ,  0.61,
    0.62,  0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,
    0.7 ,  0.71,  0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,
    0.79,  0.8 ,  0.8 ,  0.9 ])

In the output of this union, the number 0.5 appears twice (and so do 0.6, 0.7 and 0.8). Even when I apply numpy's unique function, the duplicate 0.5 does not go away. That is:

np.unique( np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  ) )

also gives the same output. What am I doing wrong? How can I fix this and get the desired output (i.e., only one occurrence of 0.5 in my array)?

Since the array returned by np.union1d is sorted, you can use the same idea as in this post and keep only the elements that are not close to their predecessor:

a[np.r_[True,~np.isclose(a[1:] , a[:-1])]]

Sample run -

In [20]: a = np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  )

In [21]: a
Out[21]: 
array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.6 ,  0.61,
        0.62,  0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,
        0.7 ,  0.71,  0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,
        0.79,  0.8 ,  0.8 ,  0.9 ])

In [22]: a[np.r_[True,~np.isclose(a[1:] , a[:-1])]]
Out[22]: 
array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,  0.54,
        0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,  0.63,
        0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,  0.72,
        0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,  0.9 ])
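The one-liner above relies on the input already being sorted and non-empty. A minimal sketch that wraps it in a reusable helper might look like this; the name `unique_close` and the default `atol=1e-9` are hypothetical choices for illustration, not part of NumPy. It sorts first, uses a purely absolute tolerance (`rtol=0.0`) instead of `np.isclose`'s defaults, and guards the empty case, where `np.r_[True, ...]` would otherwise produce a mask longer than the array.

```python
import numpy as np

def unique_close(values, atol=1e-9):
    """Sort `values` and drop entries within `atol` of their predecessor.

    Hypothetical helper: the name and the default tolerance are
    illustrative assumptions, not NumPy API.
    """
    a = np.sort(np.asarray(values, dtype=float))
    if a.size == 0:
        # np.r_[True, ...] would give a length-1 mask for an empty array
        return a
    # Keep the first element, then every element not close to the one before it
    keep = np.r_[True, ~np.isclose(a[1:], a[:-1], rtol=0.0, atol=atol)]
    return a[keep]

a = np.union1d(np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01))
result = unique_close(a)
print(result.size)
```

Because each element is compared with its immediate predecessor rather than with the last kept value, a long chain of values each within `atol` of the next would collapse to one entry; with grid spacings of 0.01 and a tolerance of 1e-9 that cannot happen here.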

As I wrote in my comment, this is a floating-point precision issue: values that print as 0.5 are not necessarily bit-for-bit equal, so exact comparisons treat them as different. If it is applicable in your particular case, I would suggest working with integers and normalizing afterwards.

For example:

x = np.union1d( np.arange(10, 91, 10), np.arange(40, 81, 1)  )  # exact integer hundredths
x = x/100.0  # scale back to floats only after the union

Output:

[ 0.1   0.2   0.3   0.4   0.41  0.42  0.43  0.44  0.45  0.46  0.47  0.48
  0.49  0.5   0.51  0.52  0.53  0.54  0.55  0.56  0.57  0.58  0.59  0.6
  0.61  0.62  0.63  0.64  0.65  0.66  0.67  0.68  0.69  0.7   0.71  0.72
  0.73  0.74  0.75  0.76  0.77  0.78  0.79  0.8   0.9 ]
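If the float arrays are already given rather than generated, the same integer idea can still be applied by snapping them to integers first. This is a sketch under the assumption that every value is meant to be a whole number of hundredths; the variable names and `scale = 100` are illustrative.

```python
import numpy as np

# Float grids as in the question
coarse = np.arange(0.1, 0.91, 0.1)
fine = np.arange(0.4, 0.81, 0.01)

# Assumption: every value is intended to be a whole number of hundredths.
# np.rint snaps each scaled value to the nearest integer, absorbing the tiny
# representation error before the exact integer union is taken.
scale = 100
coarse_int = np.rint(coarse * scale).astype(np.int64)
fine_int = np.rint(fine * scale).astype(np.int64)

# Integer comparison in np.union1d is exact, then normalize once at the end
x = np.union1d(coarse_int, fine_int) / scale
print(x)
```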

As @ImNt stated in the comments, this is likely a floating-point comparison/precision issue: the two values are probably not both exactly 0.5 in memory (one might be something like 0.500000000001).
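One way to check this hypothesis is to print every pair of neighbours that `np.isclose` treats as equal even though `np.union1d` kept both. The sketch below does that; converting to Python `float` before printing gives the shortest round-trip representation, which exposes the hidden trailing digits. The variable names are illustrative.

```python
import numpy as np

a = np.union1d(np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01))

# Neighbours that are numerically close but were not merged by np.union1d
close = np.isclose(a[1:], a[:-1])
lower, upper = a[:-1][close], a[1:][close]

for lo_val, hi_val in zip(lower, upper):
    # float() gives Python's shortest round-trip repr, revealing the tiny difference
    print(float(lo_val), float(hi_val), float(hi_val - lo_val))
```

Any pair printed here differs only in its last few bits, which is exactly why an equality-based function like `np.unique` keeps both.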

You can work around it, though. Since you know your numbers have at most two decimal places, you can first np.round the array before applying np.unique.

x = np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  )
x = np.round(x, 2)  # round to 2 decimal places
x = np.unique(x)  # near-duplicates are now exactly equal, so unique removes them

Output:

array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,  0.54,
        0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,  0.63,
        0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,  0.72,
        0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,  0.9 ])
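Since `np.union1d` already returns sorted unique values, a possible variant is to round each input before taking the union, which makes the separate `np.unique` call unnecessary. This sketch assumes, as above, that two decimal places are enough to identify every value.

```python
import numpy as np

coarse = np.arange(0.1, 0.91, 0.1)
fine = np.arange(0.4, 0.81, 0.01)

# Rounding first maps near-duplicates onto the same float,
# so the deduplication built into np.union1d can merge them.
x = np.union1d(np.round(coarse, 2), np.round(fine, 2))
print(x)
```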

Or you could use exact Fraction objects from the fractions module:

>>> import numpy as np
>>> from fractions import Fraction
>>> np.union1d( np.arange(Fraction(1,10), Fraction(91,100), Fraction(1,10)), np.arange(Fraction(4,10), Fraction(81,100),Fraction(1,100)))
array([Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5),
       Fraction(41, 100), Fraction(21, 50), Fraction(43, 100),
       Fraction(11, 25), Fraction(9, 20), Fraction(23, 50),
       Fraction(47, 100), Fraction(12, 25), Fraction(49, 100),
       Fraction(1, 2), Fraction(51, 100), Fraction(13, 25),
       Fraction(53, 100), Fraction(27, 50), Fraction(11, 20),
       Fraction(14, 25), Fraction(57, 100), Fraction(29, 50),
       Fraction(59, 100), Fraction(3, 5), Fraction(61, 100),
       Fraction(31, 50), Fraction(63, 100), Fraction(16, 25),
       Fraction(13, 20), Fraction(33, 50), Fraction(67, 100),
       Fraction(17, 25), Fraction(69, 100), Fraction(7, 10),
       Fraction(71, 100), Fraction(18, 25), Fraction(73, 100),
       Fraction(37, 50), Fraction(3, 4), Fraction(19, 25),
       Fraction(77, 100), Fraction(39, 50), Fraction(79, 100),
       Fraction(4, 5), Fraction(9, 10)], dtype=object)
>>> _.astype(float)
array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,  0.54,
        0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,  0.63,
        0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,  0.72,
        0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,  0.9 ])
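If you would rather not rely on `np.arange` handling `Fraction` objects, a sketch of the same idea builds both grids from integer ranges and takes a plain Python set union. It swaps `np.arange` for comprehensions and assumes the grids are known to be multiples of 1/10 and 1/100.

```python
from fractions import Fraction

import numpy as np

# Exact rationals: k/10 for k = 1..9 and k/100 for k = 40..80
coarse = {Fraction(k, 10) for k in range(1, 10)}
fine = {Fraction(k, 100) for k in range(40, 81)}

# Fractions are normalized, so Fraction(5, 10) == Fraction(50, 100) and the
# set union cannot contain near-duplicates.
exact = sorted(coarse | fine)

# Convert to floats only at the end
x = np.array([float(f) for f in exact])
print(x)
```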
