How can I compare multiple numpy arrays for equality at the same time?

I have 5 numpy arrays:

array_1 = [1,2,3]
array_2 = [4,5,6]
array_3 = [7,8,9]
array_4 = [10,11,12]
array_5 = [1,2,3]

I need to compare them all. Essentially, if ANY two of the 5 arrays above have the same values at the same indices, I need to know about it. Currently, I have something like this:

index_array_1 = np.where(array_1 == array_2)[0]
index_array_2 = np.where(array_1 == array_3)[0]
index_array_3 = np.where(array_1 == array_4)[0]
index_array_4 = np.where(array_1 == array_5)[0]
index_array_5 = np.where(array_2 == array_3)[0]
index_array_6 = np.where(array_2 == array_4)[0]
index_array_7 = np.where(array_2 == array_5)[0]
index_array_8 = np.where(array_3 == array_4)[0]
index_array_9 = np.where(array_3 == array_5)[0]
index_array_10 = np.where(array_4 == array_5)[0]

So, in this case, only index_array_4 would return any values, because array_1 and array_5 match up. But, this clearly isn't the best way to do this. It's a lot of code, and it takes a while to run as well.

Is there something I haven't come across yet where I can essentially say "if ANY of the 5 arrays match, tell me, and also let me know which two arrays are the ones that match"?

I'd also like it to return an index array of one of the matching arrays, as well.

You can try a one-liner:

>>> import numpy as np
>>> from itertools import combinations
>>> [arrays for arrays in combinations([f"array_{i}" for i in range(1, 6)], 2)
...  if np.all(np.equal(*map(globals().get, arrays)))]

Output:

[('array_1', 'array_5')]

EXPLANATION:

>>> [f"array_{i}" for i in range(1,6)]
['array_1', 'array_2', 'array_3', 'array_4', 'array_5']

>>> list(combinations([f"array_{i}" for i in range(1,6)],2))
[('array_1', 'array_2'),
 ('array_1', 'array_3'),
 ('array_1', 'array_4'),
 ('array_1', 'array_5'),
 ('array_2', 'array_3'),
 ('array_2', 'array_4'),
 ('array_2', 'array_5'),
 ('array_3', 'array_4'),
 ('array_3', 'array_5'),
 ('array_4', 'array_5')]

The list comprehension then iterates over these combinations. Taking the first pair as an example, the remaining steps look like this:

>>> [*map(globals().get, ('array_1', 'array_2'))]
[[1, 2, 3], [4, 5, 6]]

>>> np.all(np.equal([1, 2, 3], [4, 5, 6]))
False

EDIT:

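If the arrays are defined inside a function, globals() will not see them, so look the names up in the function's local scope instead:

import numpy as np
from itertools import combinations

def bar():
    array_1 = [1, 2, 3]
    array_2 = [4, 5, 6]
    array_3 = [7, 8, 9]
    array_4 = [10, 11, 12]
    array_5 = [1, 2, 3]
    # Snapshot of the local variables, taken before entering the comprehension
    scope = locals()
    names = [f"array_{i}" for i in range(1, 6)]
    # A plain dict lookup replaces eval(), which is not needed here
    return [pair for pair in combinations(names, 2)
            if np.all(np.equal(scope[pair[0]], scope[pair[1]]))]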
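
Looking names up through globals() or locals() works, but it ties the code to how the variables happen to be named. As a minimal sketch of the same pairwise check, the arrays can be kept in a dict instead; the dict name `arrays` and the helper `matching_pairs` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

import numpy as np

# Hypothetical container: the arrays live in a dict instead of separate variables
arrays = {
    "array_1": [1, 2, 3],
    "array_2": [4, 5, 6],
    "array_3": [7, 8, 9],
    "array_4": [10, 11, 12],
    "array_5": [1, 2, 3],
}


def matching_pairs(named_arrays):
    """Return the pairs of names whose arrays are equal element by element."""
    return [
        (a, b)
        for a, b in combinations(named_arrays, 2)
        # np.array_equal also returns False when the shapes differ
        if np.array_equal(named_arrays[a], named_arrays[b])
    ]


print(matching_pairs(arrays))
# [('array_1', 'array_5')]
```

This version behaves the same at module level and inside a function, since nothing depends on the enclosing scope.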

You can do that like this:

import numpy as np

array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]

# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compare all vs all
c = np.all(all_arrays[:, np.newaxis] == all_arrays, axis=-1)
# Take only half the result to avoid self results and symmetric results
c = np.triu(c, 1)
# Get matching pairs
m = np.stack(np.where(c), axis=1)
# One row per matching pair
print(m)
# [[0 4]]

This makes more comparisons than necessary, though (e.g. array_1 vs array_2 and also array_2 vs array_1). You can also use something like scipy.spatial.distance.pdist, which evaluates each pair only once, to potentially save some time:

import numpy as np
import scipy.spatial.distance

array_1 = [1, 2, 3]
array_2 = [4, 5, 6]
array_3 = [7, 8, 9]
array_4 = [10, 11, 12]
array_5 = [1, 2, 3]

# Put all arrays together
all_arrays = np.stack([array_1, array_2, array_3, array_4, array_5])
# Compute pairwise distances
d = scipy.spatial.distance.pdist(all_arrays, 'hamming')
d = scipy.spatial.distance.squareform(d)
# Get indices of pairs where it is zero
c = np.triu(d == 0, 1)
m = np.stack(np.where(c), axis=1)
print(m)
# [[0 4]]
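
The question also asks for an index array of the matching positions, as the original np.where calls produced. A possible sketch, assuming all arrays have the same length, reuses the broadcast comparison and reports the positions where each pair agrees, including partial matches; the dict name `matches` is hypothetical.

```python
from itertools import combinations

import numpy as np

all_arrays = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]])

# Elementwise comparison of every row against every row: shape (5, 5, 3)
eq = all_arrays[:, np.newaxis] == all_arrays

# Hypothetical result dict: (i, j) -> positions where row i and row j agree
matches = {}
for i, j in combinations(range(len(all_arrays)), 2):
    idx = np.flatnonzero(eq[i, j])
    if idx.size:
        matches[(i, j)] = idx

for pair, idx in matches.items():
    print(pair, idx.tolist())
# (0, 4) [0, 1, 2]
```

A pair whose index array covers every position is a full match; a shorter index array means the two arrays agree only at those positions.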

You can use the list .count() method to check whether any array occurs more than once:

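def compare(*arrays):
    # Convert every argument to a plain list so == and .count() compare by value
    temp = [list(x) for x in arrays]

    for i in range(len(temp)):
        if temp.count(temp[i]) > 1:
            # Search only after position i, then shift back to an index in the full list
            return (i, temp[i + 1:].index(temp[i]) + i + 1)
    # Reached only when no array occurs more than once
    return False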

The first line of the function builds a list containing every argument converted to a list. If the array at position i (the current iteration) occurs more than once, the function returns i together with the index of the other identical array. That second index is found by calling .index() on the slice of the list after position i, so the search cannot find position i itself. If no array is repeated, the function returns False.

print(compare(array_1,array_2,array_3,array_4,array_5))

will return

(0, 4)
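
The function above reports only the first duplicated pair. If every group of identical arrays is needed, one possible sketch groups positions in a dict keyed by the array contents. This is a different technique from .count(), it assumes 1-D arrays of hashable values, and the name `duplicate_groups` is hypothetical.

```python
from collections import defaultdict


def duplicate_groups(*arrays):
    """Group the positions of the arguments that hold identical values."""
    positions = defaultdict(list)
    for i, arr in enumerate(arrays):
        # Tuples are hashable, so equal arrays end up under the same key
        positions[tuple(arr)].append(i)
    # Keep only the contents that were seen more than once
    return [idx for idx in positions.values() if len(idx) > 1]


print(duplicate_groups([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [1, 2, 3]))
# [[0, 4]]
```

Each array is hashed once, so the work grows linearly with the number of arrays instead of with the number of pairs.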
