
Comparing elements of a list of arrays

I have a big list that looks something like this (but much bigger):

lista = [(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.]))]

You can't see it in the small extract above, but some elements repeat. I need to remove the duplicates.

I have tried doing

newlist = []
for a in lista:
    if np.all(a not in newlist):
         newlist.append(a)

But it doesn't work; it raises:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't understand why it doesn't work. I need to compare each element of my list as the tuple of arrays it is.

Edit: any element of the list can be a duplicate. An element (a tuple) is a duplicate if it holds exactly the same arrays, in the same order, as another tuple.

(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]))

By your definition, a duplicate is a tuple whose arrays exactly match those of another tuple in the list.

import numpy as np
from numpy import array

# list with the 5th tuple being a duplicate of the 1st
lista = [(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.]))]

clean_list = []

for t in lista:
    for ut in clean_list:
        if all(np.all(t[i] == ut[i]) for i in range(len(t))):
            # duplicate, discard it
            break
    else:
        # does not exist, keep it
        clean_list.append(t)

The resulting clean_list contains every tuple except the 5th, which duplicates the 1st.

Note that this example uses Python's built-in all to check that every per-array comparison holds, and numpy.all to check that all elements of the two compared arrays are equal.
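The nested loop above compares every tuple against every tuple already kept, which gets slow on a very large list. A minimal sketch of a set-based alternative follows, assuming arrays with matching values also share dtype and shape. The helper name dedupe_by_bytes and the small sample list are made up for illustration. One caveat: comparing raw bytes treats 0.0 and -0.0 as different and identical NaN patterns as equal, unlike ==.

```python
import numpy as np

def dedupe_by_bytes(tuples):
    """Drop repeated tuples of arrays, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for t in tuples:
        # Hashable stand-in for the tuple: shape, dtype and raw bytes of each array.
        key = tuple((arr.shape, arr.dtype.str, arr.tobytes()) for arr in t)
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique

# Hypothetical sample data; the third tuple repeats the first.
sample = [
    (np.array([0., 0., 0.]), np.array([0., 0., 1.])),
    (np.array([0., 1., 0.]), np.array([0., 0., 0.])),
    (np.array([0., 0., 0.]), np.array([0., 0., 1.])),
]
result = dedupe_by_bytes(sample)
print(len(result))
```

Each tuple is looked up in a set once, so the work grows roughly linearly with the length of the list, and the original order is preserved.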

I'm not sure you are using all correctly, though I'm not familiar with it. Here's a reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html

Couldn't you just do this?

newlist = []
for a in lista:
    if a not in newlist:
         newlist.append(a)

When you use in, Python iterates over the list and checks each entry for equality with your element. With NumPy arrays, however, the equality check yields something like [True, False, True], which is neither True nor False. So you can't use the in operator directly, but you can simulate it with the check below:

newlist = []

for line in lista:
    for item in line:
        if not any(all(item == value) for value in newlist):
            newlist.append(item)

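Essentially, any(all(item == value) for value in newlist) simulates the behavior of in, and not any asks that none of the checks be true. Note that this loop compares individual arrays, so newlist ends up holding the unique arrays rather than the unique tuples.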
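To apply the same any/all idea to whole tuples, which is how the question defines a duplicate, something along these lines might work. It is only a sketch: same_tuple is a made-up helper name, the sample data is invented, and numpy.array_equal is used to get a single True/False per pair of arrays.

```python
import numpy as np

def same_tuple(t1, t2):
    """True when both tuples hold equal arrays at every position."""
    return len(t1) == len(t2) and all(np.array_equal(a, b) for a, b in zip(t1, t2))

a = np.array([0., 0., 1.])
b = np.array([0., 1., 1.])

# == on arrays is element-wise: this is an array of three booleans, not one bool,
# which is why "in" and "if" raise the ambiguous truth value error.
elementwise = a == b

# Hypothetical sample data; the third tuple repeats the first.
sample = [(a, b), (b, a), (a.copy(), b.copy())]

unique_tuples = []
for t in sample:
    # "not in" replaced by an explicit comparison against every kept tuple
    if not any(same_tuple(t, kept) for kept in unique_tuples):
        unique_tuples.append(t)
print(len(unique_tuples))
```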

import numpy
from numpy import array

lista = [(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.]))]

y = []

for a in lista:
    found = 0
    for b in y:
        if found >= 3:
            break
        found = 0
        for i in range(0,3):
            # the arrays must match position by position
            if not numpy.array_equal(a[i], b[i]):
                break
            found += 1
    if found < 3:
        y.append(a)

for a in y:
    print(a)

The code goes through each tuple in the list and compares it, array by array, with the tuples already kept in y, using numpy.array_equal(a[i], b[i]). If all 3 arrays match, the item is a duplicate. Otherwise, it is added to y.

Assuming all rows have equal length, the following will solve your problem:

import numpy_indexed as npi
npi.unique(lista)
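If installing an extra package is not an option, plain NumPy can probably do something similar under the same equal-length assumption. This is a rough sketch with invented sample data, not a description of what numpy_indexed does internally: the tuples are stacked into one array, each tuple is flattened to a row, and numpy.unique is called with axis=0.

```python
import numpy as np

# Hypothetical sample data; the third tuple repeats the first.
sample = [
    (np.array([0., 0., 0.]), np.array([0., 0., 1.])),
    (np.array([1., 0., 0.]), np.array([0., 0., 0.])),
    (np.array([0., 0., 0.]), np.array([0., 0., 1.])),
]

stacked = np.array(sample)                 # shape (3, 2, 3)
flat = stacked.reshape(len(sample), -1)    # one row per tuple, shape (3, 6)

# np.unique sorts the rows; return_index gives the first position of each unique row.
_, first_idx = np.unique(flat, axis=0, return_index=True)
keep = np.sort(first_idx)                  # sort the indices to restore the original order
unique_stack = stacked[keep]               # unique tuples as one 3-D array
print(unique_stack.shape)
```

The result is a single 3-D array rather than a list of tuples; [tuple(row) for row in unique_stack] would turn it back into tuples of arrays if that shape is needed.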
