
Comparing elements of a list of arrays

I have a big list that looks something like this (but much bigger):

lista = [(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.]))]

You can't see it in the small extract above, but some elements repeat. I need to remove the duplicates.

I have tried:

newlist = []
for a in lista:
    if np.all(a not in newlist):
         newlist.append(a)

But it doesn't work; it raises:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't understand why it doesn't work. I need to compare each element of my list as the tuple of arrays it is.

Edit: a duplicate can be any element of the list. An element (a tuple) is a duplicate if it contains exactly the same arrays, in the same order, as another tuple, for example:

(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]))

Your definition of a duplicate: a tuple containing exactly the same arrays as another tuple in the list.

import numpy as np
from numpy import array

# list with the 5th tuple being a duplicate of the 1st
lista = [(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
         (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
         (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.]))]

clean_list = []

for t in lista:
    for ut in clean_list:
        if all(np.all(t[i] == ut[i]) for i in range(len(t))):
            # duplicate, discard it
            break
    else:
        # does not exist, keep it
        clean_list.append(t)

The resulting clean_list contains every tuple except the 5th, which duplicates the 1st.

Note that this example uses Python's built-in all function to check that every condition passed to it is true, and numpy.all to check that all elements of the compared arrays are equal.
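The loop above compares each tuple with every tuple already kept, so the work grows quadratically with the list length. For a much bigger list, one possible alternative (not part of the answer above) is to build a hashable key from each array's raw bytes and track the keys in a set. The helper name dedupe_by_bytes is hypothetical, and the sketch assumes byte-for-byte equality is acceptable: it treats 0.0 and -0.0 as different values, for instance.

```python
import numpy as np

def dedupe_by_bytes(tuples):
    """Keep the first occurrence of each tuple of arrays, preserving order."""
    seen = set()
    kept = []
    for t in tuples:
        # Shape, dtype and raw bytes of every array together form a hashable key.
        key = tuple((a.shape, a.dtype.str, a.tobytes()) for a in t)
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept

sample = [
    (np.array([0., 0., 0.]), np.array([0., 1., 0.])),
    (np.array([1., 0., 0.]), np.array([0., 0., 0.])),
    (np.array([0., 0., 0.]), np.array([0., 1., 0.])),  # repeats the first tuple
]
result = dedupe_by_bytes(sample)
print(len(result))  # 2
```

Each tuple is visited once and the set lookup is constant time on average, which is the reason to prefer this over the nested loop when the list is large.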

I am not quite sure you are using the method "all" properly, though I am not familiar with it. Here's a reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html

Couldn't you just do:

newlist = []
for a in lista:
    if a not in newlist:
         newlist.append(a)

When you use in, Python iterates over the list and checks each entry for equality with your element. However, the equality check on arrays yields something like [True, False, True], which is neither true nor false. So you can't use the in operator, but you can simulate it with the check below:

newlist = []

for line in lista:
    for item in line:
        if not any(all(item == value) for value in newlist):
            newlist.append(item)

Essentially, with all(item == value) for value in newlist you're simulating the behavior of in, and with not any you require that none of the checks be true.
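The loop above walks into each tuple and deduplicates the individual arrays. If duplicates are meant at the tuple level, as the question's edit describes, the same any/all idea can be applied one level up. The sketch below first confirms that the plain in operator raises, then defines two hypothetical helpers, same_tuple and contains, as a stand-in for it; numpy.array_equal is used here in place of all(item == value).

```python
import numpy as np

def same_tuple(t, u):
    """True when both tuples hold equal arrays in the same order."""
    return len(t) == len(u) and all(np.array_equal(x, y) for x, y in zip(t, u))

def contains(tuples, candidate):
    """Array-safe stand-in for `candidate in tuples`."""
    return any(same_tuple(candidate, u) for u in tuples)

first = (np.array([0., 0., 0.]), np.array([0., 0., 1.]))
second = (np.array([0., 0., 0.]), np.array([0., 1., 0.]))
first_copy = tuple(a.copy() for a in first)

# The built-in operator needs a single truth value per comparison and fails.
try:
    second in [first]
    raised = False
except ValueError:
    raised = True

print(raised)                         # True
print(contains([first], first_copy))  # True
print(contains([first], second))      # False
```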

import numpy
from numpy import array

lista = [(array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 0.,  0.,  0.]), array([ 0.,  1.,  0.]), array([ 0.,  0.,  1.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  0.])),
 (array([ 1.,  0.,  0.]), array([ 0.,  0.,  0.]), array([ 0.,  0.,  1.]))]

y = []

for a in lista:
    found = 0
    for b in y:
        if found >= 3:
            break
        found = 0
        for i in range(0,3):
            if False in numpy.in1d(a[i], b[i]):
                break
            found += 1
    if found < 3:
        y.append(a)

for a in y:
    print(a)

The code goes through each tuple in the list and compares its arrays with those of the tuples already kept in y using numpy.in1d(a, b). If all 3 arrays match consecutively, the item is a duplicate. Otherwise, it's added to y.
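One thing to keep in mind with this approach: a membership test such as numpy.in1d reports whether each value of the first array occurs anywhere in the second, so two arrays holding the same values in different positions pass the check. The sketch below illustrates the difference using numpy.isin (which performs the same membership test on 1-D input) next to numpy.array_equal, on the assumption that positional equality is what the question wants.

```python
import numpy as np

a = np.array([0., 0., 1.])
b = np.array([0., 1., 0.])

# Membership: every value of a appears somewhere in b.
membership_ok = bool(np.isin(a, b).all())

# Positional equality: same shape and same value at every index.
positional_ok = bool(np.array_equal(a, b))

print(membership_ok)  # True
print(positional_ok)  # False
```

On the sample data in the answer the membership test happens to give the right result, but a list in a different order could see distinct tuples dropped.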

Assuming all rows have equal length, the following will solve your problem:

import numpy_indexed as npi
npi.unique(lista)
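If numpy_indexed is not available, a similar result can probably be had with plain NumPy under the same equal-shape assumption. This is a sketch of a different technique, not the numpy_indexed implementation: it stacks the tuples into one array, calls numpy.unique along the first axis, and uses the returned first-occurrence indices to keep the original order. The function name unique_tuples_in_order is hypothetical.

```python
import numpy as np

def unique_tuples_in_order(tuples):
    """Drop repeated tuples, keeping first occurrences in their original order."""
    stacked = np.array(tuples)  # shape (n_tuples, arrays_per_tuple, array_length)
    _, first_idx = np.unique(stacked, axis=0, return_index=True)
    # np.unique sorts its output, so re-sort the indices to restore input order.
    return [tuples[i] for i in np.sort(first_idx)]

sample = [
    (np.array([1., 0., 0.]), np.array([0., 0., 0.])),
    (np.array([0., 0., 0.]), np.array([0., 1., 0.])),
    (np.array([1., 0., 0.]), np.array([0., 0., 0.])),  # repeats the first tuple
]
result = unique_tuples_in_order(sample)
print(len(result))  # 2
```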
