How to find common positions in list of lists where the elements are always duplicates and then remove those duplicates?

I have a list of lists in which every list is ordered the same way, and within each list several of the elements are duplicates. I would like to remove the duplicates, but it is important that I retain the structure of each list. That is, if the elements at indices 0, 1 and 2 are all duplicates in a given list, two of them would be removed from that list, and the elements at the same positions would then have to be removed from all the other lists too, to preserve the ordered structure.

Crucially, however, the elements at indices 0, 1 and 2 may not be duplicates in the other lists, so I only want to do this if the elements at indices 0, 1 and 2 are duplicated in every list.

As an example, say I have this list of lists:

L = [ [1,1,1,3,3,2,4,6,6], 
[5,5,5,4,5,6,5,7,7], 
[9,9,9,2,2,7,8,10,10] ]

After applying my method, I would like to be left with:

L_new = [ [1,3,3,2,4,6], 
[5,4,5,6,5,7], 
[9,2,2,7,8,10] ]

Here the elements at indices 1, 2 and 8 have been removed from every list because they are consistently duplicated across all lists, whereas the elements at indices 3 and 4 have been kept because they are not always duplicated.

My thinking so far (though I suspect this is not the best approach, which is why I am asking for help):

def check_duplicates_in_same_position(arr_list):
    check_list = []
    for arr in arr_list:
        duplicate_positions_list = []
        positions = {}
        for i in range(len(arr)):
            item = arr[i]
            if item in positions:
                positions[item].append(i)
            else:
                positions[item] = [i]
        duplicate_positions = {k: v for k, v in positions.items() if len(v) > 1}
        for _, item in duplicate_positions.items():
            duplicate_positions_list.append(item)
        check_list.append(duplicate_positions_list)
    
    return check_list

This returns a list of lists of lists: one entry per input list, each containing the groups of indices at which a value is duplicated in that list, like so:

[[[0, 1, 2], [3, 4], [7, 8]],
 [[0, 1, 2, 4, 6], [7, 8]],
 [[0, 1, 2], [3, 4], [7, 8]]]

I then thought to somehow compare these lists and, for example, remove the elements at indices 1, 2 and 8, because these are duplicates in every list.
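One possible way to finish this comparison, offered as a sketch rather than a definitive method: turn each group of duplicate indices into (earlier, later) index pairs, intersect those pairs across all lists, and drop the later index of every pair that survives. The function name remove_common_duplicates is hypothetical, and the sketch assumes each group lists its indices in ascending order, as the function above produces them.

```python
from itertools import combinations

def remove_common_duplicates(arr_list, check_list):
    # For each list, build every (earlier, later) index pair holding equal values.
    # Assumes each group is in ascending index order.
    pair_sets = [
        {pair for group in groups for pair in combinations(group, 2)}
        for groups in check_list
    ]
    # Keep only the pairs that are duplicates in every list.
    common_pairs = set.intersection(*pair_sets) if pair_sets else set()
    # The later index of each common pair is redundant in all lists.
    drop = {later for _, later in common_pairs}
    return [[x for i, x in enumerate(arr) if i not in drop] for arr in arr_list]

L = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]

# Output of check_duplicates_in_same_position(L), copied from above.
check_list = [[[0, 1, 2], [3, 4], [7, 8]],
              [[0, 1, 2, 4, 6], [7, 8]],
              [[0, 1, 2], [3, 4], [7, 8]]]

print(remove_common_duplicates(L, check_list))
# [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
```

For the example, the only pairs common to all three lists are (0, 1), (0, 2), (1, 2) and (7, 8), so indices 1, 2 and 8 are dropped.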

Assuming all sub-lists have the same length, this should work:

l = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]

[list(x) for x in zip(*dict.fromkeys(zip(*l)))]

# Output: [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]

Explanation:

  1. zip(*l) - This transposes the list of lists into a sequence of tuples. The nth tuple contains the nth elements of all the original sublists. Materialized as a list, it looks like this:
[(1, 5, 9),
 (1, 5, 9),
 (1, 5, 9),
 (3, 4, 2),
 (3, 5, 2),
 (2, 6, 7),
 (4, 5, 8),
 (6, 7, 10),
 (6, 7, 10)]
  2. From the previous list, we only want to keep the first occurrence of each tuple. There are various ways of achieving this. A common answer to the question of how to remove duplicates while maintaining order uses dict.fromkeys(<list>). Since Python dict keys must be unique, and dicts preserve insertion order (guaranteed since Python 3.7), this removes duplicates and generates the following output:
{(1, 5, 9): None,
 (3, 4, 2): None,
 (3, 5, 2): None,
 (2, 6, 7): None,
 (4, 5, 8): None,
 (6, 7, 10): None}
  3. We now want to transpose those keys back into the original row-wise, 2-dimensional layout. For that, we can use zip again:
zip(*dict.fromkeys(zip(*l)))
  4. Since zip returns tuples, we finally have to convert the tuples to lists using a list comprehension:
[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
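The four steps above can be sketched as a small function. This is an illustrative wrapper, not part of the original answer: the name drop_duplicate_columns is hypothetical, and the explicit length check is an added assumption, since zip silently truncates to the shortest sub-list when the lengths differ. The elements must also be hashable, because the column tuples are used as dict keys.

```python
def drop_duplicate_columns(rows):
    # Assumption: unequal lengths are treated as an error rather than truncated.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all sub-lists must have the same length")
    # Step 1: transpose, so each tuple holds the values at one position.
    columns = zip(*rows)
    # Step 2: keep the first occurrence of each column, preserving order.
    unique_columns = dict.fromkeys(columns)
    # Steps 3 and 4: transpose back and convert each row tuple to a list.
    return [list(row) for row in zip(*unique_columns)]

l = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]

print(drop_duplicate_columns(l))
# [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
```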

I would go with something like this. It is not very fast, but depending on the size of your lists, it could be sufficient.

L = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]

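# Transpose L so each item is a column: the tuple of values at one position.
azip = zip(*L)
temp_L = []
for zz in azip:
    # Keep a column only the first time it is seen.
    if zz not in temp_L:
        temp_L.append(zz)
# Transpose back to rows; each row comes out as a tuple.
new_L = list(zip(*temp_L))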

First, we zip the three (or more) lists within L. Then we iterate over each resulting tuple and check whether it already exists in temp_L. If not, we add it to our temporary list temp_L. Finally, we restructure temp_L back into the original format. It returns:

new_L
>> [(1, 3, 3, 2, 4, 6), (5, 4, 5, 6, 5, 7), (9, 2, 2, 7, 8, 10)]
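The loop is slow mainly because each membership test against the list temp_L scans the whole list, so the work grows roughly quadratically with the number of positions. A possible variant, sketched here under the assumption that the elements are hashable, tracks the columns already seen in a set. The name drop_duplicate_columns_seen is hypothetical, and this version returns lists rather than tuples.

```python
def drop_duplicate_columns_seen(rows):
    seen = set()   # columns encountered so far, for fast membership tests
    kept = []      # first occurrence of each column, in original order
    for column in zip(*rows):
        if column not in seen:
            seen.add(column)
            kept.append(column)
    # Transpose back to rows and convert each row tuple to a list.
    return [list(row) for row in zip(*kept)]

L = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]

print(drop_duplicate_columns_seen(L))
# [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
```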

