How to calculate the difference between the elements in three lists efficiently?

I have three very large lists of strings. For illustration, consider:

A = ['one','four', 'nine']

B = ['three','four','six','five']

C = ['four','five','one','eleven']

How can I calculate the difference between these lists so that each one keeps only the elements that do not appear in the other lists? For example:

A = ['nine']

B = ['three','six']

C = ['eleven']

Method 1

You can add arbitrarily many lists just by changing the first line, e.g. my_lists = (A, B, C, D, E).

my_lists = (A, B, C)
my_sets = {n: set(my_list) for n, my_list in enumerate(my_lists)}
my_unique_lists = tuple(
    list(my_sets[n].difference(*(my_sets[i] for i in range(len(my_sets)) if i != n))) 
    for n in range(len(my_sets)))

>>> my_unique_lists
(['nine'], ['six', 'three'], ['eleven'])

my_sets uses a dictionary comprehension to create a set for each of the lists. Each dictionary key is the position of the corresponding list in my_lists.

Each set is then differenced with all the other sets in the dictionary (excluding itself) and converted back to a list.

The ordering of my_unique_lists corresponds to the ordering in my_lists.

Method 2

You can use Counter to find all unique items (i.e. those that appear in just one list and not the others), and then use a list comprehension to iterate through each list and select the items that are unique.

from collections import Counter

c = Counter([item for my_list in my_lists for item in set(my_list)])
# A set rather than a tuple, so the membership tests below are O(1) on average.
unique_items = {item for item, count in c.items() if count == 1}

>>> tuple([item for item in my_list if item in unique_items] for my_list in my_lists)
(['nine'], ['three', 'six'], ['eleven'])

With sets:

  • convert all lists to sets
  • take the differences
  • convert back to lists

A, B, C = map(set, (A, B, C))
a = A - B - C
b = B - A - C
c = C - A - B
A, B, C = map(list, (a, b, c))

The (possible) problem with this is that the final lists are no longer ordered, e.g.

>>> A
['nine']
>>> B
['six', 'three']
>>> C
['eleven']

This could be fixed by sorting by the original indices, but the extra sorting step adds to the running time (especially if each position is found with list.index, which is a linear scan), so much of the benefit of using sets is lost.
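As a rough sketch of what sorting by the original indices could look like, the helper below (ordered_difference is a hypothetical name, not from the answer) records each item's first position in a dictionary so that the sort key is a constant-time lookup. It assumes the items are hashable and that duplicates within the first list can be collapsed.

```python
# Example data copied from the question.
A = ['one', 'four', 'nine']
B = ['three', 'four', 'six', 'five']
C = ['four', 'five', 'one', 'eleven']

def ordered_difference(first, *others):
    """Return the items of `first` absent from all `others`, in original order."""
    remaining = set(first).difference(*others)
    # Record the first position of each item once, so the sort key is a
    # dictionary lookup instead of a linear list.index() scan.
    position = {}
    for index, item in enumerate(first):
        position.setdefault(item, index)
    return sorted(remaining, key=position.__getitem__)

print(ordered_difference(B, A, C))  # ['three', 'six']
```

The sort itself still costs O(k log k) for k surviving items, so filtering the original list against sets is usually the simpler way to keep order.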


With list-comps (for-loops):

  • convert lists to sets
  • use list-comps to keep only the elements of each original list that are not in the other sets

sA, sB, sC = map(set, (A, B, C))
A = [e for e in A if e not in sB and e not in sC]
B = [e for e in B if e not in sA and e not in sC]
C = [e for e in C if e not in sA and e not in sB]

which then produces a result that maintains the original order of the lists:

>>> A
['nine']
>>> B
['three', 'six']
>>> C
['eleven']

Summary:

In conclusion, if you don't care about the order of the result, convert the lists to sets and take their differences (and don't bother converting back to lists). However, if you do care about order, still convert the lists to sets (hash tables), because lookups are then much faster when filtering (average case O(1) for a set vs O(n) for a list).
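To see the lookup difference in practice, here is a small timing sketch. The data (big_a, big_b) and the helper names are made up for illustration, and the actual timings depend on the machine, so no figures are claimed here.

```python
import timeit

# Hypothetical data: two lists of 2,000 strings that share half of their items.
big_a = [str(i) for i in range(2000)]
big_b = [str(i) for i in range(1000, 3000)]

def filter_with_list(items, other):
    # Each `not in` test scans `other` linearly.
    return [e for e in items if e not in other]

def filter_with_set(items, other):
    other_set = set(other)  # built once; each lookup is then O(1) on average
    return [e for e in items if e not in other_set]

list_time = timeit.timeit(lambda: filter_with_list(big_a, big_b), number=5)
set_time = timeit.timeit(lambda: filter_with_set(big_a, big_b), number=5)
print(f"list lookups: {list_time:.4f}s, set lookups: {set_time:.4f}s")
```

Both functions return the same ordered result; only the cost of each membership test differs.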

You can iterate through the elements of all the lists, adding each element to a set if it is not already there and removing it from the list if it is. This uses up to O(n) additional space and O(n) time, and the elements remain in order.
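One possible reading of this idea, sketched as a two-pass variant rather than the answerer's exact procedure: the first pass records which list each element was first seen in and marks elements that turn up in a different list, and the second pass filters each list. The function name unique_per_list is hypothetical, and items are assumed to be hashable.

```python
def unique_per_list(*lists):
    """Keep, in each list, only the items that appear in no other list."""
    owner = {}      # item -> index of the first list in which it was seen
    shared = set()  # items seen in more than one list
    for index, items in enumerate(lists):
        for item in items:
            # setdefault stores `index` on first sight and returns the stored owner.
            if owner.setdefault(item, index) != index:
                shared.add(item)
    # Second pass: filter each list, preserving its original order.
    return [[item for item in items if item not in shared] for items in lists]

# Example data copied from the question.
A = ['one', 'four', 'nine']
B = ['three', 'four', 'six', 'five']
C = ['four', 'five', 'one', 'eleven']

print(unique_per_list(A, B, C))
# [['nine'], ['three', 'six'], ['eleven']]
```

Both passes touch each element once, so time and extra space are linear in the total number of elements. An item repeated only within a single list is kept, since it never appears in another list.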

You can also use a function defined specifically to compute the difference between three lists. Here's an example of such a function:

def three_list_difference(l1, l2, l3):
    # Collect the elements of l1 that appear in neither l2 nor l3.
    lst = []
    for i in l1:
        if i not in l2 and i not in l3:
            lst.append(i)
    return lst

The function three_list_difference takes three lists and keeps each element of the first list l1 that is in neither l2 nor l3. The difference for each list can be obtained simply by calling the function with the arguments in the right order:

three_list_difference(A, B, C)
three_list_difference(B, A, C)
three_list_difference(C, B, A)

with outputs:

['nine']
['three', 'six']
['eleven']

Using a function is advantageous because the code is reusable.
