
How to calculate the difference between the elements in three lists efficiently?

I have 3 very large lists of strings; for illustration, consider:

A = ['one','four', 'nine']

B = ['three','four','six','five']

C = ['four','five','one','eleven']

How can I calculate the difference between these lists so that each one keeps only the elements that do not appear in the other lists? For example:

A = ['nine']

B = ['three','six']

C = ['eleven']

Method 1

You can add any number of lists just by changing the first line, e.g. my_lists = (A, B, C, D, E).

my_lists = (A, B, C)
my_sets = {n: set(my_list) for n, my_list in enumerate(my_lists)}
my_unique_lists = tuple(
    list(my_sets[n].difference(*(my_sets[i] for i in range(len(my_sets)) if i != n))) 
    for n in range(len(my_sets)))

>>> my_unique_lists
(['nine'], ['six', 'three'], ['eleven'])

my_sets uses a dictionary comprehension to create a set for each of the lists. Each dictionary key is the position of the corresponding list in my_lists.

Each set is then differenced with all the other sets in the dictionary (every set except itself) and converted back to a list.

The ordering of my_unique_lists corresponds to the ordering in my_lists.

Method 2

You can use Counter to find all unique items (i.e. those that appear in exactly one of the lists), and then use a list comprehension to iterate through each list and select the items that are unique.

from collections import Counter

c = Counter([item for my_list in my_lists for item in set(my_list)])
# A set gives O(1) average membership tests in the filtering step below.
unique_items = {item for item, count in c.items() if count == 1}

>>> tuple([item for item in my_list if item in unique_items] for my_list in my_lists)
(['nine'], ['three', 'six'], ['eleven'])

With sets:

  • convert all lists to sets
  • take the differences
  • convert back to lists

A, B, C = map(set, (A, B, C))
a = A - B - C
b = B - A - C
c = C - A - B
A, B, C = map(list, (a, b, c))

The (possible) problem with this is that the final lists are no longer ordered, e.g.

>>> A
['nine']
>>> B
['six', 'three']
>>> C
['eleven']

This could be fixed by sorting by the original indices, but that adds a sorting step on top of the set operations (and becomes quadratic if each position is looked up with list.index), so much of the benefit of using sets is lost.
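For reference, a minimal sketch of that order-restoring step, assuming duplicates within a list can be dropped. The helper name ordered_difference and the first_index dictionary are illustrative and not part of the answer above. Building the index map once keeps each position lookup O(1) on average, so the extra cost is only the sort of the surviving elements.

```python
# Example data from the question.
A = ['one', 'four', 'nine']
B = ['three', 'four', 'six', 'five']
C = ['four', 'five', 'one', 'eleven']

def ordered_difference(original, *others):
    """Elements of `original` found in none of `others`, in original order."""
    # Record the index of the first occurrence of each element.
    first_index = {}
    for i, item in enumerate(original):
        first_index.setdefault(item, i)
    # set.difference accepts any iterables, so the lists can be passed directly.
    unique = set(original).difference(*others)
    # Sort the survivors by where they first appeared in `original`.
    return sorted(unique, key=first_index.__getitem__)

print(ordered_difference(B, A, C))  # ['three', 'six']
```

The list-comprehension approach below reaches the same ordered result without any sort.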


With list-comps (for-loops):

  • convert lists to sets
  • use list-comps to filter out elements from the original lists that are not in the other sets

sA, sB, sC = map(set, (A, B, C))
A = [e for e in A if e not in sB and e not in sC]
B = [e for e in B if e not in sA and e not in sC]
C = [e for e in C if e not in sA and e not in sB]

which then produces a result that maintains the original order of the lists:

>>> A
['nine']
>>> B
['three', 'six']
>>> C
['eleven']

Summary:

In conclusion, if you don't care about the order of the result, convert the lists to sets and take their differences (and don't bother converting back to lists). However, if you do care about order, still convert the lists to sets (hash tables) and filter the original lists against them, because membership tests are much faster on a set (O(1) on average) than on a list (O(n)).
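If there are more than three lists, the order-preserving filter can be generalized along these lines. This is only a sketch: unique_to_each is a hypothetical helper name, and it assumes the elements are hashable (as strings are).

```python
def unique_to_each(*lists):
    """For each list, keep (in order) the elements that appear in no other list."""
    sets = [set(lst) for lst in lists]
    result = []
    for n, lst in enumerate(lists):
        # Union of every set except the one built from the current list.
        others = set().union(*(s for i, s in enumerate(sets) if i != n))
        result.append([e for e in lst if e not in others])
    return result

A = ['one', 'four', 'nine']
B = ['three', 'four', 'six', 'five']
C = ['four', 'five', 'one', 'eleven']

print(unique_to_each(A, B, C))
# [['nine'], ['three', 'six'], ['eleven']]
```

Rebuilding the union for every list is simple but does repeated work when there are many lists; for a handful of lists it is unlikely to matter.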

You can iterate through the elements of all the lists, adding each element to a set if it is not already there, and removing it from the lists if it is. This uses up to O(n) additional space and O(n) time, and the remaining elements stay in their original order.
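One possible reading of this idea, as a sketch rather than the exact procedure the answer has in mind: scan every list once while remembering which list each element was first seen in, mark an element as shared when it turns up in a different list, and then filter. The names unique_with_seen_map, owner and shared are hypothetical.

```python
def unique_with_seen_map(*lists):
    """Keep, in order, the elements that occur in exactly one of the lists."""
    owner = {}      # element -> index of the first list it was seen in
    shared = set()  # elements seen in more than one list
    for n, lst in enumerate(lists):
        for item in lst:
            # setdefault returns the stored owner, or records n if item is new.
            if owner.setdefault(item, n) != n:
                shared.add(item)
    # Second pass: drop shared elements, preserving the original order.
    return [[item for item in lst if item not in shared] for lst in lists]

A = ['one', 'four', 'nine']
B = ['three', 'four', 'six', 'five']
C = ['four', 'five', 'one', 'eleven']

print(unique_with_seen_map(A, B, C))
# [['nine'], ['three', 'six'], ['eleven']]
```

Unlike the set-difference versions, this keeps an element that is repeated within a single list, as long as it appears in no other list.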

You can also use a function defined specifically to compute the difference between three lists. Here's an example of such a function:

def three_list_difference(l1, l2, l3):
    lst = []
    for i in l1:
        if i not in l2 and i not in l3:
            lst.append(i)
    return lst

The function three_list_difference takes three lists and keeps each element of the first list l1 that is in neither l2 nor l3. The difference for each list can be determined by simply calling the function with the arguments in the right order:

three_list_difference(A, B, C)
three_list_difference(B, A, C)
three_list_difference(C, B, A)

with outputs:

['nine']
['three', 'six']
['eleven']

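Using a function is advantageous because the code is reusable. Note, however, that i in l2 is a linear scan when l2 is a list, so for very large inputs it is worth passing sets for l2 and l3.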


 