
Efficient way of comparing multiple lists in Python

I have 5 long lists of word pairs, as in the example below. Note that an element can be a list holding a single word pair, like [['Salad', 'Fat']], or a list holding several word pairs, like [['Bread', 'Oil'], ['Bread', ' Salt']].

list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar']] ]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt']] ]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread'] ]

Now I want to calculate the frequency of word pairs.

For example, for the 5 lists above, I should get the following output, where each word pair is shown with its frequency.

output_list = [{['Salad', 'Fat']: 3}, {['Bread', 'Oil']: 2}, {['Salt', 'Sugar']: 2},
{['Sugar', 'Salt']: 1}, and so on]

What is the most efficient way of doing this in Python?

You could flatten all the lists. Then use Counter to count the word frequencies.

>>> import itertools
>>> from collections import Counter
>>> l = [[1,2,3],[3,4,1,5]]
>>> counts = Counter(list(itertools.chain(*l)))
>>> counts
Counter({1: 2, 3: 2, 2: 1, 4: 1, 5: 1})

NOTE: this flattening technique will work only with lists of lists. For other flattening techniques see the link provided above.

EDIT: Thanks to AChampion, counts = Counter(list(itertools.chain(*l))) can be written as counts = Counter(list(itertools.chain.from_iterable(l))).
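
To illustrate the note about flattening, here is a small sketch (the sample data and variable names are made up for illustration). It shows that itertools.chain.from_iterable removes only one level of nesting, and that Counter raises TypeError when the flattened items are still lists, so word pairs would need to be converted to tuples first.

```python
import itertools
from collections import Counter

# chain.from_iterable removes exactly one level of nesting.
nested = [[1, [2, 3]], [4]]
one_level = list(itertools.chain.from_iterable(nested))
print(one_level)  # the inner list [2, 3] is left intact

# Illustrative word-pair data: after one level of flattening,
# each item is still a list such as ['Salad', 'Fat'].
pairs = [[['Salad', 'Fat']], [['Salad', 'Fat'], ['Salt', 'Sugar']]]
flat_pairs = list(itertools.chain.from_iterable(pairs))

# Lists are unhashable, so Counter cannot use them as keys.
try:
    Counter(flat_pairs)
    counter_failed = False
except TypeError:
    counter_failed = True

# Converting each pair to a tuple makes it hashable and countable.
counts = Counter(map(tuple, flat_pairs))
print(counts)
```
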

Your lists are unevenly nested, which makes the code ugly, so I would look to fix the input lists first.

collections.Counter() is built for this kind of thing, but lists are not hashable, so you need to turn them into tuples (as well as strip off the spurious spaces):

In []:
import itertools as it
from collections import Counter

list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar'] ]]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt'] ]]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread']] 

t = lambda x: tuple(map(str.strip, x))
c = Counter(map(t, it.chain.from_iterable(it.chain(list_1, list_2, list_3, list_4))))
c += Counter(map(t, list_5))
c

Out[]:
Counter({('Bread', 'Oil'): 2,
         ('Bread', 'Salt'): 1,
         ('Protein', 'Soup'): 1,
         ('Salad', 'Chicken'): 1,
         ('Salad', 'Fat'): 3,
         ('Salad', 'Protein'): 1,
         ('Salt', 'Sugar'): 2,
         ('Sugar', 'Bread'): 1,
         ('Sugar', 'Protein'): 1,
         ('Sugar', 'Salt'): 1})
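
If fixing the input lists is not an option, one possible alternative is a small recursive generator that walks any nesting depth, so list_5 does not need separate handling. This is only a sketch: iter_pairs is a hypothetical helper name, and it assumes every innermost list is non-empty and contains only strings.

```python
from collections import Counter

def iter_pairs(nested):
    """Yield each innermost list of strings as a tuple of stripped strings.

    Assumption: innermost lists are non-empty and contain only strings;
    lists that mix strings and lists are not handled.
    """
    if nested and all(isinstance(item, str) for item in nested):
        yield tuple(item.strip() for item in nested)
    else:
        for item in nested:
            yield from iter_pairs(item)

list_1 = [[['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar']]]
list_2 = [[['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']]]
list_3 = [[['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt']]]
list_4 = [[['Salad', ' Fat'], ['Salad', 'Chicken']]]
list_5 = [['Sugar', 'Protein'], ['Sugar', 'Bread']]

# Feed every list through the same generator, regardless of depth.
counts = Counter()
for word_list in (list_1, list_2, list_3, list_4, list_5):
    counts.update(iter_pairs(word_list))

print(counts.most_common(3))
```

Pair order is preserved, so ('Salt', 'Sugar') and ('Sugar', 'Salt') are counted as different pairs, as in the expected output in the question.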
