How to efficiently concatenate lists if an element is present in every list in Python

I have 3 lists as follows.

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

I want to identify the words that are common to all three lists and merge their values into a list.

So, my output should be as follows.

[["present", [[1,1,1], [8,2,6], [6]]], ["trip", [[1,1,1], [5,2,8], [8]]]]

I am currently doing it as follows.

lists = [mylist1, mylist2, mylist3]
mywords = []
for mylist in lists:
    for item in mylist:
        mywords.append(item[0])

my_new_list = []
for word in mywords:
    myflag = 1
    myvalues = []
    for mylist in lists:
        mytemp = []
        for item in mylist:
            if word == item[0]:
                mytemp = item[1]
                myvalues.append(mytemp)

        if len(mytemp) == 0:
            myflag = 0

    if myflag != 0:
        my_new_list.append([word, myvalues])

However, this is really inefficient when I have about 10000 elements in each list, and it takes hours to run. I am wondering if there is a more efficient way of doing this in Python.

I am happy to provide more details if needed.

Use the common element as a key in a defaultdict whose values are lists holding the values you want to merge.
This assumes that a word does not appear more than once within any single list, i.e. there are no duplicates. Since you want the word to be present in every list, the merged list for that word must then contain exactly as many elements as there are lists: one element from each list.

from collections import defaultdict
d = defaultdict(list)
for L in lists:
    for k, v in L: 
        d[k].append(v)
output = [[k, v] for k, v in d.items() if len(v) == len(lists)]
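
If duplicates within a single list cannot be ruled out, the length check above can miscount, because two entries from one list look the same as one entry from each of two lists. Below is a minimal sketch of one possible variant that records which list each value came from. The function name merge_common is hypothetical, and keeping only the first occurrence of a word per list is an assumption, not something the question specifies.

```python
from collections import defaultdict

def merge_common(lists):
    """Return [[word, [value_from_list_0, value_from_list_1, ...]], ...]
    for the words that occur in every list."""
    merged = defaultdict(dict)  # word -> {list index: value}
    for index, pairs in enumerate(lists):
        for word, value in pairs:
            # Assumption: keep only the first occurrence of a word per list.
            merged[word].setdefault(index, value)
    return [
        [word, [by_index[i] for i in range(len(lists))]]
        for word, by_index in merged.items()
        if len(by_index) == len(lists)  # the word was seen in every list
    ]

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

result = merge_common([mylist1, mylist2, mylist3])
print(result)
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```

Like the defaultdict version, this makes a single pass over all items, so it should scale roughly linearly with the total number of elements.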

If you want to validate the assumption that there are no duplicates, you could use a Counter:

from collections import Counter
from operator import itemgetter
for L in lists:
    c = Counter(map(itemgetter(0), L))  # word -> number of occurrences in L
    if any(v > 1 for v in c.values()):
        print('Invalid list:', L)
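
For large lists it may be more useful to see which words are repeated than to print the whole list. Here is a small sketch along those lines; duplicated_words is a hypothetical helper name, and the sample data is made up for illustration.

```python
from collections import Counter

def duplicated_words(pairs):
    """Return the words that occur more than once in one [word, values] list."""
    counts = Counter(word for word, _ in pairs)
    return [word for word, n in counts.items() if n > 1]

# Made-up sample in which "trip" appears twice.
sample = [["trip", [1]], ["dog", [2]], ["trip", [3]]]
print(duplicated_words(sample))  # ['trip']
```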

If you know the number of lists you have, you can do something like the following with groupby, which would be marginally better (this assumes each word appears at most once in any one list):

from itertools import groupby

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

res = []
f = lambda x: x[0]
for k, g in groupby(sorted(mylist1 + mylist2 + mylist3, key=f), key=f):
    lst = list(g)
    if len(lst) == 3:
        res.append([k, [x[1] for x in lst]])

print(res)

# [['present', [[1, 1, 1], [8, 2, 6], [6]]],
#  ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
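
The same idea can be sketched for any number of lists by comparing each group's size with len(lists) instead of a hard-coded 3. The function name common_by_groupby is hypothetical. This still assumes a word appears at most once per list, and because sorted is stable, the values for each word stay in the order of the input lists.

```python
from itertools import chain, groupby
from operator import itemgetter

def common_by_groupby(lists):
    """Group all [word, values] items by word and keep the words
    whose group has exactly one item per input list."""
    first = itemgetter(0)
    # groupby only groups adjacent items, so sort by word first.
    combined = sorted(chain.from_iterable(lists), key=first)
    result = []
    for word, group in groupby(combined, key=first):
        items = list(group)
        if len(items) == len(lists):
            result.append([word, [value for _, value in items]])
    return result

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

print(common_by_groupby([mylist1, mylist2, mylist3]))
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```

Note that the sort makes this O(n log n) in the total number of items, so the dictionary-based approaches are likely to be faster on large inputs.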

Another way is to convert your lists to dictionaries and use simple lookups, which performs better than the groupby version above:

d1 = dict(mylist1)
d2 = dict(mylist2)
d3 = dict(mylist3)

print([[k, [v, d2[k], d3[k]]] for k, v in d1.items() if k in d2 and k in d3])

# [['present', [[1, 1, 1], [8, 2, 6], [6]]],
#  ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
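
The lookup approach can also be sketched for an arbitrary number of lists. The function name common_by_lookup is hypothetical. One caveat worth stating: dict(pairs) keeps only the last value if a word is repeated within a list, so this again assumes no duplicates.

```python
def common_by_lookup(lists):
    """Convert each list of [word, values] pairs to a dict, then keep the
    words of the first dict that are also keys of every other dict."""
    dicts = [dict(pairs) for pairs in lists]
    if not dicts:
        return []
    first, rest = dicts[0], dicts[1:]
    return [
        [word, [value] + [d[word] for d in rest]]
        for word, value in first.items()
        if all(word in d for d in rest)  # average O(1) lookup per dict
    ]

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

print(common_by_lookup([mylist1, mylist2, mylist3]))
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```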

Check this one, which groups the values for every word using a defaultdict:

from collections import defaultdict

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

dict1 = {d[0]: d[1:] for d in mylist1}
dict2 = {d[0]: d[1:] for d in mylist2}
dict3 = {d[0]: d[1:] for d in mylist3}
# Instead of creating the dictionaries one by one as above, you can build them in a loop.

dd = defaultdict(list)
for d in (dict1, dict2, dict3):  # add more dicts here as needed
    for key, value in d.items():
        dd[key].append(value)

print(dd)

Output

Edit 1: Sorry for not noticing the redundant brackets, and thanks to @Cristian Ciupitu for pointing them out.
To remove the redundant square brackets, build each dictionary like this instead:

dict1 = {d[0]: d[1:][0] for d in mylist1}

Hope the output is correct now.
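
The extra brackets come from slicing: d[1:] returns a list containing the remaining elements, whereas indexing returns the element itself. A quick illustration, using one item from the question's data:

```python
item = ["present", [1, 1, 1]]

sliced = item[1:]        # a slice always returns a list: [[1, 1, 1]]
unwrapped = item[1:][0]  # first element of that slice: [1, 1, 1]
indexed = item[1]        # plain indexing gives the same value directly

print(sliced)     # [[1, 1, 1]]
print(unwrapped)  # [1, 1, 1]
print(indexed)    # [1, 1, 1]
```

Since each item here has exactly two elements, d[1] is equivalent to d[1:][0] and a little simpler to read.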
