
How to efficiently concatenate lists if an element is present in every list in Python

I have 3 lists as follows.

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

I want to identify the words that are common to all three lists and merge their values into a single list.

So, my output should be as follows.

[["present", [[1,1,1], [8,2,6], [6]]], ["trip", [[1,1,1], [5,2,8], [8]]]]

I am currently doing it as follows.

lists = [mylist1, mylist2, mylist3]
mywords = []
for mylist in lists:
   for item in mylist:
     mywords.append(item[0])

my_new_list = []
for word in mywords:
   myflag = 1
   myvalues = []
   for mylist in lists:
     mytemp = []
     for item in mylist:
       if word == item[0]:
         mytemp = item[1]
         myvalues.append(mytemp)

     if len(mytemp) == 0:
         myflag = 0

   if myflag != 0:
     my_new_list.append([word,myvalues])

However, this is really inefficient: with about 10000 elements in each list, it takes hours to run. I am wondering if there is a more efficient way of doing this in Python.

I am happy to provide more details if needed.

Use the common element as a key in a defaultdict whose values are lists holding the values you want to merge.
Assuming that the common element does not appear more than once in a list, i.e. there are no duplicates, and given that you want it to be present in every list, the number of elements in the merged list must equal the number of lists: one element for each list.

from collections import defaultdict
d = defaultdict(list)
for L in lists:
    for k, v in L: 
        d[k].append(v)
output = [[k, v] for k, v in d.items() if len(v) == len(lists)]

If you want to validate the assumption of no duplicates, you could use a Counter:

from collections import Counter
from operator import itemgetter
for L in lists:
    # Count how often each word (the first item of every pair) occurs in this list
    c = Counter(map(itemgetter(0), L))
    if any(v > 1 for v in c.values()):
        print('Invalid list:', L)
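If duplicates within a list turn out to be legitimate, counting merged values no longer tells you whether a word is in every list. A minimal sketch of one alternative, assuming membership should be decided with a set intersection instead (the helper name `merge_common` is hypothetical, not part of the answer above):

```python
from collections import defaultdict

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]
lists = [mylist1, mylist2, mylist3]

def merge_common(lists):
    """Merge the values of words that occur in every list (hypothetical helper)."""
    if not lists:
        return []
    # Words present in every list; repeats inside one list do not affect this set.
    common = set.intersection(*({word for word, _ in lst} for lst in lists))
    merged = defaultdict(list)
    for lst in lists:
        for word, values in lst:
            if word in common:
                merged[word].append(values)
    return [[word, vals] for word, vals in merged.items()]

print(merge_common(lists))
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```

With this variant a word that repeats inside one list simply contributes more than one value to its merged entry.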

If you know the number of lists you have, you can do something like the following using groupby, which would be marginally better (provided each word appears at most once in any one list):

from itertools import groupby

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

res = []
f = lambda x: x[0]
for k, g in groupby(sorted(mylist1 + mylist2 + mylist3, key=f), key=f):
    lst = list(g)
    if len(lst) == 3:
        res.append([k, [x[1] for x in lst]])

print(res)

# [['present', [[1, 1, 1], [8, 2, 6], [6]]],
#  ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
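The hard-coded 3 can be avoided. A rough sketch of the same groupby idea for any number of lists, still assuming each word appears at most once per list (the function name `merge_common_groupby` is made up for illustration):

```python
from itertools import chain, groupby
from operator import itemgetter

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

def merge_common_groupby(lists):
    """groupby-based merge for N lists (hypothetical helper)."""
    first = itemgetter(0)
    # sorted() is stable, so values for one word keep the order of the input lists.
    combined = sorted(chain.from_iterable(lists), key=first)
    result = []
    for word, group in groupby(combined, key=first):
        items = list(group)
        # A word seen once per list shows up exactly len(lists) times.
        if len(items) == len(lists):
            result.append([word, [values for _, values in items]])
    return result

print(merge_common_groupby([mylist1, mylist2, mylist3]))
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```

Note that the result comes back sorted by word, because groupby only groups adjacent items and therefore needs the sorted input.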

Another way is to convert your lists to dictionaries and use simple lookups, which is more performant than the approach above:

d1 = dict(mylist1)
d2 = dict(mylist2)
d3 = dict(mylist3)

print([[k, [v, d2[k], d3[k]]] for k, v in d1.items() if k in d2 and k in d3])

# [['present', [[1, 1, 1], [8, 2, 6], [6]]],
#  ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
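The same lookup idea can be written for an arbitrary number of lists. This is only a sketch under the assumption that, as with dict() above, a repeated word inside one list may be collapsed to its last value; `merge_common_lookup` is a hypothetical name:

```python
mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

def merge_common_lookup(lists):
    """Dictionary-lookup merge for N lists (hypothetical helper)."""
    dicts = [dict(lst) for lst in lists]
    if not dicts:
        return []
    first, rest = dicts[0], dicts[1:]
    # Walk the first dictionary and keep the words found in all the others.
    return [
        [word, [values] + [d[word] for d in rest]]
        for word, values in first.items()
        if all(word in d for d in rest)
    ]

print(merge_common_lookup([mylist1, mylist2, mylist3]))
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```

Each membership test is an average constant-time hash lookup, which is where the speed-up over the nested loops in the question comes from.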

Check this one:

from collections import defaultdict

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]

dict1 = {d[0]: d[1:] for d in mylist1}
dict2 = {d[0]: d[1:] for d in mylist2}
dict3 = {d[0]: d[1:] for d in mylist3}
# Instead of creating the dictionaries one by one as above, you can build them in a loop

dd = defaultdict(list)
for d in (dict1, dict2, dict3):  # add any further dicts here
    for key, value in d.items():
        dd[key].append(value)

print(dd)


Edit 1: Sorry for not noticing the redundant brackets, and thanks to @Cristian Ciupitu for pointing them out.
To remove the redundant square brackets, replace the dictionary construction with this code (and likewise for dict2 and dict3).

dict1 = {d[0]: d[1:][0] for d in mylist1}

Hope the output is correct now.
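As written, this answer prints every word, not just the ones shared by all lists. A possible way to finish it, sketched under the assumption that the dictionaries are built in a loop and that a word counts as common when it was collected once per dictionary (the names `dicts` and `common` are introduced here for illustration):

```python
from collections import defaultdict

mylist1 = [["present", [1,1,1]], ["trip", [1,1,1]], ["money", [1,8,6]], ["food", [6,6,6]], ["dog", [8,6,2]]]
mylist2 = [["cat", [8,8,8]], ["trip", [5,2,8]], ["present", [8,2,6]], ["parrot", [5]], ["dogs", [8]]]
mylist3 = [["dog", [8,5]], ["trip", [8]], ["present", [6]], ["tree", [6]], ["dogs", [8]]]
lists = [mylist1, mylist2, mylist3]

# One dictionary per input list; d[1] (rather than d[1:]) avoids the extra brackets.
dicts = [{d[0]: d[1] for d in lst} for lst in lists]

dd = defaultdict(list)
for d in dicts:
    for key, value in d.items():
        dd[key].append(value)

# Dictionary keys are unique, so a word collected len(dicts) times is in every list.
common = [[key, values] for key, values in dd.items() if len(values) == len(dicts)]
print(common)
# [['present', [[1, 1, 1], [8, 2, 6], [6]]], ['trip', [[1, 1, 1], [5, 2, 8], [8]]]]
```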
