
Merge/append lists that share a common item

The title may be misleading, so feel free to reword it once the proper terminology for the problem comes up. =)

I'm aware that, for the most part, the lists here could probably be swapped for tuples. As far as I'm concerned, the end result can be any iterable.

I have two lists-of-lists. Suppose they are:

list_a = [[1, 'f00d'], [2, 'dead'], [3, 'beef']]
list_b = [[1, 'frankenbeans'], [2, 'chickensoup'], [3, 'spaceballs']]

The two lists are not necessarily the same length, and there is no guarantee that a given first element (the ID) appears in both.

What I'm trying to do is create a new list-of-lists/list-of-tuples/list-of-dicts/whatever, like this:

list_c = [[1, 'f00d', 'frankenbeans'], [2, 'dead', 'chickensoup'], [3, 'beef', 'spaceballs']]

Update: Basically, I know the position of the common "ID" in these lists (it is an integer), but the IDs are not necessarily sequential, and the lists-of-lists are not necessarily in the same order. I'm looking for an efficient way to create a new set of sub-lists based on that common ID.

The naive way:

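new_list = []
# Compares every pair of rows: O(len(list_a) * len(list_b)) comparisons
for list_a_list in list_a:
  for list_b_list in list_b:
    if list_a_list[0] == list_b_list[0]:  # == compares; = would be a SyntaxError here
      new_list.append([list_a_list[0], list_a_list[1], list_b_list[1]])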

...or something like that. It gives me the feeling there's a much "smarter" way to do this, but I'm not very good at that.
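A common way to avoid the nested loop, sketched here under the assumption that each ID occurs at most once in `list_b`: index `list_b` by its ID in a dict once, then make a single pass over `list_a`. That replaces the O(len(a) × len(b)) pairwise comparison with roughly O(len(a) + len(b)) work. The helper names `merge_on_id` and `_fields_without`, and the `id_pos` parameter (the known position of the ID in each sub-list), are hypothetical, not from any library.

```python
def _fields_without(row, pos):
    # The row's fields as a list, minus the ID at index pos (works for tuples too)
    return [value for i, value in enumerate(row) if i != pos]


def merge_on_id(list_a, list_b, id_pos=0):
    """Hypothetical helper: inner-join two lists-of-lists on the ID at index id_pos.

    Assumes a non-negative id_pos and that each ID occurs at most once in
    list_b (a later duplicate would overwrite an earlier one in the index).
    """
    # One pass over list_b: ID -> its remaining fields
    index_b = {row[id_pos]: _fields_without(row, id_pos) for row in list_b}

    merged = []
    for row in list_a:  # one pass over list_a with average O(1) dict lookups
        key = row[id_pos]
        if key in index_b:  # IDs missing from list_b are skipped
            merged.append([key] + _fields_without(row, id_pos) + index_b[key])
    return merged


list_a = [[1, 'f00d'], [2, 'dead'], [3, 'beef']]
list_b = [[1, 'frankenbeans'], [2, 'chickensoup'], [3, 'spaceballs']]
print(merge_on_id(list_a, list_b))
# [[1, 'f00d', 'frankenbeans'], [2, 'dead', 'chickensoup'], [3, 'beef', 'spaceballs']]
```

Output rows follow `list_a`'s order, so the two inputs do not need to be sorted or aligned.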

Update:
Does it make any difference if I mention that each list-of-lists holds anywhere from thousands to a million items at a time?
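At thousands to a million rows, one option, sketched here under stated assumptions (the hypothetical name `iter_merged`, IDs at index 0 and unique within `list_b`), is a generator: it holds only `list_b` in a dict and streams over the other input, yielding merged rows one at a time instead of building the whole result list. Passing the smaller of the two lists as `list_b` keeps the in-memory index as small as possible.

```python
def iter_merged(rows_a, list_b):
    """Hypothetical helper: lazily yield [id, *a_fields, *b_fields] for shared IDs.

    Assumes the ID is at index 0 and unique within list_b. Only list_b is
    indexed in memory; rows_a may be any iterable and is consumed once.
    """
    index_b = {row[0]: list(row[1:]) for row in list_b}
    for row in rows_a:
        fields_b = index_b.get(row[0])
        if fields_b is not None:  # skip IDs that list_b does not contain
            yield [row[0]] + list(row[1:]) + fields_b


list_a = [[1, 'f00d'], [2, 'dead'], [3, 'beef']]
list_b = [[1, 'frankenbeans'], [2, 'chickensoup'], [3, 'spaceballs']]

# Rows are produced one at a time, so they can be written out as they arrive
for merged_row in iter_merged(iter(list_a), list_b):
    print(merged_row)
```

Because `rows_a` only needs to be iterable, it could just as well be a generator reading rows from a file.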

from collections import defaultdict
from itertools import chain

final = defaultdict(list)

# Collect every value under its ID; list_a's values come before list_b's
for idx, value in chain(list_a, list_b):
  final[idx].append(value)

# and if you have to have a list of lists at the end (Python 3: items(), not iteritems())
finalList = [[k] + v for k, v in final.items()]
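One thing worth checking with the `defaultdict` approach: an ID that appears in only one input simply produces a shorter row, and nothing records which list the lone value came from. A quick demonstration in Python 3, with lists chosen here for illustration:

```python
from collections import defaultdict
from itertools import chain

list_a = [[1, 'f00d'], [2, 'dead']]
list_b = [[2, 'chickensoup'], [3, 'spaceballs']]

final = defaultdict(list)
for idx, value in chain(list_a, list_b):
    final[idx].append(value)

# Dicts keep insertion order (Python 3.7+); sort by ID for a predictable result
final_list = [[k] + v for k, v in sorted(final.items())]
print(final_list)
# [[1, 'f00d'], [2, 'dead', 'chickensoup'], [3, 'spaceballs']]
```

If the position of each value matters (first column from `list_a`, second from `list_b`), a version that fills gaps with `None` is safer.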

Your input lists should be dictionaries in the first place:

dict_a = dict(list_a)
dict_b = dict(list_b)
dict_c = dict((k, [v, dict_b[k]]) for k,v in dict_a.items())
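The `dict_c` line above raises `KeyError` as soon as a key from `dict_a` is missing from `dict_b`. If only IDs present in both lists are wanted (an inner join), one option in Python 3 is to intersect the dicts' key views, which support set operations. This assumes, as `dict(list_a)` already does, that every sub-list has exactly two items:

```python
list_a = [[1, 'f00d'], [2, 'dead'], [3, 'beef']]
list_b = [[3, 'spaceballs'], [1, 'frankenbeans']]  # different order, ID 2 missing

dict_a = dict(list_a)
dict_b = dict(list_b)

# & on keys() views keeps only the IDs found in both dicts
common = dict_a.keys() & dict_b.keys()
dict_c = {k: [dict_a[k], dict_b[k]] for k in sorted(common)}
print(dict_c)
# {1: ['f00d', 'frankenbeans'], 3: ['beef', 'spaceballs']}
```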

If keys are not guaranteed to occur in both lists, you'll have to be a little more careful:

all_keys = set(dict_a.keys()) | set(dict_b.keys())
dict_c = dict((k, (dict_a.get(k), dict_b.get(k))) for k in all_keys)

For example, for list_a = [(1, 'a')] and list_b = [(1, 'b'), (2, 'c')], the above would set dict_c to {1: ('a', 'b'), 2: (None, 'c')}.
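To get back to the list-of-lists shape the question asks for (`list_c`), the dict can be flattened and sorted by ID. A minimal sketch using the example inputs from the previous paragraph; missing values stay `None`:

```python
list_a = [(1, 'a')]
list_b = [(1, 'b'), (2, 'c')]

dict_a = dict(list_a)
dict_b = dict(list_b)
all_keys = set(dict_a) | set(dict_b)
dict_c = {k: (dict_a.get(k), dict_b.get(k)) for k in all_keys}

# Flatten {id: (a_value, b_value)} into [[id, a_value, b_value], ...] ordered by ID
list_c = [[k, *values] for k, values in sorted(dict_c.items())]
print(list_c)
# [[1, 'a', 'b'], [2, None, 'c']]
```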

itertools.groupby() is helpful for this kind of task:

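from itertools import groupby
from operator import itemgetter

list_a = [[1, 'f00d'], [2, 'dead'], [3, 'beef']]
list_b = [[1, 'frankenbeans'], [2, 'chickensoup'], [3, 'spaceballs']]

# Sort on the ID only: sorted() is stable, so within each ID group
# list_a's value stays ahead of list_b's
combined = [(k, [v[1] for v in g]) for k, g in
            groupby(sorted(list_a + list_b, key=itemgetter(0)), key=itemgetter(0))]

print(combined)
# [(1, ['f00d', 'frankenbeans']), (2, ['dead', 'chickensoup']), (3, ['beef', 'spaceballs'])]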

Note that list_a and list_b first have to be combined into a new list sorted by the key, because groupby only groups consecutive items with equal keys; it assumes its input is already sorted by that key.
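One subtlety worth checking before relying on the groupby version: sorting the concatenated sub-lists as whole rows uses the string value as a tie-breaker, so within each group the values come out alphabetically rather than `list_a` first. Sorting on the key alone relies on Python's stable sort, which keeps the original `list_a`-then-`list_b` order for equal IDs. The helper name `group_values` is illustrative only:

```python
from itertools import groupby
from operator import itemgetter

list_a = [[1, 'f00d'], [2, 'dead'], [3, 'beef']]
list_b = [[1, 'frankenbeans'], [2, 'chickensoup'], [3, 'spaceballs']]


def group_values(rows):
    # Collapse rows already ordered by ID into (id, [values...]) pairs
    return [(k, [v[1] for v in g]) for k, g in groupby(rows, key=itemgetter(0))]


whole_row_sort = group_values(sorted(list_a + list_b))
key_only_sort = group_values(sorted(list_a + list_b, key=itemgetter(0)))

print(whole_row_sort[1])  # (2, ['chickensoup', 'dead']), alphabetical
print(key_only_sort[1])   # (2, ['dead', 'chickensoup']), list_a's value first
```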

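Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.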

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM