How to merge two lists into a dictionary without using a nested for loop

I have two lists:

a = [0, 0, 0, 1, 1, 1, 1, 1, .... 99999]
b = [24, 53, 88, 32, 45, 24, 88, 53, ...... 1]

I want to merge those two lists into a dictionary like:

{
    0: [24, 53, 88], 
    1: [32, 45, 24, 88, 53], 
    ...... 
    99999: [1]
}

One solution is a for loop, which does not look very elegant:

# list_a and list_b are the lists a and b shown above
d = {}
for i in range(len(list_a)):
    if list_a[i] in d:
        d[list_a[i]].append(list_b[i])
    else:
        d[list_a[i]] = [list_b[i]]

Though this does work, it is inefficient and would take too much time when the lists are extremely large. Are there more elegant ways to construct such a dictionary?

Thanks in advance!

You can use a defaultdict:

from collections import defaultdict
d = defaultdict(list)
list_a = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]
for a, b in zip(list_a, list_b):
    d[a].append(b)

print(dict(d))

Output:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}
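For reuse, the loop above could be wrapped in a small helper. This is only a sketch: the name group_values is hypothetical, and the length check is an added assumption (zip stops silently at the end of the shorter list, which would hide mismatched inputs).

```python
from collections import defaultdict

def group_values(keys, values):
    """Group values by the key at the same position (hypothetical helper)."""
    if len(keys) != len(values):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("keys and values must have the same length")
    grouped = defaultdict(list)
    for k, v in zip(keys, values):
        grouped[k].append(v)
    # Return a plain dict so that looking up a missing key later raises
    # KeyError instead of silently inserting an empty list.
    return dict(grouped)

print(group_values([0, 0, 0, 1, 1, 9999], [24, 53, 88, 32, 45, 1]))
# {0: [24, 53, 88], 1: [32, 45], 9999: [1]}
```

Converting back to dict at the end is a design choice: a defaultdict keeps creating entries on lookup, which is rarely wanted once the grouping is finished.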

Alternative itertools.groupby() solution:

import itertools
from operator import itemgetter

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

# Sort by key only: sorted() is stable, so each group's values keep their original order.
pairs = sorted(zip(a, b), key=itemgetter(0))
result = {k: [v for _, v in g] for k, g in itertools.groupby(pairs, key=itemgetter(0))}
print(result)

The output:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
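The sort matters because itertools.groupby only groups consecutive items with equal keys. The sketch below, with made-up data, shows what happens when the keys are not sorted; if the key list is already sorted, as in the question, the sort can be skipped.

```python
from itertools import groupby
from operator import itemgetter

a = [0, 0, 1, 1, 0]   # deliberately NOT sorted
b = [10, 20, 30, 40, 50]

# Without sorting, key 0 appears in two separate runs, and the dict
# comprehension keeps only the last run for that key.
unsorted_result = {k: [v for _, v in g]
                   for k, g in groupby(zip(a, b), key=itemgetter(0))}
print(unsorted_result)  # {0: [50], 1: [30, 40]}

# Sorting by key first brings equal keys together.
pairs = sorted(zip(a, b), key=itemgetter(0))
sorted_result = {k: [v for _, v in g]
                 for k, g in groupby(pairs, key=itemgetter(0))}
print(sorted_result)  # {0: [10, 20, 50], 1: [30, 40]}
```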

No fancy structures, just a plain ol' dictionary.

d = {}
for x, y in zip(a, b):
    d.setdefault(x, []).append(y)
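As a quick illustration of why this works, dict.setdefault returns the stored value when the key exists, and otherwise stores the default and returns it. A minimal sketch with made-up values:

```python
d = {}

# Key 0 is missing: setdefault stores a new empty list and returns it.
first = d.setdefault(0, [])
first.append(24)

# Key 0 now exists: setdefault ignores the default and returns the stored list.
second = d.setdefault(0, [])
second.append(53)

print(d)                # {0: [24, 53]}
print(first is second)  # True
```

One small cost of this approach is that a fresh empty list is created on every call, even when the key already exists and the default is discarded.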

You can do this with a dict comprehension:

list_a = [0, 0, 0, 1, 1, 1, 1, 1]
list_b = [24, 53, 88, 32, 45, 24, 88, 53]
my_dict = {key: [] for key in set(list_a)}  # my_dict = {0: [], 1: []}
for a, b in zip(list_a, list_b):
    my_dict[a].append(b)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53]}

Note that this does not work with dict.fromkeys(set(list_a), []), because fromkeys assigns the same empty list object to every key, so appending through one key changes the value seen under all of them:

my_dict = dict.fromkeys(set(list_a), [])  # my_dict = {0: [], 1: []}
my_dict[0].append(1)  # my_dict = {0: [1], 1: [1]}
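The difference can be checked with an identity test. This is a small sketch with hypothetical variable names (shared, separate):

```python
keys = [0, 1]

# fromkeys evaluates [] once and reuses that single list for every key.
shared = dict.fromkeys(keys, [])
print(shared[0] is shared[1])      # True

# The comprehension evaluates [] once per key, giving independent lists.
separate = {key: [] for key in keys}
print(separate[0] is separate[1])  # False

shared[0].append(1)
separate[0].append(1)
print(shared)    # {0: [1], 1: [1]}
print(separate)  # {0: [1], 1: []}
```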

A pandas solution:

Setup:

import numpy as np
import pandas as pd

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4]

# Use numpy directly; the pd.np alias has been deprecated since pandas 1.0
b = np.random.randint(0, 100, len(a)).tolist()

>>> b
Out[]: [28, 68, 71, 25, 25, 79, 30, 50, 17, 1, 35, 23, 52, 87, 21]


df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))  # Create a dataframe

>>> df
Out[]:
    Group  Value
0       0     28
1       0     68
2       0     71
3       1     25
4       1     25
5       1     79
6       1     30
7       1     50
8       2     17
9       2      1
10      2     35
11      3     23
12      4     52
13      4     87
14      4     21

Solution:

>>> df.groupby('Group').Value.apply(list).to_dict()
Out[]:
{0: [28, 68, 71],
 1: [25, 25, 79, 30, 50],
 2: [17, 1, 35],
 3: [23],
 4: [52, 87, 21]}

Walkthrough:

  1. Create a pd.DataFrame from the input lists; a becomes the Group column and b the Value column.
  2. df.groupby('Group') groups the rows by the values of a.
  3. .Value.apply(list) collects the values of each group into a list.
  4. .to_dict() converts the resulting Series to a dict.

Timing:

To get an idea of timings for a test set of 1,000,000 values in 100,000 groups:

import numpy as np  # needed if not already imported

a = sorted(np.random.randint(0, 100000, 1000000).tolist())
b = np.random.randint(0, 100, len(a)).tolist()
df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))

>>> df.shape
Out[]: (1000000, 2)

%timeit df.groupby('Group').Value.apply(list).to_dict()
4.13 s ± 9.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

To be honest, though, this is likely less efficient than the itertools.groupby approach suggested by @RomanPerekhrest or the defaultdict approach suggested by @Ajax1234.
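One way to compare the pure-Python approaches on your own machine is the standard timeit module. The sketch below is an assumption-laden illustration, not a benchmark from the thread: the data is randomly generated, smaller than the test set above, and the function names are hypothetical. Results will vary by machine and Python version.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical test data: 100,000 values in up to 10,000 groups.
random.seed(0)
a = sorted(random.randrange(10_000) for _ in range(100_000))
b = [random.randrange(100) for _ in a]

def with_defaultdict():
    d = defaultdict(list)
    for k, v in zip(a, b):
        d[k].append(v)
    return dict(d)

def with_setdefault():
    d = {}
    for k, v in zip(a, b):
        d.setdefault(k, []).append(v)
    return d

def with_groupby():
    # a is already sorted here, so no extra sort is needed.
    return {k: [v for _, v in g]
            for k, g in groupby(zip(a, b), key=itemgetter(0))}

# All three approaches should agree before their timings are compared.
assert with_defaultdict() == with_setdefault() == with_groupby()

runs = 5
for func in (with_defaultdict, with_setdefault, with_groupby):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs:.4f} s per run")
```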

Maybe I miss the point, but at least I will try to help. If you have two lists and want to put them in a dict, do the following:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
lists = [a, b] # or directly -> lists = [ [1, 2, 3, 4], [5, 6, 7, 8] ]
new_dict = {}
for idx, sublist in enumerate([a, b]): # or enumerate(lists)
    new_dict[idx] = sublist

Hope it helps.

Alternatively, build the dictionary with a dict comprehension first, so that every key already exists with an empty list as its value. Then iterate through the zip of the two lists and append each value from the second list to the entry keyed by the matching value from the first list. Because the keys were created beforehand, no try-except clause or if statement is needed to check whether a key exists:

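# l and l2 are the two input lists (values assumed to match the output shown below)
l = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
l2 = [24, 53, 88, 32, 45, 24, 88, 53, 1]
d = {k: [] for k in l}
for x, y in zip(l, l2):
    d[x].append(y)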

Now:

print(d)

Is:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}
