
How to merge two lists into a dictionary without using a nested for loop

I have two lists:

a = [0, 0, 0, 1, 1, 1, 1, 1, .... 99999]
b = [24, 53, 88, 32, 45, 24, 88, 53, ...... 1]

I want to merge those two lists into a dictionary like:

{
    0: [24, 53, 88], 
    1: [32, 45, 24, 88, 53], 
    ...... 
    99999: [1]
}

One solution is a for loop, which does not look very elegant:

d = {}
for i in range(len(list_a)):
    if list_a[i] in d:  # key already seen: extend its list
        d[list_a[i]].append(list_b[i])
    else:  # first occurrence of this key: start a new list
        d[list_a[i]] = [list_b[i]]

Though this does work, it seems inefficient and would take too much time when the lists are extremely large. Is there a more elegant way to construct such a dictionary?

Thanks in advance!

You can use a defaultdict:

from collections import defaultdict
d = defaultdict(list)
list_a = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]
for a, b in zip(list_a, list_b):
    d[a].append(b)

print(dict(d))

Output:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}
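The reason no key check is needed is that defaultdict(list) creates an empty list the first time a missing key is accessed. A minimal sketch of the same idea wrapped in a function follows; the name group_pairs is hypothetical and not part of the answer above.

```python
from collections import defaultdict

def group_pairs(keys, values):
    """Group values by the key at the same position (hypothetical helper)."""
    grouped = defaultdict(list)  # a missing key starts out as an empty list
    for key, value in zip(keys, values):
        grouped[key].append(value)
    return dict(grouped)  # convert back to a plain dict

result = group_pairs([0, 0, 1, 9999], [24, 53, 32, 1])
print(result)  # {0: [24, 53], 1: [32], 9999: [1]}
```

Each pair is visited exactly once, so this is a single pass over the zipped lists.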

Alternative itertools.groupby() solution:

import itertools

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

result = { k: [i[1] for i in g] 
           for k,g in itertools.groupby(sorted(zip(a, b)), key=lambda x:x[0]) }
print(result)

The output:

{0: [24, 53, 88], 1: [24, 32, 45, 53, 88], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
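Note that sorted(zip(a, b)) sorts the full (key, value) tuples, so the values inside each group come out sorted (key 1 gives [24, 32, 45, 53, 88] rather than [32, 45, 24, 88, 53]). If the original order matters, one possible variation is to sort on the key only. This sketch assumes the same kind of input as above and relies on Python's sort being stable:

```python
from itertools import groupby
from operator import itemgetter

a = [0, 0, 0, 1, 1, 1, 1, 1]
b = [24, 53, 88, 32, 45, 24, 88, 53]

# Sort on the key only; the sort is stable, so values keep their
# original relative order inside each group.
pairs = sorted(zip(a, b), key=itemgetter(0))
ordered = {k: [v for _, v in g] for k, g in groupby(pairs, key=itemgetter(0))}
print(ordered)  # {0: [24, 53, 88], 1: [32, 45, 24, 88, 53]}
```

If a is already sorted, as in the question, the sorted() call can be dropped entirely.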

No fancy structures, just a plain ol' dictionary.

d = {}
for x, y in zip(a, b):
    d.setdefault(x, []).append(y)
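For readers unfamiliar with dict.setdefault, here is a small sketch of what it does: it returns the value stored under the key, inserting the given default first if the key is missing. The key "k" and the variable names are only for illustration.

```python
d = {}
first = d.setdefault("k", [])   # key missing: inserts [] and returns it
first.append(1)
second = d.setdefault("k", [])  # key present: returns the stored list
second.append(2)
print(d)  # {'k': [1, 2]}
```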

You can initialize the keys with a dict comprehension and then fill in the lists:

list_a = [0, 0, 0, 1, 1, 1, 1, 1]
list_b = [24, 53, 88, 32, 45, 24, 88, 53]
my_dict = {key: [] for key in set(list_a)}  # my_dict = {0: [], 1: []}
for a, b in zip(list_a, list_b):
    my_dict[a].append(b)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53]}

Note that you cannot make this work using dict.fromkeys(set(list_a), []), because that binds every key to the same empty list object:

my_dict = dict.fromkeys(set(list_a), [])  # my_dict = {0: [], 1: []}
my_dict[0].append(1)  # my_dict = {0: [1], 1: [1]}
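A short sketch that makes the sharing visible with an identity check; the names keys, shared and separate are hypothetical:

```python
keys = [0, 1]

shared = dict.fromkeys(keys, [])      # one list object reused for every key
separate = {key: [] for key in keys}  # a fresh list per key

print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False

shared[0].append(1)
separate[0].append(1)
print(shared)    # {0: [1], 1: [1]}
print(separate)  # {0: [1], 1: []}
```

The comprehension evaluates [] once per key, while fromkeys receives a single, already-built list as its second argument.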

A pandas solution:

Setup:

import numpy as np
import pandas as pd

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4]

# pd.np is no longer available in recent pandas versions; use numpy directly
b = np.random.randint(0, 100, len(a)).tolist()

>>> b
Out[]: [28, 68, 71, 25, 25, 79, 30, 50, 17, 1, 35, 23, 52, 87, 21]


df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))  # Create a dataframe

>>> df
Out[]:
    Group  Value
0       0     28
1       0     68
2       0     71
3       1     25
4       1     25
5       1     79
6       1     30
7       1     50
8       2     17
9       2      1
10      2     35
11      3     23
12      4     52
13      4     87
14      4     21

Solution:

>>> df.groupby('Group').Value.apply(list).to_dict()
Out[]:
{0: [28, 68, 71],
 1: [25, 25, 79, 30, 50],
 2: [17, 1, 35],
 3: [23],
 4: [52, 87, 21]}

Walkthrough:

  1. Create a pd.DataFrame from the input lists; a becomes the Group column and b the Value column
  2. df.groupby('Group') creates groups based on a
  3. .Value.apply(list) collects the values of each group and casts them to a list
  4. .to_dict() converts the resulting Series to a dict

Timing:

To get an idea of timings for a test set of 1,000,000 values in 100,000 groups:

import numpy as np  # pd.np is no longer available in recent pandas versions

a = sorted(np.random.randint(0, 100000, 1000000).tolist())
b = np.random.randint(0, 100, len(a)).tolist()
df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))

>>> df.shape
Out[]: (1000000, 2)

%timeit df.groupby('Group').Value.apply(list).to_dict()
4.13 s ± 9.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But to be honest it is likely less efficient than itertools.groupby suggested by @RomanPerekhrest, or defaultdict suggested by @Ajax1234.
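To check that claim on your own machine, a rough sketch like the following times the two pure-Python approaches with the standard timeit module. The data size (100,000 values, up to 10,000 groups), the seed and the function names are assumptions chosen to keep the run short; they are smaller than the test set above, and no timings are quoted here because they depend on the machine.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

random.seed(0)  # assumed seed, only to make runs repeatable
a = sorted(random.randrange(10_000) for _ in range(100_000))
b = [random.randrange(100) for _ in a]

def with_defaultdict():
    d = defaultdict(list)
    for k, v in zip(a, b):
        d[k].append(v)
    return dict(d)

def with_groupby():
    # a is already sorted, so no extra sort is needed before grouping
    return {k: [v for _, v in g] for k, g in groupby(zip(a, b), key=itemgetter(0))}

# Both approaches should build the same dictionary
same_result = with_defaultdict() == with_groupby()

for fn in (with_defaultdict, with_groupby):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```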

Maybe I'm missing the point, but I will at least try to help. If you have two lists and want to put them in a dict, do the following:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
lists = [a, b] # or directly -> lists = [ [1, 2, 3, 4], [5, 6, 7, 8] ]
new_dict = {}
for idx, sublist in enumerate([a, b]): # or enumerate(lists)
    new_dict[idx] = sublist

Hope it helps.

Alternatively, build the dictionary with a comprehension first, so that every key already maps to an empty list. Then iterate through the zip of the two lists and append each value from the second list to the list stored under the matching key from the first list. Because all keys exist beforehand, no try-except clause or if statement is needed to check whether a key is present:

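l = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
l2 = [24, 53, 88, 32, 45, 24, 88, 53, 1]
d = {k: [] for k in l}  # every key starts with an empty list
for x, y in zip(l, l2):
    d[x].append(y)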

Now:

print(d)

Is:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}


 