
Cartesian product of multiple lists of dictionaries

I have two or more lists, each of which is a list of dictionaries (something like JSON), for example:

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
# desired result: every dict from list_1 paired with every dict from list_2, each pair merged
cartesian_product = [{'Name': 'John', 'Age':25, 'Product': 'Car', 'Id': 1}, {'Name': 'John', 'Age':25, 'Product': 'TV', 'Id': 2}, {'Name': 'Mary' , 'Age': 15, 'Product': 'Car', 'Id': 1}, {'Name': 'Mary' , 'Age': 15, 'Product': 'TV', 'Id': 2}]

How can I do this while using memory efficiently? The way I'm doing it right now runs out of RAM with big lists. I know it probably involves itertools.product, but I couldn't figure out how to use it with a list of dicts. Thank you.

P.S. This is how I'm doing it for the moment:

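gen1 = (row for row in self.tables[0])
result = []  # renamed from `table`, which the inner generator also uses as a loop variable
for row in gen1:
    # flattens every row of tables[1:] into one stream, so this is a true
    # Cartesian product only when there are exactly two tables
    gen2 = (dictionary for table in self.tables[1:] for dictionary in table)
    for element in gen2:
        new_row = {}
        new_row.update(row)
        new_row.update(element)
        result.append(new_row)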

Thank you!

Here is a solution to the problem posted:

from itertools import product

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]

ret_list = []
for i1, i2 in product(list_1, list_2):
    merged = {}        # fresh dict, so the input dicts are never mutated
    merged.update(i1)
    merged.update(i2)  # on duplicate keys, i2's value overwrites i1's
    ret_list.append(merged)

The key here is to use the update() method of dicts to add members. This version leaves the input dicts unmodified, and silently drops duplicate keys in favor of whichever value is seen last.
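A quick check of the "last one wins" behavior and of the inputs staying untouched. The colliding `Id` key below is a made-up example, not data from the question:

```python
from itertools import product

# hypothetical data: both dicts carry an 'Id' key
people = [{'Name': 'John', 'Id': 7}]
products = [{'Product': 'Car', 'Id': 1}]

rows = []
for person, item in product(people, products):
    merged = {}
    merged.update(person)
    merged.update(item)  # 'Id' from item replaces 'Id' from person
    rows.append(merged)

print(rows)    # [{'Name': 'John', 'Id': 1, 'Product': 'Car'}]
print(people)  # [{'Name': 'John', 'Id': 7}]  (input unchanged)
```

Note that an overwritten key keeps its original position in the merged dict; only its value changes.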

However, this will not help with memory usage. The simple fact is that if you want to do this operation in memory you will need to be able to store the starting lists and the resulting product. Alternatives include periodically writing to disk or breaking the starting data into chunks and deleting chunks as you go.
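One way to act on the "write to disk" suggestion is to never build `ret_list` at all. `itertools.product` yields combinations lazily, so each merged row can be written out as soon as it is made. This is a sketch that assumes a JSON Lines file (one JSON object per line) is an acceptable output format; `write_product_jsonl` and the file name are hypothetical. `product` still holds the input lists in memory, so this only avoids storing the result.

```python
import json
import os
import tempfile
from itertools import product


def write_product_jsonl(path, *lists):
    """Hypothetical helper: write the merged Cartesian product of several
    lists of dicts to `path`, one JSON object per line.

    Only the current merged row is kept in memory; the result list is
    never built. Returns the number of rows written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for combo in product(*lists):
            merged = {}
            for d in combo:
                merged.update(d)  # later lists win on duplicate keys
            out.write(json.dumps(merged) + "\n")
            count += 1
    return count


# Example usage with the question's data, written to a temporary directory
list_1 = [{'Name': 'John', 'Age': 25}, {'Name': 'Mary', 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1}, {'Product': 'TV', 'Id': 2}]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "product.jsonl")
    written = write_product_jsonl(out_path, list_1, list_2)
    with open(out_path, encoding="utf-8") as f:
        first_line = f.readline()
```

The file can later be read back one line at a time with `json.loads`, so downstream processing never needs the whole product in RAM either.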

Convert the dictionaries to lists of (key, value) pairs, take the product, and convert each result back to a dictionary:

import itertools

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
l1 = [list(d.items()) for d in list_1]  # list() is needed in Python 3, where items() returns a view
l2 = [list(d.items()) for d in list_2]
print([dict(l[0] + l[1]) for l in itertools.product(l1, l2)])

The output is (key order within each dict may differ between Python versions):

[{'Age': 25, 'Id': 1, 'Name': 'John', 'Product': 'Car'}, {'Age': 25, 'Id': 2, 'Name': 'John', 'Product': 'TV'}, {'Age': 15, 'Id': 1, 'Name': 'Mary', 'Product': 'Car'}, {'Age': 15, 'Id': 2, 'Name': 'Mary', 'Product': 'TV'}]

If this isn't memory-efficient enough for you, iterate over the product lazily instead of building the whole list:

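# itertools.product yields one combination at a time, so the full
# result list is never built; only the current merged row is kept
for d1, d2 in itertools.product(list_1, list_2):
    row = dict(d1)  # copy, so list_1's dicts are not modified
    row.update(d2)
    # work with one product at a time here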

For Python 3.5 and later (dict unpacking in literals, PEP 448):

import itertools

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
print([{**l[0], **l[1]} for l in itertools.product(list_1, list_2)])
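The one-liner above handles exactly two lists and builds the full result list. Below is a sketch that generalizes it to any number of lists while staying lazy, which also covers the "two or more" case in the question. `merged_product` is a hypothetical name, not a library function, and `list_3` is made-up data:

```python
from itertools import product


def merged_product(*lists):
    """Hypothetical helper: lazily yield one merged dict per combination
    in the Cartesian product of `lists` (each a list of dicts)."""
    for combo in product(*lists):
        merged = {}
        for d in combo:
            merged.update(d)  # later dicts win on duplicate keys
        yield merged


list_1 = [{'Name': 'John', 'Age': 25}, {'Name': 'Mary', 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1}, {'Product': 'TV', 'Id': 2}]
list_3 = [{'Store': 'North'}, {'Store': 'South'}]  # hypothetical third list

for row in merged_product(list_1, list_2, list_3):
    print(row)  # rows are produced one at a time; nothing is accumulated
```

Because it is a generator, callers decide whether to stream rows (as in the loop) or materialize them with `list(...)` when the result is known to fit in memory.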

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM